Next-Generation Sensor Fusion for Next-Generation Sensors and Driving Functions

Eric Richter
6 min read · Nov 24, 2021


Current sensor fusion approaches have inherent properties that limit their applicability for next-generation driving functions and sensors. Integrated sensor fusion approaches like the Dynamic Grid resolve these limitations and thus enable next-generation driving functions. In this article, I will show why.

Typical driver assistance systems or automated driving functions consist of several components: One or multiple sensors, the sensor fusion, the driving function, and the actual vehicle control.

Block diagram of typical ADAS and automated driving functions

Current-generation ADAS like AEB, ACC, and lane keeping operate in well-structured environments and only need to handle a few similar object types in a limited number of scenarios. For this, low-resolution sensors are used (a minimal sketch of the resulting detection formats follows the list):

  • Low-resolution radar: These sensors provide detections, each consisting of a 2D position and a Doppler velocity. At most two to three detections are provided per detected object.
  • Low-resolution camera: Independent of the pixel resolution of the internal imager, these sensors provide bounding boxes of detected objects, either in the world frame or the image frame. Often, only a single detection per object is provided.
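
To make this concrete, here is a minimal Python sketch of the two detection formats described above; all type and field names are illustrative assumptions, not a vendor interface.

```python
# Illustrative detection formats for low-resolution sensors (not a real API).
from dataclasses import dataclass

@dataclass
class RadarDetection:
    x_m: float          # longitudinal position in the vehicle frame [m]
    y_m: float          # lateral position in the vehicle frame [m]
    doppler_mps: float  # radial (Doppler) velocity [m/s]

@dataclass
class CameraObject:
    x_m: float          # bounding-box center in the world frame [m]
    y_m: float
    length_m: float     # bounding-box extent [m]
    width_m: float
    label: str          # e.g. "car", "pedestrian"

# A low-resolution radar typically delivers only two or three detections
# for an entire vehicle:
vehicle_ahead = [
    RadarDetection(42.1, -0.3, 7.8),
    RadarDetection(42.4,  0.9, 7.6),
]
```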

To use the data from multiple sensors and extract dynamic objects, current ADAS apply so-called Dynamic Object Fusion approaches with the following general properties (a minimal Kalman-filter sketch follows the list):

Dynamic Object Fusion: A small number of sensor detections (blue/red) is processed using Kalman filtering (grey ellipse and arrow).
  • They are based on Kalman filtering, which runs on comparatively small CPUs.
  • They require low-resolution sensor inputs and thus are a natural fit for the sensors mentioned above.
  • They require all object types to be known at design time. There is no support for unknown object types.
  • They are well suited for long-range tracking of dynamic objects. However, their performance is limited for nearby and extended objects.
  • They support neither the static environment nor free space.
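
As a rough idea of what the Kalman-filter core of such a dynamic object fusion looks like, here is a minimal sketch assuming a constant-velocity point-object model; the matrices and noise values are illustrative tuning choices, not taken from any particular product.

```python
# Minimal constant-velocity Kalman filter for one point-object track.
import numpy as np

dt = 0.05                        # sensor cycle time [s] (assumed)
F = np.array([[1, 0, dt, 0],     # state transition for x = [px, py, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],      # low-resolution sensor measures position only
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.1              # process noise (illustrative)
R = np.eye(2) * 0.5              # measurement noise (illustrative)

x = np.array([20.0, 0.0, 5.0, 0.0])   # track: 20 m ahead, moving at 5 m/s
P = np.eye(4)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (z - H @ x)            # correct the state with one detection
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = predict(x, P)
x, P = update(x, P, np.array([20.3, 0.1]))  # one detection = one point object
```

Note how the update step consumes a single point measurement per object; this is the point-source assumption that later breaks down for nearby, extended objects.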

Next-Generation Sensors and Driving Functions

Let’s have a look at next-generation radar sensors. Just like the current ones, they provide detections consisting of a 2D position plus a Doppler velocity.

Detections from an HD-radar coming from the contour and from the inside of the objects

However, the big difference is the number of detections, which is many times higher than with current sensors. As a result, radar sensors are now also able to image objects in greater detail.

Output from a camera with semantic segmentation

While current-generation cameras provide detections of objects, next-generation cameras often perform so-called semantic segmentation.

For each pixel of the image, the class of the underlying object is determined and provided. Again, much more detail about the scene and the objects it contains becomes available to the sensor fusion.
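
To make the interface difference tangible, here is a toy sketch of a per-pixel segmentation output; the class list and image size are made up for illustration.

```python
# Toy semantic segmentation output: one class ID per pixel (illustrative).
import numpy as np

CLASSES = {0: "road", 1: "vehicle", 2: "pedestrian", 3: "vegetation"}

seg = np.zeros((8, 12), dtype=np.uint8)  # tiny 8x12 "image", all road
seg[2:5, 3:7] = 1                        # a vehicle blob
seg[5:7, 9:10] = 2                       # a pedestrian

# Every pixel carries a class, not just one box per object:
print({CLASSES[c]: int((seg == c).sum()) for c in CLASSES})
```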

Next-generation driving functions come with new requirements and challenges. While current driving functions often operate under highway conditions, more and more driving functions are being developed for urban scenarios.

Environmental model output for next generation driving functions: Dynamic objects (grey), static objects (red), free space (green)

These urban scenarios include more types of traffic participants, such as pedestrians, bicycles, and wheelchairs. Additionally, dynamic objects that are not known at design time are more likely to appear in these scenarios and must be detected. From these driving-function requirements, the following needs for the environmental model can be derived (a sketch of a matching interface follows the list):

  • Object extensions: The environment model shall provide the spatial extension of dynamic objects, e.g., for proper lane association.
  • Object motion prediction: For each dynamic object, its kinematics should be determined so that its motion can be predicted to a certain extent.
  • Static objects: In addition to dynamic objects, static objects should be provided.
  • Free space: As soon as the driving function includes automated steering and path planning, explicit free space information is required.
  • Unknown object types: Modeling all potentially moving object types upfront is an enormous, if not unsolvable, challenge. Thus, unknown dynamic objects need to be detected and predicted as well.
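
One way to read these requirements is as an output interface of the environmental model. The following sketch is purely illustrative; field names and types are assumptions, not a standardized interface.

```python
# Illustrative environmental model interface derived from the requirements above.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class DynamicObject:
    position: Tuple[float, float]    # [m], vehicle frame
    extent: Tuple[float, float]      # length, width [m]   -> lane association
    velocity: Tuple[float, float]    # [m/s]               -> motion prediction
    yaw_rate: float                  # [rad/s]
    obj_class: Optional[str] = None  # None = unknown object type, still tracked

@dataclass
class EnvironmentalModel:
    dynamic_objects: List[DynamicObject] = field(default_factory=list)
    static_objects: List[List[Tuple[float, float]]] = field(default_factory=list)  # contour polygons
    free_space: List[Tuple[float, float]] = field(default_factory=list)            # boundary polygon
```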

Limitations of Current Sensor Fusion Approaches

Current-generation sensor fusion approaches often combine dynamic object fusion with a static occupancy grid. Dynamic object fusion and its Kalman-filter approaches have limited performance at small distances and for extended objects, mainly because the point-source assumption underlying these algorithms is often heavily violated. They also require all object types to be known in advance and thus carry an inherent risk of missing unknown objects.

Typical clustering error in classical sensor fusion approaches

Furthermore, when high-resolution sensors are used, the data must be clustered before it can be fed into the Kalman filters. Since this clustering is done at the detection level, clustering errors occur frequently and can hardly be corrected in later processing steps. These clustering errors then degrade object fusion performance.
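
The following sketch shows the kind of detection-level clustering that is typically applied, and how sensitive it is to a single threshold; the greedy single-link scheme and all numbers are simplified illustrations, not a production clusterer.

```python
# Simplified single-link clustering of radar detections (illustrative only).
import numpy as np

def cluster(points, max_gap):
    """Greedy single-link clustering: points closer than max_gap share a cluster."""
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
        for j in range(i + 1, len(points)):
            if np.linalg.norm(points[i] - points[j]) < max_gap:
                labels[j] = labels[i]
    return labels

# Two pedestrians walking about 0.8 m apart, seen by an HD radar:
detections = np.array([[10.0, 0.0], [10.1, 0.2], [10.0, 0.8], [10.1, 1.0]])
print(cluster(detections, max_gap=0.5))  # -> [0, 0, 1, 1]: two objects
print(cluster(detections, max_gap=1.0))  # -> [0, 0, 0, 0]: merged into one object
```

Once the two pedestrians are merged into a single cluster, no downstream Kalman filter can separate them again.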

Static occupancy grids contain outdated free space regions (red “tails”) if dynamic objects are within the scene.

Static occupancy grid methods provide static objects as well as free space. While this is sufficient from an interface perspective, static grids also contain dynamic objects, which generate the typical incorrect tails and thus cause actually free space to be reported as occupied.
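
The tail effect can be reproduced with a tiny one-dimensional occupancy grid: evidence is accumulated per cell without any motion model, so cells that an object has already left stay occupied for a while. The log-odds values below are illustrative, not a tuned inverse sensor model.

```python
# Why a moving object leaves an outdated "tail" in a static occupancy grid.
import numpy as np

cells = np.zeros(20)           # log-odds occupancy along the road, all unknown
L_OCC, L_FREE = 2.0, -0.3      # evidence per "hit" / per "observed free" (assumed)

for t in range(5):             # one object driving ahead: cell 5, 6, ..., 9
    obj_cell = 5 + t
    for c in range(20):
        # Simplified inverse sensor model: the hit cell gets occupied evidence,
        # every other cell gets a small free-space update.
        cells[c] += L_OCC if c == obj_cell else L_FREE

print(np.where(cells > 0)[0])  # -> [5 6 7 8 9]: only cell 9 is real, 5-8 are the tail
```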

To circumvent this effect, a common workaround is to exclude measurements from dynamic objects before they enter the grid. To determine which measurements stem from dynamic objects, the association from the dynamic object fusion is used.

Error propagation in classical sensor fusion approaches: If errors occur in the dynamic object fusion, they propagate to the static grid.

As the quality of the association heavily depends on the clustering mentioned above, errors are propagated from the clustering through the object fusion to the static occupancy grid.
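A minimal version of this workaround could look as follows; the gating distance, the helper function, and the data are illustrative assumptions. If the clustering assigns a moving object's detections to the wrong track, or to no track at all, they slip through this gate and end up as false static obstacles in the grid.

```python
# Illustrative workaround: drop detections associated with dynamic tracks
# before the static-grid update.
import numpy as np

def split_out_static(detections, dynamic_tracks, gate=1.5):
    """Keep only detections farther than `gate` [m] from every dynamic track."""
    keep = []
    for d in detections:
        if all(np.linalg.norm(d - t) > gate for t in dynamic_tracks):
            keep.append(d)
    return np.array(keep)

detections = np.array([[12.0, 0.1], [12.2, 0.3], [30.0, 2.0]])
dynamic_tracks = np.array([[12.1, 0.2]])              # one tracked moving object
print(split_out_static(detections, dynamic_tracks))   # only the point at 30 m remains
```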

Built-In Consistency: Objects and Free Space Based on a Single Source

Integrated sensor fusion approaches solve this chicken-and-egg problem by jointly estimating static objects, dynamic objects, and free space within a single model.

Dynamic Grid (upper left) using HD-radar sensors (upper right) only. Camera images used for visualization only.

The Dynamic Occupancy Grid, or Dynamic Grid for short, is an integrated low-level sensor fusion approach that does not rely on the clustering step mentioned above. Thus, it does not suffer from error propagation caused by early decisions.

Instead, the classification between static and dynamic objects is based on much more information, resulting in better performance. In addition, the Dynamic Grid estimates object dynamics within the grid cells by maintaining multiple hypotheses for object velocity and direction ("particles"), which enables better motion prediction.

By design, there are no conflicts between dynamic traffic participants and the static environment, as both are derived from the same model at low latency.
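
To give an intuition for the particle idea, the following sketch maintains velocity hypotheses for a single occupied cell and re-weights them with a Doppler measurement. This is a deliberately reduced illustration of the concept, not the actual Dynamic Grid algorithm, and all numbers are assumptions.

```python
# Velocity hypotheses ("particles") for one occupied grid cell (illustrative).
import numpy as np

rng = np.random.default_rng(0)
N = 200
# Each particle: [vx, vy, weight]; velocities drawn from a broad prior.
particles = np.column_stack([rng.normal(0, 5, N),
                             rng.normal(0, 5, N),
                             np.full(N, 1.0 / N)])

def weight_by_doppler(particles, radial_dir, doppler, sigma=0.5):
    """Re-weight hypotheses by how well they explain a Doppler measurement."""
    v_radial = particles[:, :2] @ radial_dir
    w = np.exp(-0.5 * ((v_radial - doppler) / sigma) ** 2)
    particles[:, 2] = w / w.sum()
    return particles

# A radar looking straight at the cell measures 7.5 m/s radial velocity.
particles = weight_by_doppler(particles, np.array([1.0, 0.0]), doppler=7.5)
v_est = particles[:, :2].T @ particles[:, 2]
print(v_est)   # weighted mean velocity: vx is pulled toward the measured 7.5 m/s
```

In such a scheme, cells whose surviving hypotheses cluster around zero velocity can be treated as static, while the others are dynamic and can be predicted forward.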

Up-to-date free space estimation of the Dynamic Grid (left), outdated free space estimation of the Static Occupancy Grid (right)

The grid-based approach also supports unknown dynamic object types, which further improves the applicability in urban environments. If a sensor provides object classification such as semantic segmentation information from a camera, this information can be added to the grid.

Enhancing the dynamic grid with additional semantics

This improves the robustness of object extraction, provides detailed class information for traffic participants, and can be used to classify free space, e.g., for automated parking applications.
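
As a rough illustration of how camera semantics can be attached to grid cells, the sketch below projects cell centers into a segmentation image and copies the class label; the projection function is a placeholder, since a real system would use the camera's calibration.

```python
# Attaching segmentation classes to grid cells (illustrative; fake projection).
import numpy as np

CLASSES = {0: "road", 1: "vehicle", 2: "pedestrian"}
seg = np.zeros((8, 12), dtype=np.uint8)   # toy segmentation image
seg[2:5, 3:7] = 1                         # a vehicle blob

def project_to_image(cell_xy):
    # Placeholder projection: real code would apply the camera's extrinsic
    # and intrinsic calibration here.
    v = int(cell_xy[0]) // 3
    u = int(cell_xy[1]) + 5
    return v, u

cell_centers = [(9.0, -1.0), (9.0, 0.0), (30.0, 4.0)]
cell_classes = []
for xy in cell_centers:
    v, u = project_to_image(xy)
    in_image = 0 <= v < seg.shape[0] and 0 <= u < seg.shape[1]
    cell_classes.append(CLASSES[int(seg[v, u])] if in_image else None)
print(cell_classes)   # -> ['vehicle', 'vehicle', None]
```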

To summarize, classic sensor fusion approaches like dynamic object fusion and static occupancy grids have limited performance and applicability for high-resolution sensors and next-generation driving functions.

Low-level sensor fusion approaches like the dynamic grid resolve these limitations by directly processing the HD sensor data in an integrated fashion, providing a consistent environmental model consisting of dynamic and static objects as well as free space plus class information.
