Hello everyone, let me start by asking you a question. Have you ever stayed awake thinking about a problem and ways of solving it? Oftentimes, looking at the problem from a different point of view helps.

This blog discusses our experience here at Spritle with one such instance and how we addressed a complex problem.

## The Problem

While tracking an object, we rely on the current position and the previously predicted points for accurate path prediction.

The problem is that no matter how well an object detection model is tuned and optimized, its accuracy is almost never 100%. There will always be edge cases or anomalies that are predicted incorrectly.

This is a huge problem, especially in cases where wrong calculations are simply not an option, since they can lead to errors in important metrics like OEE (Overall Equipment Effectiveness).

For example, consider the path of the bicycle in the video below.

**Please note that:**

1. Only half the frame of the video is processed.
2. No 3D transformation has been applied to the coordinates; the tracking is done on 2D pixels.
3. The video is slowed down for the demo.

Here, you can clearly see that the paths followed by bicycles 1 and 2 are not accurately represented.

There are solutions like ensemble methods, where multiple machine learning models are used in combination instead of relying on a single model to get better accuracy. These are brilliant and work well in most cases, but we should consider the practical problems of applying them to compute-intensive object detection algorithms.

**Speed of detection & limited computation power**

Object Detection Models are compute-intensive. The problem becomes even more evident when working with edge devices like the Jetson Nano.

TensorRT and other optimization methods can come in handy, but we should first explore whether there is an elegant, less computation-intensive method for solving this problem.

## The Solution

When stuck on a problem, observing it from a different perspective always helps.

Let us think about the problem not in terms of machine vision and machine learning models, but as a general problem.

Robotics is an area that uses cameras extensively for path planning. In robotics, a camera is just another sensor. And what do we use for the correction of errors in sensors, you ask? That’s right, **Kalman Filters**. Let’s briefly discuss Kalman Filters.

##### Kalman Filters

In simple words, a Kalman Filter estimates the state of a system from sensor output corrupted by Gaussian noise (random error in the sensor readings). At each step, it predicts the next state from a motion model, then corrects that prediction using the new measurement, weighting the two by how much it trusts each.
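To make the predict/correct cycle concrete, here is a minimal constant-velocity Kalman Filter for a 2D point, written in plain NumPy. This is an illustrative sketch, not the exact filter we used in production: the class name, noise values, and the constant-velocity motion model are all assumptions chosen for simplicity.

```python
import numpy as np

class SimpleKalman2D:
    """Constant-velocity Kalman Filter for a 2D point.

    State vector: [x, y, vx, vy]. Measurements: noisy [x, y] positions.
    (Illustrative sketch; parameter values are assumptions.)
    """
    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        # State transition: position advances by velocity * dt
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        # Measurement matrix: we observe position only, not velocity
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var   # process noise covariance
        self.R = np.eye(2) * meas_var      # measurement noise covariance
        self.P = np.eye(4) * 100.0         # initial state uncertainty (large)
        self.x = np.zeros(4)               # state estimate

    def predict(self):
        # Project the state and its uncertainty forward one time step
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # Correct the prediction with a new measurement z = [x, y]
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Fed a sequence of noisy positions along a straight path, the filtered estimates land noticeably closer to the true path than the raw measurements do, which is exactly the smoothing behaviour we want from a sensor-correction technique.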

## Object Tracking with Kalman Filters

Let’s look at the previous example but this time with the application of Kalman Filters.

*Note that:*

1. Only half the frame of the video is processed.
2. No 3D transformation has been applied to the coordinates; the tracking is done on 2D pixels.
3. The video is slowed down for the demo.
4. An offset is applied to the path estimated by the KF (in green) to clearly show the difference in the demo.

The path traced using the Object Detection model is now much more accurate.
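One way a Kalman Filter can absorb a wrong detection is with a simple innovation gate: if a detection jumps implausibly far from the filter's prediction, treat it as a mis-detection and keep the predicted position instead. The sketch below shows this on a 1D track; the gate threshold and noise values are illustrative assumptions, not the settings from our actual pipeline.

```python
import numpy as np

# 1D constant-velocity Kalman Filter with an innovation gate: detections
# that land too far from the prediction (likely mis-detections) are
# rejected, and the predicted position is used for that frame instead.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state: [position, velocity]
H = np.array([[1.0, 0.0]])               # we observe position only
Q = np.eye(2) * 1e-3                     # process noise covariance
R = np.array([[4.0]])                    # measurement noise covariance

x = np.array([0.0, 1.0])                 # start at 0, moving +1 per frame
P = np.eye(2)

def step(z, gate=10.0):
    """Advance the filter one frame given detection z; return position."""
    global x, P
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Gate: skip the update if the detection is too far from the prediction
    innovation = z - (H @ x)[0]
    if abs(innovation) < gate:
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)   # Kalman gain, shape (2, 1)
        x = x + K.ravel() * innovation
        P = (np.eye(2) - K @ H) @ P
    return x[0]

# The fifth detection (95.0) is a gross mis-detection; the gate rejects it
# and the track continues smoothly along the predicted path.
detections = [1.1, 2.0, 2.9, 4.2, 95.0, 6.1, 7.0]
track = [step(z) for z in detections]
```

The outlier frame stays near the predicted position of roughly 5 rather than jumping to 95, which is the same effect visible in the corrected bicycle paths above.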

## Conclusion

Using a technique generally applied in robotics for correcting sensor noise to correct errors in machine learning models is an elegant way of solving a complex problem that would otherwise take a huge amount of effort.

Changing the perspective and looking at the problem from a different angle can often yield a simpler, and possibly better, solution.