The Big Question?

Have you ever wondered whether the CCTV cameras in our workplaces, retail stores, jewelry shops, and similar places are underutilized compared to what they are actually capable of? Many problems can be addressed with the CCTV cameras you already have, including detecting anomalous events as they happen. In this blog, we talk about one of the major concerns of the retail industry, shoplifting, and about how we at DataToBiz approached a solution to it.

These days, Computer Vision and Deep Learning are becoming prime choices for automating daily work in many places. One reason for their success is the edge they give businesses in security. So far, though, only big enterprises have tapped the potential of automated systems. This time, we at DataToBiz have come up with a solution that any business, small or big, can use to prevent its daily losses. Most shop owners nowadays install CCTV cameras in their shops, but they typically use them for only two purposes: first, to keep recordings of the previous ‘n’ days, and second, to monitor the live CCTV stream for anomalies.

How can you save your business from daily loss?

DataToBiz has taken a step forward to make better use of your existing real-time CCTV feed and save manpower for your business. Any crime generates significant losses, whether human, economic, or both. One of the major forms of crime in retail shops is shoplifting: “the action of stealing goods from a shop while pretending to be a customer”. The second purpose mentioned above, watching the live stream, exists largely to catch this kind of activity. But done manually, it demands extra manpower and therefore a recurring expense, and even then it is not an efficient solution, because people cannot watch every feed all the time. The problem needs a fully automated solution.

Motivation behind the Shoplifting Solution

According to the 2018 National Retail Security Survey (NRSS), inventory shrink (a loss of inventory related to theft, shoplifting, error, or fraud) had an impact of $46.8 billion on the U.S. retail economy in 2017.

According to a survey released by the shoplifting prevention association, Metropolitan Police Department of Japan, the loss is estimated at 4615 billion yen per year, which works out to about 12.6 billion yen per day. That stunning daily loss of 12.6 billion yen is equal to buying 126 Tesla Model S cars (Big enough! Right?).

Realistically, no staff can watch every camera continuously to catch all such cases every day, nor would hiring that many people be feasible for a business.

Technical approach to the solution

Fig.1 - Workflow of the Solution

What's new in our proposed solution?

We have implemented a 3DCNN (3-Dimensional Convolutional Neural Network) to process the CCTV video stream and extract spatiotemporal features from the frames. A 3DCNN differs from a traditional 2DCNN in that its kernels also slide along one extra dimension, time, so the extracted features describe how the scene changes across frames and not only what a single frame looks like. A 3DCNN feature extractor takes a batch of frames as input; out of those frames, it uses some to capture ‘appearance’ features and some to capture ‘motion’ features. Let’s take two different 3DCNN models proposed by Facebook as examples and look at how each of them uses frames for feature extraction.
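To make the extra temporal dimension concrete, here is a minimal pure-Python sketch of a 3D convolution (strictly a cross-correlation, as in most CNN libraries) over a tiny grayscale clip. The helper name `conv3d_valid`, the toy clips, and the hand-picked kernel are assumptions made for illustration only; in a real 3DCNN the kernels are learned during training and there are many of them.

```python
def conv3d_valid(clip, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) clip with no padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                # Sum over time as well as height and width:
                # this time loop is what a 2D convolution lacks.
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            s += clip[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip of 3 frames, each 2x3 pixels: one bright pixel moves right.
moving = [
    [[1, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0]],
]
# Same first frame repeated three times: nothing moves.
static = [moving[0]] * 3

# Hand-picked kernel covering 2 frames x 1 x 1 pixels:
# (next frame) minus (current frame).
temporal_diff = [[[-1.0]], [[1.0]]]

motion_response = conv3d_valid(moving, temporal_diff)
static_response = conv3d_valid(static, temporal_diff)

print(motion_response[0][0])  # [-1.0, 1.0, 0.0]
print(static_response[0][0])  # [0.0, 0.0, 0.0]
```

The kernel responds only where pixel values change between consecutive frames, which is the kind of ‘motion’ cue a purely 2D kernel applied to a single frame cannot see.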

C3D vs Slowfast - By Facebook

Example 1 – The C3D feature extractor tends to focus on appearance in the first ‘x’ frames out of the total ‘y’ frames of a batch, and then uses the remaining frames to follow motion-related features.

Example 2 – The Slowfast (4×16) model, on the other hand, takes a total of 64 frames as input. It selects 4 frames, one every 16 frames, for extracting spatial features. In parallel, it selects 32 frames, one every 2 frames, for extracting temporal features.
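The frame selection in Example 2 can be sketched in a few lines of Python. This is only an illustration of the sampling arithmetic: the function name `sample_indices` is hypothetical, and the choice to start each pathway at frame 0 is an assumption; an actual Slowfast implementation may pick different offsets.

```python
def sample_indices(clip_len, stride):
    """Indices of every `stride`-th frame in a clip of `clip_len` frames."""
    return list(range(0, clip_len, stride))


CLIP_LEN = 64  # frames in one input clip for the 4x16 configuration

slow_idx = sample_indices(CLIP_LEN, 16)  # sparse frames for spatial features
fast_idx = sample_indices(CLIP_LEN, 2)   # dense frames for temporal features

print(slow_idx)       # [0, 16, 32, 48]
print(len(fast_idx))  # 32
```

The slow pathway sees few frames but can afford to look at each one in detail, while the fast pathway sees eight times as many frames to follow motion.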

Note: A complete explanation of 3DCNN models is beyond the scope of this blog.

The Final Loop - Getting Results

After feature extraction, a second model pre-processes the features to bring them all to a fixed shape and then performs regression or classification on them. Whether to use classification or regression depends on the feature extraction model you selected and on the target use-case. The threshold above which the model treats an event as anomalous also differs from use-case to use-case, because some human activities are comparatively easy to detect (e.g. running, eating) and some are hard (e.g. shoplifting, shooting). Once the model confirms a shoplifting event, a dedicated pipeline sends notifications (messages, sound, etc.) to the staff members present, along with a screenshot of that particular event.
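As a rough illustration of this last stage, the sketch below pools a variable number of per-clip feature vectors into one fixed-length vector, turns it into a score with a logistic function, and compares the score with a per-use-case threshold. Everything here is an assumption made for the example: the function names, the average pooling, the weights and bias, and the threshold values are placeholders and not the settings of our deployed model.

```python
import math


def pool_features(clip_features):
    """Average several feature vectors into one fixed-length vector."""
    n = len(clip_features)
    dim = len(clip_features[0])
    return [sum(f[d] for f in clip_features) / n for d in range(dim)]


def anomaly_score(features, weights, bias):
    """Logistic score in (0, 1); higher means more likely anomalous."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical per-use-case thresholds; real values must be tuned on data.
THRESHOLDS = {"running": 0.5, "shoplifting": 0.8}


def is_anomalous(score, use_case):
    """Flag the event when the score reaches the threshold for this use-case."""
    return score >= THRESHOLDS[use_case]


# Three clips, each already reduced to a 2-dimensional feature vector.
clip_features = [[0.2, 1.0], [0.4, 3.0], [0.6, 2.0]]
pooled = pool_features(clip_features)
score = anomaly_score(pooled, weights=[1.0, 0.5], bias=-1.0)

print(is_anomalous(score, "running"))      # True
print(is_anomalous(score, "shoplifting"))  # False
```

The same score can trigger an alert for one use-case and not for another, which is why the threshold has to be chosen separately for each activity you want to detect.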

Conclusion

The proposed solution is a fully automated way to address one of the biggest concerns of retail shop owners, jewelers, museums, etc. It can save them manpower as well as the losses they have had to bear to date.

DataToBiz has expertise in developing state-of-the-art Computer Vision algorithms and running inference with them on edge devices in real time. For more details, contact us.
