Smart AI-based systems can now hear gunshots, cries for help

New Delhi: Atul Rai, co-founder and CEO of Gurugram-based Stack Technologies, is eyeing a tender for the Lucknow Smart City project, which is seeking audio and video surveillance to improve security.

Rai already has a product called Jarvis, used by the Uttar Pradesh Police and other state police forces, which combines closed-circuit television (CCTV) cameras with artificial intelligence (AI)-based facial recognition.

In its new version, Jarvis uses not only cameras to spot crimes but also microphones to hear what’s happening in the city. “We have used audio analytics to trace incidents like the jail fight in Uttar Pradesh. We aim to implement this in smart cities,” Rai said.

Stack is one of the few companies in India that offers AI-based audio analytics tools. These systems can identify gunshots, a person’s scream or specific words that indicate distress. They use convolutional neural networks (CNNs) to identify sound types. CNNs are commonly used for image and video recognition, but here they are being used to recognize patterns in sounds. Potentially, an audio surveillance system could alert the nearest hospital if an accident occurs, or contact the police if a group of people is planning a crime. “Every camera is capable of sending audio data using a mic. If a crime is being committed at the site of this camera, the audio can help identify if someone is in distress and needs help,” Rai explained.
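To illustrate why an image-oriented network can be pointed at sound, the sketch below turns a short audio clip into a spectrogram: a grid with time on one axis and frequency on the other, which a CNN could then treat much like a picture. This is a minimal illustration only, not Jarvis’s actual pipeline, which the article does not describe. The function name `spectrogram`, the frame sizes, the 8 kHz sample rate and the synthetic 1 kHz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame lists one magnitude per frequency bin.

    Hypothetical helper for illustration: a naive short-time Fourier transform.
    """
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Only bins 0..frame_size/2 carry distinct information for real input.
        for k in range(frame_size // 2 + 1):
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Assumed test signal: a pure 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]

grid = spectrogram(tone)
# For each time frame, find the frequency bin with the most energy.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in grid]
peak_hz = [b * SAMPLE_RATE / 64 for b in peak_bins]
```

With these assumed settings, every frame peaks at the bin corresponding to 1,000 Hz. A gunshot or a scream would leave a very different pattern across the grid, and those patterns are what a sound classifier would be trained to tell apart.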

According to Rai, there are many ways to use audio analysis for security. One is using audio to identify a scene, such as a fight, violence or screaming. Another is to identify a person by their voice if they are not facing the camera. This can help identify people with prior criminal records through their voices even when they are out of jail.

Rai said that the Lucknow Smart City project has expressed interest in an audio and video solution and that a demo will be held soon. Jarvis is “language-independent” and looks for specific sound signatures that could indicate distress or an accident, Rai said.

According to Rai, Jarvis’ accuracy has been tested against VoxCeleb, one of the largest audio-visual datasets of human speech. He claimed that the system is 98.7% accurate. The company is also working on a new natural language processing (NLP)-based feature that will let users ask Jarvis for information, prompting it to scan data across all cameras.

The use of audio signatures or voices for law enforcement is gaining traction globally. In Europe, Interpol built a speaker identification solution in 2018 to identify criminals from voice samples, while police forces in the US are reportedly building a database of criminals’ voice samples.

That said, such solutions come with significant privacy concerns. Pam Dixon, founder and executive director of the World Privacy Forum, a public interest research group, warned that “much will depend on how the system is installed, implemented and used.” Dixon pointed out that even assuming these systems are accurate and free of technical bias, there will be questions about where the recordings are stored and for how long. “Such monitoring systems need to be transparent and clearly state what words and sounds are being heard. Policies for these systems need to be in place prior to their creation and use,” she emphasized.

Supreme Court lawyer NS Nappinai agrees: “India does not have a regulatory framework for CCTV cameras, which already exists in many countries. The same applies to audio, so that stakeholders are aware of what is permissible and what is not.”
