Sit quietly for a moment and pay attention to the different sounds around you. You might hear appliances beeping, cars honking, a dog barking, someone sneezing. These are all noises Cochlear.ai, a Seoul-based sound recognition startup, is training its SaaS platform to identify. The company's goal is to develop software that can identify almost any kind of sound and be used in a wide range of smart hardware, including phones, speakers and cars, co-founder and chief executive Yoonchang Han told TechCrunch.
Cochlear.ai announced it has raised $2 million in Series A funding, led by Smilegate Investment, with participation from Shinhan Capital and NAU IB Capital. This brings its total funding so far to $2.7 million, including a seed round from Kakao Ventures, the investment arm of the South Korean internet giant. Cochlear.ai will use its Series A for hiring over the next 18 months and to expand the data set of sounds used to train its deep learning algorithms.
The company was founded in 2017 by a team of six music and audio research scientists, including Han, who completed his PhD in music information retrieval at Seoul National University. While working on his doctorate, Han found "that everyone was really focusing on speech recognition systems. There are so many companies for that, but analyzing other kinds of sounds are technically quite different from speech recognition."
Speech recognition technology usually recognizes one or two voices at a time, and assumes that people are engaging in a conversation, instead of talking over one another. It also uses linguistic knowledge in post-processing to increase accuracy. But with music or environmental noises, different types of sounds usually overlap.
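The difference between the two settings can be illustrated with a toy sketch. It assumes a model that outputs one raw score per sound. A single-label view (softmax) forces the scores to compete, so only one sound is reported, while a multi-label view (an independent sigmoid per sound) can report several overlapping sounds at once. The labels, scores and function names below are hypothetical and are not taken from Cochlear.ai's software.

```python
import math

# Hypothetical labels and scores, for illustration only (not Cochl.Sense output).
LABELS = ["glass_break", "siren", "dog_bark", "speech"]


def softmax(logits):
    """Single-label view: scores compete with each other and sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash one raw score into an independent 0..1 probability."""
    return 1.0 / (1.0 + math.exp(-x))


def detect_single(logits):
    """Return only the single most likely label."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]


def detect_overlapping(logits, threshold=0.5):
    """Multi-label view: each sound is scored on its own, so several can be active."""
    return [label for label, x in zip(LABELS, logits) if sigmoid(x) >= threshold]


# Assumed scores for a clip with a siren and a barking dog at the same time.
logits = [-3.0, 2.5, 2.0, -1.0]
print(detect_single(logits))       # reports one label only
print(detect_overlapping(logits))  # reports every label above the threshold
```

With these assumed scores, the single-label view reports only the siren, while the multi-label view reports both the siren and the dog bark.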
"We have to take care about all different frequency ranges, and there are not only voices, but really thousands of sounds out there," Han said. "So we think this will be the next generation of sound recognition, and that was the motivation for our startup."
Cochlear.ai’s SaaS, called Cochl.Sense, is available as a cloud API and edge SDK, and can currently detect about 40 different sounds, which are grouped into three categories: emergency detection (including glass breaking, screaming and sirens), human interaction (which includes using finger snaps, claps or whistles to interact with hardware) and human status (to identify sounds like coughing, sneezing or snoring for use cases like patient monitoring or automatic audio captioning).
Han said the company also plans to add new functionality to Cochl.Sense for use in homes (including smart speakers), vehicles and music analysis. Cochl.Sense's flexibility means it can potentially fit many use cases, including turning a smart speaker into a "control tower" for home appliances by detecting the noises they make, or helping hearing impaired people by sending alerts about noises, like car horns, to wearable devices including smart watches.
The sound recognition landscape
Han notes that over the past three years or so, there has been a shift from focusing only on speech recognition technology to recognizing other sounds as well.
More major tech companies, like Amazon, Google and Apple, are adding context-aware sound recognition to their products. For example, both Amazon Alexa Guard and Nest Secure detect the sound of glass breaking, while sound recognition in iOS 14 enabled Apple to add new accessibility features.
Han said the launches by major tech companies are a boon for Cochlear.ai, because they mean that the market for sound recognition technology is growing. The startup plans to work with many different industries, but is currently focused on smart consumer devices and automotive because that is where the most interest in its software is coming from. Cochlear.ai is currently working on a project with Daimler AG to include its sound recognition in cars (for example, alerts if a child is locked inside), in addition to collaborations with major electronics, telecommunications and consumer goods companies.
Software that can identify sounds like gunshots, glass breaking and other noises for emergency detection has been around for decades, but conventional technology often resulted in false alarms or required the use of specific microphones and other hardware, Han said.
Other companies dedicated to improving sound recognition technology include Cambridge, England's Audio Analytic, which focuses on context-based sound intelligence, and Netherlands-based Sound Intelligence, which develops software for emergency alert and healthcare systems.
Cochlear.ai plans to differentiate by building software that can be used with a wide array of microphones, including in low-end smartphones or USB microphones, without needing to be fine-tuned, instead relying on deep learning to refine its algorithms and reduce false positives.
During the early stages of building a data set for a specific sound, Cochlear.ai's team records many audio samples itself, using older smartphone models and USB microphones, to ensure that its software will work even without high-quality microphones.
Other samples are gathered from online sources. Once the initial model for a sound reaches a certain level of accuracy, it can search online by itself for more audio clips of the same kind, greatly increasing the speed at which training data is collected. Cochlear.ai's Series A will enable it to build data sets of audio samples more quickly, allowing it to add more sounds to its software.
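The article does not say how this automated collection works. One common way to build such a loop is confidence-based filtering, often called self-training or pseudo-labeling: the current model scores candidate clips, and only those it is very sure about are added to the data set. The sketch below assumes that approach. The clip records, the `toy_score` stand-in for a trained model and the 0.9 threshold are all hypothetical.

```python
def expand_dataset(labeled, candidates, score, threshold=0.9):
    """Add candidate clips that the current model scores at or above the threshold.

    Returns the enlarged data set and the list of newly accepted clips.
    """
    accepted = [clip for clip in candidates if score(clip) >= threshold]
    return labeled + accepted, accepted


def toy_score(clip):
    # Stand-in for a trained model (assumption): each clip already carries
    # a confidence value instead of being scored from its audio.
    return clip["confidence"]


# One clip recorded by the team, plus three hypothetical clips found online.
seed = [{"id": "recorded-1", "confidence": 1.0}]
found_online = [
    {"id": "web-1", "confidence": 0.97},
    {"id": "web-2", "confidence": 0.40},
    {"id": "web-3", "confidence": 0.92},
]

dataset, added = expand_dataset(seed, found_online, toy_score)
print([clip["id"] for clip in added])
```

In a real pipeline the model would be retrained on the enlarged data set and the loop repeated. A high threshold keeps mislabeled clips out, at the cost of collecting more slowly.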
"All of our co-founders are researchers in this field, so signal processing and machine learning techniques -- we are trying many different algorithms, because every sound has different characteristics," said Han. "We have to try many different things to make one single model that can identify all different sounds."
Edit: This story has been updated with the correct spelling of Audio Analytic.