(Update: Dec 2019) We ran our own comparison of the major speech-to-text engines, and the details are available in this post. Because we conducted that study ourselves, we believe it is more reliable and less biased than the numbers reported by the speech-to-text companies. The original, unedited post is below.
On the way to work the other morning, I decided to be a responsible boy and use Apple's CarPlay to send a quick message. Siri asked who I wanted to send it to. I gave it my friend's name, and the phone understood that just fine. Then I dictated the message: "Afternoon meeting canceled, want to grab a coffee?" That seems easy enough, right?
Well, that’s not what happened. Instead, the phone sent “Afternoon me being canceled, want to grab a coffin.”
Now, I think we can all agree that this is not ideal speech-to-text recognition. You have probably had a similar experience with your phone or smart-speaker assistant. But what's the deal? It's 2019, after all.
The thing is, automated transcription services will always be imperfect to some degree. Factors like ambient noise affect a computer's hearing just as much as they affect a human's. Improving a speech recognition system also depends heavily on the data it is trained on: for machine learning to work, you have to have a good textbook.
That brings us to a recent study by our friends at Rev.com. They built a pool of audio samples and ran them through four automatic speech recognition (ASR) services: their own, Google's, Amazon's, and Microsoft's. Each machine transcript was then compared against a human-annotated ground truth, and from that comparison they calculated each service's word error rate (WER): the number of substituted, deleted, and inserted words divided by the total number of words in the reference transcript.
A couple of important things to note here. First, the graph shows the percentage of incorrect words, so Rev's ASR engine produced transcripts that were roughly 86 percent accurate. Second, every one of these services makes errors, which, as I mentioned, is to be expected.
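To make the word error rate a little more concrete, here is a minimal Python sketch that scores my CarPlay mishap against what I actually said. It uses the standard word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. We don't know the exact text normalization Rev applied, so lowercasing and stripping punctuation here is an assumption, and the function names are purely illustrative.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into words, dropping punctuation (an assumed normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words; only two rows of the DP table are kept.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)  # 0 if the words match
            deletion = prev[j] + 1        # reference word missing from the transcript
            insertion = curr[j - 1] + 1   # extra word in the transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "Afternoon meeting canceled, want to grab a coffee?"
    sent = "Afternoon me being canceled, want to grab a coffin."
    print(word_error_rate(said, sent))  # 0.375
```

Here "meeting" became "me" (a substitution), "being" was added (an insertion), and "coffee" became "coffin" (another substitution): 3 errors over 8 reference words, a WER of 37.5 percent. Treating accuracy as 1 minus WER is only a rough reading, since enough insertions can push WER above 100 percent.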
The only way to get close to 100 percent accuracy with any automatic transcription service, in its current state, is to have a human check the output. Parmonic's premium plan includes human review: every AI-generated moment and transcript is checked by one of our video experts. That way, you know you are getting the best results!
Give it a look here.