The phoneme thing about speech recognition
I have many Google Homes in my flat. I use them to turn on the lights, set timers, do unit conversions and play ocean sounds. I wouldn't call their speech recognition good. My girlfriend, with her feminine voice, has an even harder time with them, and often resorts to putting on a comically deep voice to get Google to recognise her commands.
I've looked into how speech recognition is done, and when I built my computer back in January, one of my goals for it was to train my own language model. I've written before about some ideas I have for getting the computer to recognise language.
As I understand it, the most up-to-date technique for speech recognition is to train a network with CTC (Connectionist Temporal Classification) to map the audio to words. The training data is labelled speech: an MP3 file of someone saying a sentence, plus a text file or record of that sentence. The audio is then decomposed by frequency analysis, creating an image like the following w...
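To make the two ideas above a little more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not my pipeline or any library's API. `spectrogram` computes a magnitude spectrogram (the frequency "image" described above) by windowing the audio into short overlapping frames and taking a naive DFT of each; real systems would use an FFT and usually mel-scaled, log-compressed features. `ctc_greedy_decode` shows the collapse rule CTC relies on: the network emits one label per time frame, including a special blank, and repeated labels are merged and blanks dropped to get the text. That rule is why the labelled data only needs the sentence, not the timing of each sound. The frame size, hop length, blank symbol `_` and the example frame labels are all hypothetical choices, and this is not a CTC training loop.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a magnitude spectrogram as a list of rows (one row per time frame).

    Each row holds frame_size // 2 + 1 magnitudes, one per frequency bin.
    A naive O(N^2) DFT is used for readability; an FFT gives the same result faster.
    """
    # Hann window tapers each frame to reduce spectral leakage at the edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        # Only bins 0..frame_size/2 are kept: for real input the rest mirror them
        for k in range(frame_size // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            row.append(abs(coeff))
        rows.append(row)
    return rows


BLANK = "_"  # hypothetical blank symbol; real label sets reserve their own index


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse per-frame labels the way CTC does: merge repeats, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        # A label only counts when it differs from the previous frame and is not blank
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)
```

For example, `ctc_greedy_decode("hh_e_ll_llo")` returns `"hello"`: the blank between the two runs of `l` is what lets CTC output a genuine double letter, while `hh` collapses to a single `h`.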