The Sociolinguistics Lab within Michigan State University’s Department of Linguistics, Languages, and Cultures hosts Michigan Diaries, a project that chronicles changes in the lives and language of people during and after the Covid-19 pandemic and provides insight into language change over time.
Michigan Diaries collects long-form audio recordings from volunteers and uses Google’s commercial Automatic Speech Recognition (ASR) service to create the time-aligned transcripts researchers need for linguistic analysis. Currently, no robust open-source ASR software package exists that would let researchers circumvent Google’s platform. This leads to unnecessary costs and raises privacy concerns, because private data is sent to a third party.
Our On-Premises Automatic Speech Recognition Pipeline is an open-source all-in-one speech-to-text software package that creates time-aligned transcripts from audio files.
After users upload their audio files, the system automatically constructs a time-aligned transcript. The transcript is divided into sentences, each annotated with metadata: timestamps marking when the sentence begins and ends, a speaker label, and confidence values for the predicted text.
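A transcript with this shape might be represented as a list of sentence records. The sketch below is illustrative only; the field names and values are hypothetical, not the system’s actual output schema.

```python
# Illustrative sketch of a time-aligned transcript.
# Field names are hypothetical, not the pipeline's actual schema.
transcript = [
    {
        "text": "So how has your week been?",
        "start": 0.42,            # seconds from the start of the recording
        "end": 2.17,
        "speaker": "SPEAKER_00",  # label assigned by speaker diarization
        "confidence": 0.94,       # ASR confidence for the predicted text
    },
    {
        "text": "Busy, but we finally finished the garden.",
        "start": 2.80,
        "end": 5.61,
        "speaker": "SPEAKER_01",
        "confidence": 0.88,
    },
]

# Metadata like this lets researchers filter low-confidence sentences
# or pull out everything a single speaker said.
speaker_0_lines = [s["text"] for s in transcript if s["speaker"] == "SPEAKER_00"]
```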
The training feature improves performance over time: users upload their own datasets to fine-tune the models used for inference, including speaker diarization, ASR, punctuation restoration, language modeling, and entity extraction.
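Conceptually, inference chains several of these fine-tunable stages together. The sketch below uses stub functions in place of the real diarization, ASR, and punctuation models; every name and return value here is a hypothetical stand-in, not the project’s actual code.

```python
# Hypothetical sketch of the inference flow. Each stub stands in for a
# fine-tunable model (speaker diarization, ASR, punctuation restoration).
def diarize(audio):
    # Real pipeline: a diarization model returns (start, end, speaker) turns.
    return [(0.0, 2.5, "SPEAKER_00"), (2.5, 5.0, "SPEAKER_01")]

def transcribe(audio, start, end):
    # Real pipeline: an ASR model returns predicted text plus a confidence score.
    return "hello how are you", 0.93

def restore_punctuation(text):
    # Real pipeline: a model restores casing and punctuation to raw ASR output.
    return text.capitalize() + "?"

def run_pipeline(audio):
    # Chain the stages: diarize, transcribe each turn, then clean up the text.
    segments = []
    for start, end, speaker in diarize(audio):
        text, conf = transcribe(audio, start, end)
        segments.append({
            "speaker": speaker,
            "start": start,
            "end": end,
            "text": restore_punctuation(text),
            "confidence": conf,
        })
    return segments

segments = run_pipeline("diary_entry.wav")  # path is a placeholder
```

Fine-tuning any single stage on user-supplied data would slot into this flow without changing the overall structure.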
Our system avoids the need to use Google’s speech recognition technologies, reducing operational costs and protecting user privacy.
Our ASR pipeline is containerized with Docker and implemented using the state-of-the-art machine learning libraries Hugging Face, NVIDIA NeMo, and Pyannote.