Resources
Luganda Text sources
AI4Good Luganda datasets from AI Lab at Makerere University
BBSTV Twitter page (scrape tweets)
Radio Simba Twitter page (scrape tweets)
GalaxyFM website
Preparing LJ Speech datasets
TTS Engines
Coqui Engine Newsletter
Mimic
Colab notebook for voice conversion with VITS
Y Combinator discussion on TTS
https://pythonrepo.com/repo/mozilla-TTS-python-natural-language-processing (best resource for beginners)
Blog post on how to train a Tacotron model faster for convergence
Colab notebook to train using Mozilla Tacotron 2 with DDC
Page with various coqui TTS colab notebooks
https://github.com/tugstugi/pytorch-dc-tts
TensorBoard upload
MOS - Mean Opinion Score
Calculating MOS
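As a rough illustration of the MOS calculation, the sketch below averages listener ratings and attaches a 95% confidence interval. It assumes ratings on the usual 1 to 5 scale and uses a normal approximation for the interval; the function name and the sample ratings are made up for the example and do not come from any of the linked resources.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate an interval")
    mos = statistics.mean(ratings)
    # Sample standard deviation (n - 1 in the denominator).
    spread = statistics.stdev(ratings)
    # Normal-approximation half width of the confidence interval.
    half_width = z * spread / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one synthesized sentence.
mos, ci = mean_opinion_score([5, 4, 4, 3, 5])
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of listeners the interval is wide, which is a reminder to collect more ratings before comparing two models.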
Audio Sources
Edoboozi Lya Katikkiro ku BBS TV YouTube
Recording tools
Mimic Recording Studio (shows the recording pace and automatically trims leading and trailing silence)
Audacity
Helper script to convert Mimic recording studio data to LJSpeech format
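The conversion idea can be sketched as follows. This is not the linked helper script: it assumes, purely for illustration, that the recordings are described by pipe-separated lines of the form `file.wav|prompt|...`, and it emits LJSpeech-style `metadata.csv` rows (`id|text|normalized text`, with the clip id being the file name without its extension). The function name is hypothetical, and the text is copied unchanged into the normalized column.

```python
from pathlib import Path


def to_ljspeech_rows(lines):
    """Turn 'file.wav|prompt|...' lines into LJSpeech 'id|text|text' rows."""
    rows = []
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) < 2 or not parts[0] or not parts[1].strip():
            # Skip blank or malformed lines instead of guessing.
            continue
        clip_id = Path(parts[0]).stem  # "a1.wav" -> "a1"
        text = parts[1].strip()
        # No text normalization is attempted: both columns hold the same text.
        rows.append(f"{clip_id}|{text}|{text}")
    return rows


# Hypothetical input lines; real recordings would supply these.
sample = ["a1.wav|Oli otya|12", "", "a2.wav|Webale nnyo|15"]
for row in to_ljspeech_rows(sample):
    print(row)
```

Writing the returned rows to `metadata.csv` next to a `wavs/` directory gives the layout that LJSpeech-style loaders expect.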
Luganda segment list, consonants on phoible
Vowels and consonants from Irene's research
| Inventory | Language | Segments | Vowels | Consonants | Tones |
| --- | --- | --- | --- | --- | --- |
| luganda (AA 770) | Ganda | 33 | 10 | 19 | 4 |
Why I think an (open) voice is important
Voice-based human <-> machine interactions will become more common
Users expect decent quality from artificial voices (the level set by Amazon/Google)
Tech companies do not share voice models with the community
Trust and privacy are concerns with cloud-based voices
“Talking devices” might not be connected to the internet (for technical or security reasons)
Companies might have the money to buy “voices”, but most open source communities won’t have this option
For that reason I decided to contribute my voice without any licence restrictions (CC0)
Even Google Cloud TTS has no Luganda localization
Handling Demographic Biases in corpus
Artie Bias Corpus: an audio corpus + code for detecting demographic bias
CV tool to split the TEST, DEV, and TRAIN sets
This ensures that no speaker appears in more than one of the dev, test, and train sets, even when they speak different sentences
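A minimal sketch of a speaker-disjoint split is shown below. It is not the CV tool's own algorithm: it simply shuffles speaker ids and assigns whole speakers to one split each, so the guarantee described above holds by construction. The function name, the fractions, and the sample data are assumptions for illustration.

```python
import random
from collections import defaultdict


def split_by_speaker(clips, dev_frac=0.1, test_frac=0.1, seed=0):
    """Split (speaker_id, clip) pairs so each speaker lands in one split only."""
    by_speaker = defaultdict(list)
    for speaker_id, clip in clips:
        by_speaker[speaker_id].append(clip)

    # Shuffle speakers, not clips, so a speaker can never straddle two splits.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_dev = int(len(speakers) * dev_frac)
    n_test = int(len(speakers) * test_frac)
    assignment = {
        "dev": speakers[:n_dev],
        "test": speakers[n_dev:n_dev + n_test],
        "train": speakers[n_dev + n_test:],
    }
    return {
        name: [(s, c) for s in group for c in by_speaker[s]]
        for name, group in assignment.items()
    }


# Hypothetical corpus: 10 speakers with 5 clips each.
clips = [(f"spk{i % 10}", f"clip{i}.wav") for i in range(50)]
splits = split_by_speaker(clips, dev_frac=0.2, test_frac=0.2, seed=1)
print({name: len(items) for name, items in splits.items()})
```

Because the fractions apply to speakers rather than clips, split sizes in clips can be uneven when some speakers recorded far more than others.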
Research into existing similar Solutions
Topic modelling with LDA for TTS and STT
Why use LDA as opposed to neural networks:
1. It's hard to understand why a neural network labelled an item wrong.
2. LDA can get good results from small datasets, whereas neural networks need large datasets.
LDA models - Gensim
Relevant Papers
Papers that used ChitChat, an in-house data collection tool
https://www.aclweb.org/anthology/L16-1317.pdf
DDC - Double Decoder Consistency
TTS model architectures: https://erogol.com/text-speech-deep-learning-architectures/
Text to speech synthesis for Ethiopian Semitic languages: Issues and the way forward
Luganda Text-to-Speech Machine by Irene Nandutu, trained with MaryTTS on 511 sentences from the Bible
Flores: Facebook's low-resource languages translator
Communities
Mycroft AI
Chat for Mycroft AI
Coqui TTS engine
Mozilla Discourse
Mozilla's Common Voice and DeepSpeech platform on Element
Research done into Speech synthesis
https://youtu.be/qbPl0xoDnzQ
https://www.speech.kth.se/tts-demos/
"speech that was not recorded with the purpose of being used in research (e.g. radio segments, public speeches, podcasts, archival recordings). Such data hold great worth in several regards but is typically challenging to work with." ~ Per Fallgren
Edyson tool
Helpful project
PIs for the Swedish project: Zofia Malisz and Per Fallgren
Making Machines Speak
Develop a VocaliD persona
"We wouldn’t dream of fitting a little girl with the prosthetic limb of a grown man, why then was it okay to give her the same prosthetic voice as a grown man?" - How to share your voice with VocaliD
Useful commands
Working with tar.gz files
Navigate to the directory containing the file and run on the command line: tar -xvf file_name.tar.gz
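When the archive has to be unpacked from inside a notebook or script rather than a shell, the same step can be sketched with Python's standard `tarfile` module. The function name and paths are placeholders; the `filter="data"` argument is only passed on Python versions that support it.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and list the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Newer Python versions offer a safer extraction filter; use it if present.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **extra)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
```

For example, `extract_tar_gz("file_name.tar.gz", "data")` would unpack the archive into a `data` directory and return the relative paths of the files found there.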
Other TTS or STT Platforms
Ajala Studios
Appen
Open Speech and Language Resources
Lwazi Speech Corpus
TTS Engines
Great resource summarising all TTS engines
Mimic by Mycroft
The Mimic II platform
The Mimic Recording Studio simplifies the collection of training data from individuals, each of which can be used to produce a distinct voice for Mimic.
Coqui
Baseline for comparison: an existing Luganda STT model for Coqui
Implement a Luganda TTS model on Coqui
example-synthesizing-speech-on-terminal-using-the-released-models
Implementing-a-New-Model
Use the one-stop common utils shop to preprocess Common Voice data
!CUDA_VISIBLE_DEVICES=0 python TTS/bin/train_tts.py --config_path config.json --restore_path pretrained.pth.tar --logdir "{toutdir}"
Running ngrok on Colab to view TensorBoard while you train
https://gist.github.com/mrm8488/0a252982460134249a93cc92ef01deca
To prevent idle timeout in Colab
Open the browser developer tools (right-click and choose Inspect, or press F12 or Ctrl + Shift + I), then paste the code below into the Console tab:
function ClickConnect() {
  console.log("Working");
  document.querySelector("colab-toolbar-button#connect").click();
}
setInterval(ClickConnect, 60000);