Resources

Luganda Text sources

AI4Good Luganda datasets from AI Lab at Makerere University
BBSTV Twitter page (scrape tweets)
Radio Simba Twitter page (scrape tweets)
GalaxyFM website
Preparing LJ Speech datasets

TTS Engines

Ossian-Merlin
Coqui Engine Newsletter
Mimic
Mozilla TTS
Colab notebook for Voice Conversion with VITS
Y Combinator discussion on TTS
https://github-wiki-see.page/m/coqui-ai/TTS/wiki/Implementing-a-New-Model-in-%F0%9F%90%B8TTS
https://pythonrepo.com/repo/mozilla-TTS-python-natural-language-processing (best resource for getting started)

Blog post on how to train a Tacotron model faster for convergence

Colab notebook to train using Mozilla Tacotron 2 with DDC

Page with various Coqui TTS Colab notebooks

https://github.com/tugstugi/pytorch-dc-tts

TensorBoard Upload

https://tensorboard.dev/

MOS - Mean Opinion Score

Calculating MOS
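
As a rough sketch of the calculation: MOS is the arithmetic mean of listener ratings, usually on a 1 to 5 scale, and it is often reported with a 95% confidence interval. The function name below is hypothetical, the ratings are made-up placeholder values rather than real evaluation data, and the interval uses a simple normal approximation.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (mos, ci95) for a list of listener ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate an interval")
    mos = statistics.mean(ratings)
    # Sample standard deviation (n - 1 in the denominator).
    stdev = statistics.stdev(ratings)
    # Normal-approximation 95% half-width: 1.96 * s / sqrt(n).
    ci95 = 1.96 * stdev / math.sqrt(len(ratings))
    return mos, ci95

# Placeholder ratings, not real evaluation data.
example_ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci95 = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci95:.2f}")
```

With few listeners the interval is wide, so it helps to report the number of ratings alongside the score.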

Audio Sources

Edoboozi Lya Katikkiro on the BBS TV YouTube channel

Recording tools

Mimic Recording Studio (it shows the recording pace and automatically trims silence at the beginning and end)
Audacity

Helper script to convert Mimic recording studio data to LJSpeech format

https://github.com/thorstenMueller/deep-learning-german-tts/blob/master/helperScripts/MRS2LJSpeech.py
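
For orientation, the LJSpeech layout is a wavs folder plus a pipe-delimited metadata.csv with one "id|transcription|normalized transcription" row per clip. The sketch below only writes that metadata file from an in-memory list. It is not the linked MRS2LJSpeech.py script, and the function name, clip IDs, and sample sentences are illustrative assumptions.

```python
import os
import tempfile

def write_ljspeech_metadata(clips, out_dir):
    """Write metadata.csv with one 'id|text|normalized text' row per clip."""
    # LJSpeech keeps the audio files in a 'wavs' folder next to metadata.csv.
    os.makedirs(os.path.join(out_dir, "wavs"), exist_ok=True)
    metadata_path = os.path.join(out_dir, "metadata.csv")
    with open(metadata_path, "w", encoding="utf-8", newline="\n") as f:
        for clip_id, text in clips:
            # The pipe is the column delimiter, so it must not appear in the text.
            clean = " ".join(text.replace("|", " ").split())
            # Columns: file id (no extension), transcription, normalized transcription.
            f.write(f"{clip_id}|{clean}|{clean}\n")
    return metadata_path

# Hypothetical clip IDs and sentences, for illustration only.
clips = [("luganda_0001", "Oli otya?"), ("luganda_0002", "Webale nnyo.")]
with tempfile.TemporaryDirectory() as tmp:
    path = write_ljspeech_metadata(clips, tmp)
    with open(path, encoding="utf-8") as f:
        rows = f.read().splitlines()
print(rows)
```

Here the normalized column simply repeats the transcription; a real pipeline would expand numbers and abbreviations there.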

Luganda segment list, consonants on phoible

Vowels and consonants from Irene's research

Inventory table (columns: Inventory, Language, Segments, Vowels, Consonants, Tones):

luganda (AA 770), Ganda

Why I think (open) voice is important

Voice-based human <-> machine interactions will become more common
Users expect decent quality from artificial voices (level set by Amazon/Google)
Tech companies do not share voice models with the community
Level of trust and privacy is a concern with cloud-based voices
“Talking devices” might not be connected to the internet (for technical or security reasons)
Companies might have money to buy “voices”, but most open source communities won’t have this option
For that reason I decided to contribute my voice without any licence restrictions (CC0)
Wikitongues
Interesting translator
Even Google Cloud TTS has no Luganda localization
Handling demographic biases in a corpus
Artie Bias Corpus: an audio corpus + code for detecting demographic bias

Common Voice (CV) tool to split test, dev, and train sets

CorporaCreator

This ensures that no speaker appears in more than one of the dev, test, or train sets, even when they speak different sentences.
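
The speaker-disjoint idea can be sketched as follows. This is not CorporaCreator's actual algorithm; it is a minimal illustration that assigns whole speakers, rather than individual clips, to train, dev, or test. The function name, split fractions, and sample records are assumptions.

```python
import random
from collections import defaultdict

def split_by_speaker(clips, dev_frac=0.1, test_frac=0.1, seed=0):
    """Assign every clip of a speaker to exactly one of train/dev/test."""
    by_speaker = defaultdict(list)
    for speaker_id, clip_path in clips:
        by_speaker[speaker_id].append(clip_path)

    # Shuffle speakers (not clips) with a fixed seed for reproducibility.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_dev = max(1, round(len(speakers) * dev_frac))
    n_test = max(1, round(len(speakers) * test_frac))
    groups = {
        "dev": speakers[:n_dev],
        "test": speakers[n_dev:n_dev + n_test],
        "train": speakers[n_dev + n_test:],
    }
    # Map each split name to {speaker_id: [clip paths]}.
    return {name: {s: by_speaker[s] for s in ids} for name, ids in groups.items()}

# Hypothetical records: 10 speakers with 4 clips each.
clips = [(f"spk{i % 10}", f"clip_{i}.wav") for i in range(40)]
splits = split_by_speaker(clips)
for name, speakers in splits.items():
    print(name, sorted(speakers))
```

Splitting at the speaker level means the clip counts per set only approximate the requested fractions when speakers contribute unequal amounts of audio.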

Research into existing similar Solutions

Mary TTS
CMU Flite
Topic modelling with LDA for TTS and STT
Create your own TTS with Coqui or Mimic

Why use LDA as opposed to neural networks:
1. It's hard to understand why a neural network labelled an item wrongly.
2. LDA can get good results from small datasets, whereas neural networks need large datasets.

LDA models - Gensim

Relevant Papers

Papers that used ChitChat, an in-house data collection tool

https://www.aclweb.org/anthology/L16-1317.pdf

https://storage.googleapis.com/pub-tools-public-publication-data/pdf/c5cf9f28ba2a9c0e50bc885bfad3bfbff3b4afbd.pdf

DDC - Double Decoder Consistency

TTS model architectures: https://erogol.com/text-speech-deep-learning-architectures/

Open-Source Consumer-Grade Indic Text To Speech
Google's Tacotron paper
Text to speech synthesis for Ethiopian Semitic languages: Issues and the way forward
Luganda Text-to-Speech Machine, by Irene Nandutu, uses 511 sentences from the Bible to train with MaryTTS.
German Dataset
HUI Audio Corpus German

FLORES, Facebook's low-resource languages translator

GitHub page
Google AI blog

Communities

Mycroft AI
Chat for Mycroft AI
Coqui TTS engine
Mozilla Discourse
Mozilla's Common Voice and DeepSpeech platform on Element

Research done into Speech synthesis

https://youtu.be/qbPl0xoDnzQ
https://www.speech.kth.se/tts-demos/
"speech that was not recorded with the purpose of being used in research (e.g. radio segments, public speeches, podcasts, archival recordings). Such data hold great worth in several regards but is typically challenging to work with." ~ Per Fallgren
Edyson tool
Helpful project
PIs for the Swedish project: Zofia Malisz and Per Fallgren
Making Machines Speak

Develop a Vocal ID persona
"We wouldn’t dream of fitting a little girl with the prosthetic limb of a grown man, why then was it okay to give her the same prosthetic voice as a grown man?"
How to share your voice with Vocal ID

Useful commands

Working with tar.gz files: navigate to the directory containing the file and run this on the command line: tar -xzvf file_name.tar.gz
Other TTS or STT platforms: Ajala Studios, Appen, Open Speech and Language Resources, Lwazi Speech Corpus
TTS engines: great resource summarising all TTS engines

Mimic by Mycroft: the Mimic II platform. The Mimic Recording Studio simplifies the collection of training data from individuals, each of which can be used to produce a distinct voice for Mimic.
Coqui

Baseline for comparison: an existing Luganda STT model for Coqui

Implement a Luganda TTS model on Coqui
example-synthesizing-speech-on-terminal-using-the-released-models
Implementing-a-New-Model
Use the one-stop common utils shop to pre-process Common Voice data

!CUDA_VISIBLE_DEVICES=0 python TTS/bin/train_tts.py --config_path config.json --restore_path pretrained.pth.tar --logdir "{toutdir}"

Running ngrok on Colab to show TensorBoard as you train

https://gist.github.com/mrm8488/0a252982460134249a93cc92ef01deca

To prevent idle timeout in Colab:

Open the browser developer tools (click Inspect, or press F12 or CTRL + SHIFT + I), then paste the code below into the web console tab.

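// Click Colab's connect button once a minute so the session is not marked idle.
function ClickConnect() {
  console.log("Working");
  // The selector depends on Colab's current page markup and may need updating.
  document.querySelector("colab-toolbar-button#connect").click();
}
setInterval(ClickConnect, 60000);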

