Content Guidelines
For a high-quality voice clone, compile 30 to 45 minutes of vocals in total, all recorded in a single session.
What to include in your dataset
Collect about 20 minutes of confident singing in the style you want to clone. Then add about 10 more minutes covering low notes, high notes, isolated phonemes, and sibilant sounds, so your dataset spans the full range of sounds you want your AI voice to produce. In total, you should have at least 30 minutes of content.
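To confirm you've reached the 30 to 45 minute target, you can add up the length of your recordings. The following is a minimal sketch using only Python's standard-library `wave` module. It assumes your dataset is a single folder of uncompressed PCM `.wav` files; the function names are hypothetical helpers for illustration, not part of any voice-cloning service's tooling.

```python
import wave
from pathlib import Path

# Recommended range from these guidelines, in minutes.
MIN_MINUTES = 30
MAX_MINUTES = 45


def wav_duration_seconds(path):
    """Return the duration of one uncompressed PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_dataset_minutes(folder):
    """Sum the durations of every .wav file directly inside `folder` (not subfolders)."""
    seconds = sum(wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav")))
    return seconds / 60


def check_dataset(folder):
    """Report whether the total duration falls inside the recommended range."""
    minutes = total_dataset_minutes(folder)
    if minutes < MIN_MINUTES:
        return f"{minutes:.1f} min: too short, aim for at least {MIN_MINUTES} min"
    if minutes > MAX_MINUTES:
        return f"{minutes:.1f} min: above the suggested {MIN_MINUTES} to {MAX_MINUTES} min range"
    return f"{minutes:.1f} min: within the recommended range"
```

For example, `print(check_dataset("my_dataset"))` reports the total for a hypothetical `my_dataset` folder. The `*.wav` pattern is case-sensitive on some systems, and compressed formats such as MP3 would need converting to WAV first.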
Your AI voice can only accurately reproduce what it hears in your dataset, so include as wide a variety of words and pitches as possible. Leave out anything you wouldn't want your AI voice to learn, whether it's audio glitches or singing mistakes.
- Voice cloning requires monophonic input, so do not include any vocal stacks or harmonies.
- Record your dataset in a single session so your AI voice can capture a consistent frequency response for the target voice. Combining recordings from different sessions can reduce the accuracy of the cloning process.
- Decide which vocal quality you want to clone. For example, if you want your AI voice to sound gritty when singing high notes, don't include falsetto vocals in the same pitch range as your gritty vocals.
Optional lyrics to make sure your dataset is complete
The following lyrics contain every phoneme in American English. Sing them with a variety of articulations and melodies to build a high-quality dataset.
In the land where dreams gleam, the serene breeze weaves melodies that seize hearts and minds.
The bright light ignites the night, as whispers rise and dive, alive with a tapestry of timeless rhymes.
Through the wilderness, I navigate, zealous through determination my voice resonates and wraps with grace and might.
A symphony of vowels and consonants unite, captivating as the journey unfolds, granting us insight.
From the valleys to the peaks, a kaleidoscope of sounds cascades and seeks to reach infinite heights.
Let us harmonize, mesmerize, and improvise, with linguistic hues that ignite our spirits’ flights.
Additional Lines
The following lines help make sure your dataset includes essential sibilant sounds and rare phonemes.
Chefs toss fresh fish with thick salt, chop hot peppers, and pack them swiftly into shiny dishes.
The quick brown fox jumps over the lazy dog while vexing my soul with five dozen liquor jugs.
Manny’s minivan makes many Monday morning trips.
Paul packs purple peppers by the peck.
Vivian finds valuable fossils for fun.
Theo thought thirty-three thieves threw through.
Tod takes time to tackle tough tasks.
Nina’s new neighbor needs nice notes.
Sally sells seashells by the seashore.
Larry likes learning languages late at night.
You yield yellow yarn yearly.
Shea shares shiny shoes on Saturday.
Kara’s cat keeps catching kites.
The singing king brings ringing.
He highlights high, heavy hurdles happily.