What to include in your dataset
Collect about 20 minutes of confident singing in the style you want to clone. Then add another 10 minutes covering low notes, high notes, isolated phonemes, and sibilant sounds, so the dataset covers the full range of sounds the voice can make. In total, you should have at least 30 minutes of content. Your AI voice can only accurately reproduce what it hears in your dataset, so include as many different words and pitches as possible. And don’t include anything you wouldn’t want your AI voice to learn, whether it’s audio glitches or singing mistakes.
- Voice cloning requires monophonic input, so do not include any vocal stacks or harmonies.
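- Record your dataset in a single session so the recording setup (microphone, room, and signal chain) stays the same throughout, and your AI voice can capture the exact frequency response of the target voice. Combining recordings from different sessions can reduce the accuracy of the cloning process.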
- Decide which vocal qualities you want to clone, and keep the dataset consistent with them. For example, if you want your AI voice to have a gritty character when singing high notes, don’t include falsetto vocals in the same pitch range as the gritty vocals.
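Before uploading, it can help to confirm that the dataset actually reaches the recommended 30+ minutes. The sketch below is a minimal, hypothetical helper, not part of any voice-cloning tool: it assumes your takes are exported as uncompressed PCM `.wav` files in a single folder, and the function names are illustrative. It uses Python’s standard `wave` module to add up the duration of every file.

```python
import wave
from pathlib import Path

# Assumption: the guide's recommended minimum total length, in minutes.
TARGET_MINUTES = 30


def wav_duration_seconds(path: Path) -> float:
    """Return the length of one PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def dataset_minutes(folder: str) -> float:
    """Sum the durations of all .wav files directly inside `folder`, in minutes."""
    total_seconds = sum(wav_duration_seconds(p) for p in Path(folder).glob("*.wav"))
    return total_seconds / 60.0


def meets_target(folder: str, target_minutes: float = TARGET_MINUTES) -> bool:
    """True if the folder holds at least `target_minutes` of audio."""
    return dataset_minutes(folder) >= target_minutes
```

For example, `print(f"{dataset_minutes('my_dataset'):.1f} minutes")` reports the total for a hypothetical `my_dataset` folder. This only measures length; it can’t tell whether the takes are monophonic, mistake-free, or recorded in one session, so you still need to review them by ear.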