Prepare a Dataset
Recording High-Quality Datasets
30-60 minutes of clean and varied audio will result in the highest-quality voice models.
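One way to check whether a dataset falls in the 30-60 minute range is to sum the duration of its clips. The sketch below is a minimal example, assuming the dataset is a folder of uncompressed WAV files; the function name `total_minutes` and the folder layout are illustrative assumptions, not part of any particular tool.

```python
import wave
from pathlib import Path


def total_minutes(folder):
    """Sum the duration of every .wav file directly inside `folder`, in minutes.

    Assumes uncompressed PCM WAV files (hypothetical dataset layout).
    """
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as clip:
            # duration of one clip = frame count / sample rate
            total_seconds += clip.getnframes() / clip.getframerate()
    return total_seconds / 60
```

Calling `total_minutes("my_dataset")` on a hypothetical folder returns a float that can be compared against the 30-60 minute target before training.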
Clean audio
The highest-quality voice models come from audio that is recorded:
- with a quality microphone into an audio interface
And processed:
- with consistent dynamics across the whole dataset
- with light EQ to remove any muddiness, hiss, etc.
- with compression/limiting to smooth out peaks
- with no reverb, delay, or doubling
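"Consistent dynamics" can be checked roughly by measuring each clip's level and looking for outliers. The following is a minimal sketch, assuming samples have already been decoded to floats in the range -1.0 to 1.0; the helper names and the 3 dB tolerance are illustrative assumptions, not a recommended standard.

```python
import math
import statistics


def peak_dbfs(samples):
    """Peak level in dBFS for float samples in [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples):
    """RMS (average) level in dBFS for float samples in [-1.0, 1.0]."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def flag_inconsistent(levels, tolerance_db=3.0):
    """Return the names of clips whose level is far from the dataset median.

    `levels` maps a clip name to its RMS level in dBFS.
    `tolerance_db` is an assumed threshold; adjust it to taste.
    """
    median = statistics.median(levels.values())
    return sorted(
        name for name, level in levels.items()
        if abs(level - median) > tolerance_db
    )
```

Clips returned by `flag_inconsistent` are candidates for gain adjustment or re-recording. A peak level close to 0 dBFS on a clip suggests its peaks still need limiting.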