“SV2TTS is a three-stage deep learning framework that allows to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.”
https://github.com/CorentinJ/Real-Time-Voice-Cloning