1. Original Video (One person only)

2. Voice Audio
   Text to Speech

3. Separate Voice from Background Music and Sound

4. Start Voiceover from Current Video and Audio Time

5. Lip Sync Model
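
Step 4 could be read as overlaying the new voice track onto the separated background audio, beginning at the current playback time. The sketch below shows that idea under stated assumptions: audio is a plain list of mono float samples, and the function name `mix_voiceover` and its parameters are hypothetical, not taken from any particular tool. It is an illustration of the timing arithmetic only, not the method this pipeline necessarily uses.

```python
def mix_voiceover(background, voiceover, start_seconds, sample_rate):
    """Overlay voiceover samples onto background, starting at start_seconds.

    Hypothetical helper: both inputs are assumed to be mono sequences of
    float samples that share the same sample rate.
    """
    if start_seconds < 0:
        raise ValueError("start_seconds must be non-negative")

    # Convert the current playback time into a sample index.
    offset = round(start_seconds * sample_rate)

    # Pad the background with silence if the voiceover runs past its end.
    length = max(len(background), offset + len(voiceover))
    mixed = list(background) + [0.0] * (length - len(background))

    # Add the voiceover on top of the background from the offset onward.
    for i, sample in enumerate(voiceover):
        mixed[offset + i] += sample
    return mixed


if __name__ == "__main__":
    background = [0.25, 0.25, 0.25, 0.25]  # one second of audio at 4 Hz
    voiceover = [0.5, 0.5]
    print(mix_voiceover(background, voiceover, start_seconds=0.5, sample_rate=4))
```

A real pipeline would work on decoded audio buffers and would likely need to guard against clipping after the sum; both are left out here to keep the timing step visible.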