1.

Original Video (must contain only one person)

2.

Voice Audio

Text to Speech

3.

Separate Voice from Background Music and Sound

4.

Start Voiceover from Current Video and Audio Time

5.

Lip Sync Model
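
The five steps above can be read as the settings of a single lip-sync job. The sketch below is only an illustration under stated assumptions: `LipSyncJob`, `voiceover_start`, and every field name are hypothetical and do not come from any real tool. It also assumes that step 2 is a choice between supplying voice audio and supplying text for text-to-speech, which the list itself does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LipSyncJob:
    """Hypothetical container for the five steps; not a real tool's API."""
    video_path: str                         # step 1: original video, one person only
    voice_audio_path: Optional[str] = None  # step 2: recorded voice audio ...
    tts_text: Optional[str] = None          # ... or text for text-to-speech (assumed either/or)
    separate_voice: bool = False            # step 3: split voice from music and sound
    start_at_current_time: bool = False     # step 4: begin at the current playback time
    model: str = "default"                  # step 5: placeholder lip sync model name

    def validate(self) -> None:
        """Raise ValueError if the job settings are inconsistent."""
        if not self.video_path:
            raise ValueError("an original video is required")
        # Assumption: exactly one voice source must be supplied.
        if (self.voice_audio_path is None) == (self.tts_text is None):
            raise ValueError("supply either voice audio or text-to-speech text")


def voiceover_start(current_time_s: float, start_at_current_time: bool) -> float:
    """Return the voiceover start offset in seconds."""
    if current_time_s < 0:
        raise ValueError("playback time cannot be negative")
    # With the option off, the voiceover starts at the beginning of the video.
    return current_time_s if start_at_current_time else 0.0


job = LipSyncJob(video_path="clip.mp4", tts_text="Hello", start_at_current_time=True)
job.validate()
start = voiceover_start(12.5, job.start_at_current_time)
```

Here `clip.mp4` and the 12.5 second playback time are placeholder values. Keeping the start-time logic in its own function makes step 4 easy to check separately from the rest of the job.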