Add standalone python scripts for local usage

Open pgosar opened this issue 1 year ago • 9 comments
Work in progress to create Python scripts that run inference for speech editing and TTS separately from the Jupyter notebooks.

Will handle #56

TODO

  • [x] - Add command line arguments for all hardcoded options on TTS
  • [x] - Complete and test TTS
  • [x] - Add command line arguments for all hardcoded options on speech editing
  • [x] - Complete and test speech editing
  • [x] - Clean up and add running instructions

pgosar avatar Apr 17 '24 21:04 pgosar

Did you see https://github.com/jasonppy/VoiceCraft/pull/34 ?

arthurwolf avatar Apr 17 '24 22:04 arthurwolf

This is planned to supersede that, since I'd like to avoid attempting to do environment setup in the script itself. I also want to provide scripts for both speech editing and TTS. I will start after I finish my current PR.

pgosar avatar Apr 17 '24 22:04 pgosar

Great, just wanted to be sure you knew about it / would re-use anything that's useful if you can/want.

Good luck on your work.

arthurwolf avatar Apr 17 '24 22:04 arthurwolf

> Did you see #34?

Definitely feel free to use whatever you can from this to save yourself work or time! Unfortunately, I ended up unable to dedicate more time to the script I put up. I kept getting errors about audiocraft not being found when running it and, since Python isn't my forte, I wasn't sure how to resolve that between the parent environment and the inner Conda environment.

You should still be able to reuse the environment setup parts if you want, especially the conditional installation of Python modules and the pip handling. I know @pgosar mentioned not doing setup in the script, but one idea to throw out there: you could put it behind a --install-deps flag or something. Best of luck!

jstayco avatar Apr 23 '24 01:04 jstayco

The PR should be functional now. Tomorrow I will take a pass through it, clean up the code a little, and make sure I didn't miss any potential breakages.

Every hardcoded variable concerning inference, outputs, inputs etc. has been turned into a command line argument. They are all optional. The default values are whatever they were set to originally.
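As an illustration of that pattern (not the PR's actual code), here is a minimal sketch with `argparse`. The `cut_off_sec` and `target_transcript` names come from this thread; every other flag name and every default value below is a hypothetical placeholder.

```python
import argparse


def build_parser():
    """Build a parser in which every option is optional and has a default."""
    parser = argparse.ArgumentParser(description="Illustrative inference options")
    # NOTE: flag names and defaults are placeholders for illustration only;
    # they are not taken from the actual scripts.
    parser.add_argument("--original_audio", type=str, default="./demo/input.wav",
                        help="path to the reference audio")
    parser.add_argument("--target_transcript", type=str, default="",
                        help="text to synthesize")
    parser.add_argument("--cut_off_sec", type=float, default=3.0,
                        help="how much of the original audio to use as a prompt")
    parser.add_argument("--top_p", type=float, default=0.8,
                        help="sampling parameter")
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--output_dir", type=str, default="./generated")
    return parser


# With no flags given, every value falls back to its default.
defaults = build_parser().parse_args([])

# Passing a flag overrides only that one value.
custom = build_parser().parse_args(["--top_p", "0.9"])
```

Because nothing is marked `required`, running such a script with no arguments would behave like the previously hardcoded version.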

This should be merged in before my other PR #94, because I'll need to make changes to the speech editing script based on the changes in that PR.

pgosar avatar Apr 23 '24 23:04 pgosar

> I kept getting errors with audiocraft not being found

I don't know if this is your exact issue, but when I wrote the Google Colabs I had to clone Audiocraft into the VoiceCraft folder. Regardless, my scripts work without any special environment setup beyond what's currently in the README.

pgosar avatar Apr 24 '24 00:04 pgosar

@jasonppy Hi, I should be ready on my side

pgosar avatar Apr 24 '24 22:04 pgosar

> @jasonppy Hi, I should be ready on my side

Thanks, I'll test it in the next two days

jasonppy avatar Apr 25 '24 00:04 jasonppy

I'll take a look at these in a day or two

pgosar avatar Apr 30 '24 00:04 pgosar

Sorry for the delay - had to complete my final projects/exams.

I implemented `find_closest_word_boundary` so that, given the specified cut-off seconds, it outputs a new cut-off that takes the margins into account. Combined with your suggestion about the target_transcript, this means the user should only need to input the new speech they want to generate and the cut-off point of the original audio to replace.
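To make the idea concrete, here is a rough sketch of snapping a requested cut-off to a word boundary. It is not the code in the PR: the function name `snap_cut_off_to_word_boundary`, the `(word, start_sec, end_sec)` alignment format, the default margin, and the exact margin rule are all assumptions made for illustration.

```python
def snap_cut_off_to_word_boundary(words, cut_off_sec, margin=0.08):
    """Snap a requested cut-off time to the nearest word end.

    words: list of (word, start_sec, end_sec) tuples sorted by time
           (a hypothetical alignment format).
    Returns (new_cut_off_sec, index of the last word that is kept).
    """
    if not words:
        raise ValueError("no aligned words given")

    # Pick the word whose end time is closest to the requested cut-off.
    index = min(range(len(words)), key=lambda i: abs(words[i][2] - cut_off_sec))
    word_end = words[index][2]

    # Keep up to `margin` seconds after that word, but never run into the
    # start of the following word.
    if index + 1 < len(words):
        next_start = words[index + 1][1]
        new_cut_off_sec = min(word_end + margin, next_start)
    else:
        new_cut_off_sec = word_end + margin

    return new_cut_off_sec, index


alignment = [("hello", 0.0, 0.5), ("world", 0.6, 1.0), ("again", 1.3, 1.8)]
new_cut, last_kept = snap_cut_off_to_word_boundary(alignment, cut_off_sec=0.9)
```

In this example the request of 0.9 s falls inside "world", so the cut moves to just after that word ends rather than splitting it.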

I'm a little confused: is the behavior you want that the user specifies only a target transcript and the script then figures out the cut-off seconds? That would be quite easy to do with my current implementation - all I'd need to do is search for the last matching word between the original and target transcripts and set that point as the cut_off_sec and cut_off_index instead.
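A rough sketch of that search, assuming "last matching word" means the end of the shared leading run of words, and assuming the original transcript is available as `(word, start_sec, end_sec)` tuples. The function name and data format are hypothetical, and the word comparison is deliberately naive (case-insensitive, no punctuation handling).

```python
def cut_off_from_transcripts(original_words, target_transcript):
    """Derive the cut-off from where the two transcripts stop agreeing.

    original_words: list of (word, start_sec, end_sec) for the original audio
                    (a hypothetical alignment format).
    target_transcript: the full target text as a plain string.
    Returns (cut_off_sec, cut_off_index) for the last shared leading word.
    """
    target_words = target_transcript.split()

    # Count how many leading words the two transcripts have in common.
    matched = 0
    for (word, _start, _end), target_word in zip(original_words, target_words):
        if word.lower() != target_word.lower():
            break
        matched += 1

    if matched == 0:
        raise ValueError("transcripts share no leading words")

    cut_off_index = matched - 1
    cut_off_sec = original_words[cut_off_index][2]  # end time of that word
    return cut_off_sec, cut_off_index


original = [("but", 0.0, 0.2), ("when", 0.25, 0.5), ("i", 0.55, 0.6), ("had", 0.65, 0.9)]
sec, idx = cut_off_from_transcripts(original, "But when I saw the mirage")
```

Here the transcripts agree on "but when I" and diverge at the fourth word, so the cut lands at the end of "I".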

pgosar avatar May 04 '24 03:05 pgosar