VoiceCraft

gradio port

Open friendlyFriend4000 opened this issue 1 year ago • 5 comments

I did not like having to mess with Jupyter and run Whisper separately, so I made a Gradio version. I will submit a pull request eventually. You can try it out here for now. Note that the conda env is slightly different in my fork: https://github.com/friendlyFriend4000/VoiceCraft

friendlyFriend4000 avatar Mar 31 '24 04:03 friendlyFriend4000

Thanks! Looking forward to it!

jasonppy avatar Mar 31 '24 21:03 jasonppy

Can you make a Colab Notebook for this?

ghost avatar Apr 01 '24 07:04 ghost

@friendlyFriend4000 thanks for doing this! I am actually having problems using it (Ubuntu 22.04). I get errors like:

WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)

RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size

I am happy to collaborate with you to test and sort out problems like this! Maybe we need to prepare some more instructions about installing and using it?

JMLLR1 avatar Apr 01 '24 14:04 JMLLR1

You can ignore the first error. For the second one, I think your cut-off timing is slightly beyond the 'expected' transcript length. Try decreasing the cut-off timing by a couple of milliseconds.
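A minimal sketch of that adjustment, assuming you have the end time of each transcribed word available. The function name, its inputs, and the 2 ms margin are hypothetical and are not taken from the fork; this only illustrates keeping the cut-off slightly inside the transcribed audio:

```python
def clamp_cut_off(cut_off_sec, word_ends_sec, margin_sec=0.002):
    """Return a cut-off time that stays slightly inside the transcribed audio.

    cut_off_sec:   requested cut-off time in seconds
    word_ends_sec: end time in seconds of each transcribed word (assumed input)
    margin_sec:    safety margin; 2 ms is an arbitrary illustrative value
    """
    if not word_ends_sec:
        raise ValueError("no word timings available")
    # Latest point we allow: just before the end of the last transcribed word.
    latest_allowed = max(word_ends_sec) - margin_sec
    # Never go past that point, and never return a negative time.
    return max(0.0, min(cut_off_sec, latest_allowed))


# A cut-off past the last word is pulled back; one inside the transcript is kept.
adjusted = clamp_cut_off(3.5, [1.0, 2.0, 3.0])
unchanged = clamp_cut_off(1.5, [1.0, 2.0, 3.0])
```

If the error persists after this, the margin may simply need to be larger for your audio.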

friendlyFriend4000 avatar Apr 01 '24 16:04 friendlyFriend4000

Sorry to bother you again, but the "Output Audio generated" is between 0 and 2 seconds long and is just scrambled words. I am probably the problem and am not doing something I should be doing.

rotatorotator avatar Apr 02 '24 12:04 rotatorotator

There is a complete, better version of a Gradio implementation in a PR right now.

friendlyFriend4000 avatar Apr 03 '24 22:04 friendlyFriend4000