jukebox
Question about parameters
I couldn't understand exactly what the parameters mean when I am running with --mode=primed. Suppose I have 4 input audio files to be used for priming. Does the code concatenate all 4 files into one?
Can you explain exactly what the following parameters mean? --prompt_length_in_seconds, --sample_length_in_seconds, --total_sample_length_in_seconds, --n_samples
--prompt_length_in_seconds: The number of seconds of your file to use as the prompt. You can only use one file.
--sample_length_in_seconds: The number of seconds you want to generate. This includes the prompt length.
--total_sample_length_in_seconds: The intended length in seconds of the whole song (not of the generated file). This is useful if you want your generation to end mid-song.
--n_samples: The number of samples (files) you want to generate. If you are using the 5b models, 3 is ideal. If you are using the 1b model, you can create up to 16.
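Since the sample length is described above as including the prompt, the amount of newly generated audio is the difference between the two. A minimal sketch of that arithmetic follows. The helper name new_audio_seconds is made up for illustration and is not part of Jukebox.

```python
def new_audio_seconds(prompt_length_in_seconds: float,
                      sample_length_in_seconds: float) -> float:
    """Seconds of newly generated audio, assuming the sample length
    includes the prompt (as described in this thread)."""
    if prompt_length_in_seconds > sample_length_in_seconds:
        raise ValueError("the prompt cannot be longer than the whole sample")
    return sample_length_in_seconds - prompt_length_in_seconds


# A 10-second prompt inside a 25-second sample leaves 15 seconds of new audio.
print(new_audio_seconds(10, 25))  # 15
```

So if you want 25 seconds of new material after a 10-second prompt, this reading suggests asking for a sample length of 35.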
I don't understand what you mean by "Intended amount of seconds the song should be (not generated)".
Can you answer the following? Suppose I feed it only one input audio file, say a 4-minute song, and I want:
- 10 seconds to be used as input for inference
- to generate three samples of 25 seconds each
If I understand correctly, I should set: --prompt_length_in_seconds=10 --sample_length_in_seconds=25 --n_samples=3
Is this correct?
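For reference, the settings in this scenario can be written out as flags. The sketch below only formats the option names mentioned in this thread into a string. It does not run Jukebox, format_flags is a hypothetical helper, and a real invocation would need further options (such as the model and the input file) that are not covered here.

```python
def format_flags(settings: dict) -> str:
    """Render option names and values as space-separated --name=value flags."""
    return " ".join(f"--{name}={value}" for name, value in settings.items())


# Values taken from the scenario described above.
scenario = {
    "mode": "primed",
    "prompt_length_in_seconds": 10,
    "sample_length_in_seconds": 25,
    "n_samples": 3,
}

print(format_flags(scenario))
```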
Now, why do I need to bother with total_sample_length_in_seconds? Can you explain?
My understanding is that Jukebox has an impression of the different sorts of sounds that appear at the beginning of a song, versus the middle of a song, versus a song's ending.
So, if sample_length_in_seconds and total_sample_length_in_seconds are both set to 25, then Jukebox will give you 25 seconds of audio in which it attempts to compose an entire song (with beginning, middle, and ending) that lasts only 25 seconds. It will move very quickly through its concept of a musical beginning, middle, and end. It might be particularly influenced by whatever few examples of sub-thirty-second songs existed in the training set.
If instead sample_length_in_seconds = 25 and total_sample_length_in_seconds = 180, then Jukebox will generate the first 25 seconds of a three-minute-long song. This means the generated sample will consist of an intro and maybe a bit of a "first verse", if it's been asked to make something with the structure of a basic pop song.
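One way to picture the difference between the two cases is as the fraction of the intended song that the generated file covers. This is only an illustration of the ratio between the two settings, not a description of what Jukebox computes internally, and song_fraction_covered is a name invented for this sketch.

```python
def song_fraction_covered(sample_length_in_seconds: float,
                          total_sample_length_in_seconds: float) -> float:
    """Fraction of the intended song spanned by the generated file, capped at 1."""
    if total_sample_length_in_seconds <= 0:
        raise ValueError("total length must be positive")
    return min(1.0, sample_length_in_seconds / total_sample_length_in_seconds)


# Both set to 25: the file spans the whole (very short) song.
print(f"{song_fraction_covered(25, 25):.0%}")   # 100%

# 25 out of 180: the file spans only the opening of a three-minute song.
print(f"{song_fraction_covered(25, 180):.0%}")  # 14%
```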