In generate.py, prompt_length > 4096 (in generate_long) leads to max_new_tokens < 0.
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
I have texts, e.g.: ["辉瑞制药又一次敏锐地捕捉到了时代的脉搏。", "1951年,辉瑞制药再次取得了一项重大的科研突破。"].
When llama fails to predict the stop token 4 for texts[0], then at [generate.py 539](https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#:~:text=538-,539,-540), decoded is a vector with nearly 4096 dimensions, so max_new_tokens = T_new - T < 0.
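To illustrate, here is a minimal sketch of the failure mode. The names T, T_new, and max_new_tokens follow generate.py, but the surrounding function is simplified and hypothetical, not the repo's actual code:

```python
import torch

# Hypothetical sketch, assuming a 4096-token context window.
MAX_SEQ_LEN = 4096

def compute_max_new_tokens(prompt: torch.Tensor, requested: int) -> int:
    T = prompt.size(-1)                      # prompt length; grows as decoded
                                             # segments are fed back in
    T_new = min(T + requested, MAX_SEQ_LEN)  # clamped to the context window
    # Once T exceeds MAX_SEQ_LEN, T_new - T is negative. A guard such as
    # max(T_new - T, 0), or truncating the prompt first, would avoid this.
    return T_new - T
```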
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Your reference audio length should not exceed 90 seconds.
Thanks for your response.
I use 23 s of reference audio. In decode_n_tokens, I find that the stop token is not predicted. The token dimensions are as follows:
(screenshot of token dimensions)
I forgot to mention above: this problem was found without using multinomial sampling, i.e. with the sampler patched to plain argmax:
```python
import torch

def multinomial_sample_one_no_sync(
    probs_sort,
):  # Originally: multinomial sampling without a CUDA synchronization
    # q = torch.empty_like(probs_sort).exponential_(1)
    # return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
    # Patched to plain argmax (greedy decoding) for this test:
    return torch.argmax(probs_sort, dim=-1, keepdim=True).to(dtype=torch.int)
```
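For context, the commented-out lines are the exponential form of the Gumbel-max trick: dividing probs by i.i.d. Exponential(1) noise and taking argmax draws index i with probability proportional to probs[i], which is equivalent to multinomial sampling but avoids a CUDA sync. A quick self-contained check (my own sketch, not repo code):

```python
import torch

# Empirically verify that argmax(probs / q), q ~ Exponential(1),
# samples from the target distribution.
probs = torch.tensor([0.1, 0.6, 0.3])
counts = torch.zeros(3)
for _ in range(10_000):
    q = torch.empty_like(probs).exponential_(1)
    counts[torch.argmax(probs / q)] += 1
print(counts / counts.sum())  # ~ tensor([0.1, 0.6, 0.3])
```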
The encoded features of my reference audio: ref.zip
You 100% need multinomial sampling; argmax will cause a repetition pattern.
The generated audio waveform is indeed repetitive noise toward the end. But I don't know why it keeps repeating; would increasing the training data improve the repetition problem?
Using greedy decoding with any LLM can run into the same issue.
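For reference, a minimal sketch of restoring sampling with a temperature knob (generic PyTorch, not fish-speech's exact sampler; the default temperature of 0.7 is an assumption for illustration):

```python
import torch

# Multinomial sampling with temperature as the fix for greedy repetition.
def sample_one(logits: torch.Tensor, temperature: float = 0.7) -> torch.Tensor:
    probs = torch.softmax(logits / max(temperature, 1e-5), dim=-1)
    q = torch.empty_like(probs).exponential_(1)  # Exp(1) noise, no CUDA sync
    return torch.argmax(probs / q, dim=-1, keepdim=True).to(dtype=torch.int)
```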
Thanks.