What happens if I input more than the max_seq_length?
I see that the sgpt-bloom-7b1-msmarco model has a maximum sequence length (max_seq_length) of 300.
If I input more than the maximum length, for example more than 400 Chinese characters, the text still seems to be embedded into a vector. However, growing the input beyond 500 characters no longer seems to change the resulting vector.
Can I input up to 500 Chinese characters?
Yes, you can input more characters. The result probably stops changing because anything past the maximum sequence length is truncated before embedding, so you need to increase max_seq_length.
- Check this issue: https://github.com/Muennighoff/sgpt/issues/23#issuecomment-1486379896
If it still does not work, please provide the exact code you are using.
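To illustrate why the vector stops changing, here is a minimal, self-contained sketch of truncation followed by mean pooling. It is a toy stand-in, not the SGPT model: `toy_embed`, `toy_token_vector`, and the "one character equals one token" rule are assumptions made purely for illustration. If the model is loaded through sentence-transformers, the corresponding setting is the model's `max_seq_length` attribute.

```python
import hashlib


def toy_token_vector(token: str, dim: int = 8) -> list[float]:
    """Deterministic pseudo-embedding for one token (hypothetical stand-in for a real model)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def toy_embed(text: str, max_seq_length: int = 300, dim: int = 8) -> list[float]:
    """Truncate to max_seq_length tokens, then mean-pool the token vectors."""
    # Assumption: one character == one token. A real tokenizer will differ.
    tokens = list(text)[:max_seq_length]
    if not tokens:
        return [0.0] * dim
    vectors = [toy_token_vector(t, dim) for t in tokens]
    return [sum(column) / len(tokens) for column in zip(*vectors)]


base = "向量检索" * 75               # 300 characters, exactly at the toy limit
longer = base + "额外的文本" * 40    # 500 characters

# With the limit at 300, the extra 200 characters are cut off before pooling.
same_at_300 = toy_embed(base, 300) == toy_embed(longer, 300)

# With the limit raised to 512, the extra characters contribute to the vector.
differs_at_512 = toy_embed(base, 512) != toy_embed(longer, 512)

print(same_at_300, differs_at_512)
```

In this sketch the two inputs map to the same vector while the limit is 300 and to different vectors once the limit is raised. Note that a real tokenizer does not map one Chinese character to exactly one token, so a character count and a token limit are not directly comparable.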
Thanks for the reply. My understanding is that the model was trained with sequences of up to 300 tokens. If we raise the input length a little, for example to 500, the quality may stay similar, but with a larger increase it could well get worse, because the training samples were never that long 🤔
Yeah, it'd be really interesting to know how performance holds up at longer sequences. If you run any experiments and have data on how it performs, it would be amazing if you could share it 🚀
Thank you!