Tanjiro
I am a GSSoC'21 participant. Can you assign this issue to me?
Generated PR #64.
Any updates on this? I am also facing the same buffer size issue.
I fixed this by setting my stop token to None.
Opened PR #1535 for this.
@ncomly-nvidia, kindly review this PR.
This has been merged, so we can close this issue.
I am getting the same issue when trying speculative decoding (Medusa) with Vicuna: after some inference it fails with "buffer size exceeds 2560".
I will take this up. @matankley, please assign it to me.
Almost done with the implementation. During testing, I observed that a HuggingFace LLM, unlike the OpenAI models, lives on the local system and needs to be initialized every time the function runs,...
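A common workaround for that repeated-initialization cost is to cache the model at module level so the expensive load happens only once. Here is a minimal sketch of the pattern; the dummy loader stands in for the real HuggingFace model load, and all names are illustrative, not from the actual PR:

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive loader actually runs


@lru_cache(maxsize=1)
def get_llm():
    """Load the (expensive) local model once and reuse it on later calls."""
    global load_calls
    load_calls += 1
    # In real code this would be e.g. a transformers model/pipeline load.
    return object()


def answer(prompt: str) -> str:
    llm = get_llm()  # cheap after the first call; same object is reused
    # Real code would run inference here; this is a placeholder.
    return f"echo: {prompt}"


answer("hi")
answer("hello")
# load_calls is 1 despite two answer() calls.
```

The same effect can be had by constructing the model once at import time; `lru_cache` just keeps the load lazy until the first request.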