Shuo Yang
@merrymercy My PR didn't fix the problem; how can we solve it?
@kennymckormick Please download the v1.1 weights [here](https://huggingface.co/lmsys/vicuna-7b-delta-v1.1). The old weights had no eos_token.
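For reference, the delta is applied on top of the base llama weights; here is a minimal sketch of that merge, assuming the additive Vicuna delta format (the official tool is `python3 -m fastchat.model.apply_delta`), with placeholder paths:

~~~python
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-7b-hf", torch_dtype=torch.float16
)
delta = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-delta-v1.1", torch_dtype=torch.float16
)

# Vicuna deltas are additive: target = base + delta, parameter by parameter.
delta_state = delta.state_dict()
for name, param in base.state_dict().items():
    param.add_(delta_state[name])

base.save_pretrained("/output/vicuna-7b-v1.1")
~~~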
> I have the same question even though I downloaded the v1.1 weights.

The v1.1 weights have changed several times; please delete your local copy and download them again.
It might be caused by CUDA OOM. Try

~~~bash
export WORKER_API_EMBEDDING_BATCH_SIZE=1
~~~

and restart the server & API.
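Illustratively, this variable caps how many texts the worker embeds per forward pass, which is why lowering it relieves memory pressure. A minimal sketch of that mechanism, assuming this is how the worker consumes the variable (not FastChat's exact code):

~~~python
import os

# Batch size 1 trades throughput for the smallest possible peak GPU memory.
BATCH_SIZE = int(os.environ.get("WORKER_API_EMBEDDING_BATCH_SIZE", 4))

def embed_all(texts, embed_batch):
    """Embed `texts` in small batches; `embed_batch` is a placeholder
    callable that runs one batch through the model."""
    out = []
    for i in range(0, len(texts), BATCH_SIZE):
        out.extend(embed_batch(texts[i : i + BATCH_SIZE]))
    return out
~~~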
I have found where the problem lies: the max_seq_length of the fake model we specified differs from that of the actually deployed model, so langchain did not call 'get safe len' when...
@hyunkelw You are right. You can change `CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)` to `CharacterTextSplitter(chunk_size=400, chunk_overlap=0)`; see the sketch below.
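A minimal sketch of that change, assuming the standard langchain `CharacterTextSplitter` API; `document.txt` is a placeholder input:

~~~python
from langchain.text_splitter import CharacterTextSplitter

long_text = open("document.txt").read()  # placeholder document

# chunk_size=400 keeps each chunk within the deployed model's real
# max_seq_length, avoiding the mismatch described above.
splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
chunks = splitter.split_text(long_text)
print(f"{len(chunks)} chunks, longest = {max(len(c) for c in chunks)} chars")
~~~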
@hyunkelw Could you deploy the latest version? The new error message will help me debug it.
I see, it is caused by CUDA OOM; your GPU memory is limited 😿 Try

~~~bash
export WORKER_API_EMBEDDING_BATCH_SIZE=1
~~~

and restart the API, controller & model worker. If it still doesn't...
Nice work @RedmiS22018! I encountered one issue when running your code, and I would like to bring it to your attention:

- On Hugging Face, there are two versions of llama...
Hi @RedmiS22018, thank you for your prompt response! I wanted to provide you with the link to the other version of the [llama weights](https://huggingface.co/decapoda-research/llama-13b-hf).

> After the delta files are...
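To tell the two conversions apart, one quick sanity check is to inspect the tokenizer; this sketch assumes a local copy of the converted weights, and the path is a placeholder:

~~~python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/llama-13b-hf", use_fast=False)
# A correctly converted LLaMA tokenizer reports eos_token "</s>" with id 2;
# an empty eos_token reproduces the "no eos_token" symptom mentioned earlier.
print(repr(tok.eos_token), tok.eos_token_id)
~~~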