ZeroYuJie
> Glad you've found it useful! > > In principle `tokenizer_source: union` should be doing what you want here. It is a pretty experimental feature and I wouldn't be surprised...
> I was doing the exact same merge, ended up using stabilityai/japanese-stablelm-base-gamma-7b > > I wanted shisa for the strong Japanese language ability, and OpenHermes for the natural language of...
@sugarme I am using this in my multi-goroutine testing: first I use this func to init the model tokenizer, then I initialize the tokenizer into a global variable. The code looks like...
I've been using the Orion branch from https://github.com/dachengai/vllm and it's running, but there may be issues with outputs in different languages.
I encountered a similar issue while using the `NousResearch/Redmond-Puffin-13B` model, version v1.0.1. During testing with actual concurrent generations, GPU memory usage gradually increases until it reaches a point where...
@Narsil CUDA Version: 12.2, CentOS 7, running in Docker.
I have also encountered this problem. I used [efficiency-nodes-comfyui](https://github.com/jags111/efficiency-nodes-comfyui) to load the model; I don't know if the two are related.