Nicolas Patry
Thanks for the kind words. Asking for a large `max_new_tokens` all the time will mean the router reserves a lot of tokens for that particular query, meaning it will be less...
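To illustrate why a large `max_new_tokens` affects scheduling, here is a minimal, hypothetical sketch of how a continuous-batching router might budget tokens. The names (`Request`, `fits`, `token_budget`) are illustrative and are not TGI internals:

```python
# Hypothetical sketch: a router must reserve room for the WORST case
# (prompt length + max_new_tokens), even if generation stops earlier.
from dataclasses import dataclass


@dataclass
class Request:
    input_len: int
    max_new_tokens: int


def fits(batch_tokens: int, req: Request, token_budget: int) -> bool:
    # Reserve the full max_new_tokens up front; a request that might
    # overflow the budget cannot be admitted into the running batch.
    return batch_tokens + req.input_len + req.max_new_tokens <= token_budget


budget = 4096
print(fits(3000, Request(input_len=100, max_new_tokens=512), budget))   # True
print(fits(3000, Request(input_len=100, max_new_tokens=1024), budget))  # False
```

So a query that always asks for the maximum crowds out other requests from the batch, even if it ends up generating only a few tokens.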
I don't know what it could be. The first load is fast, then subsequent loads are slow. This is odd indeed, since normally it should be the other way around...
> So is that something you can fix in safetensors or do we need some option in webui to allow alternative loading method? Unfortunately, this might be a WSL/Windows thing,...
> set SAFETENSORS_FAST_GPU=1 This one shouldn't have any effect for versions > `0.3.0` anymore... odd.
The issue seems to stem from WSL and memory mapping not playing well together: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11216 Can you confirm?
Does item 2 from here https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11216#issuecomment-1593378136 help? If so, it's definitely a memory-map issue, but what's really odd is that I'm never able to reproduce it (I'm using...
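For context, the difference between the two access patterns can be sketched with only the standard library. This is a minimal illustration of the concept, not the safetensors code itself: memory mapping pages data in lazily (which can be slow through the WSL filesystem bridge), while a plain read pulls everything into RAM in one sequential pass:

```python
import mmap
import os
import tempfile

# Write a small dummy file standing in for a weights file.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 1024)

# 1) Memory-mapped access (safetensors' default): the OS faults pages in
#    on demand, which is fast natively but can degrade across WSL.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = mm[:16]  # only these bytes are actually paged in
    mm.close()

# 2) Plain read into RAM (the usual workaround): one sequential read,
#    no page faults later during tensor access.
with open(path, "rb") as f:
    data = f.read()

print(len(first), len(data))  # 16 1024
```

If the workaround helps, that would point at the mmap path being the slow part on WSL rather than safetensors itself.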
Very odd, the version is indeed `11.8` in the Dockerfile for 0.9.4: https://github.com/huggingface/text-generation-inference/blob/v0.9.4/Dockerfile#L44
> huggingface/text-generation-inference:0.9.1 Try actually using `0.9.4`?
The error means that you're trying to load a CUDA kernel that was compiled against a different CUDA version. I'm going to try to confirm this.
Hmm, I'm confused. I indeed see: ``` >>> torch.version.cuda '11.7' ``` However, the build script definitely asks for 11.8... I'm going to stop for today, if you...
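A quick way to surface this kind of mismatch is to compare the runtime CUDA version string (what `torch.version.cuda` reports, e.g. `'11.7'`) against the version the kernels were built for. A minimal hedged sketch, with a hypothetical helper name:

```python
# Hypothetical helper: compare CUDA version strings like "11.7" vs "11.8".
# Kernels compiled for one minor version generally will not load under
# another, which is consistent with the error above.
def cuda_versions_match(runtime: str, build: str) -> bool:
    parse = lambda v: tuple(int(x) for x in v.split("."))
    return parse(runtime) == parse(build)


print(cuda_versions_match("11.7", "11.8"))  # False
print(cuda_versions_match("11.8", "11.8"))  # True
```

In practice you would pass `torch.version.cuda` as the runtime side and the version pinned in the Dockerfile as the build side.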