Nicolas Patry

Results 978 comments of Nicolas Patry

Thanks for the kind words. Asking for `max_new_tokens` all the time will mean the router will consider a lot of tokens for that particular query, meaning it will less be...

I don't know what it could be. The first load is fast, then subsequent loads are slow. This is odd indeed, since normally it should be the other way around...

> So is that something you can fix in safetensors or do we need some option in webui to allow alternative loading method?

Unfortunately, this might be a WSL/Windows thing,...

> set SAFETENSORS_FAST_GPU=1

This one shouldn't have any effect for versions > `0.3.0` anymore... odd.

The issue seems to stem from WSL and memory mapping not playing well together: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11216 Can you confirm?
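To check whether memory mapping itself is the slow path on a given machine, something like the following can help. This is only a minimal sketch using the Python standard library, not what safetensors does internally: it reads the same file once with a plain `read()` and once through `mmap`, and reports both timings. The helper names (`read_plain`, `read_mmap`, `compare`, `make_dummy_file`) are made up for illustration.

```python
import mmap
import os
import tempfile
import time


def read_plain(path):
    """Read the whole file with an ordinary buffered read."""
    with open(path, "rb") as f:
        return f.read()


def read_mmap(path):
    """Map the file into memory, then copy the mapped bytes out."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:]


def timed(fn, path):
    """Run fn(path) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    data = fn(path)
    return data, time.perf_counter() - start


def compare(path):
    """Time both strategies on the same file and check they agree."""
    plain, t_plain = timed(read_plain, path)
    mapped, t_mmap = timed(read_mmap, path)
    return {"same_bytes": plain == mapped, "plain_s": t_plain, "mmap_s": t_mmap}


def make_dummy_file(size_bytes=1 << 20):
    """Hypothetical helper: write random bytes to a temp file for testing."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path
```

Pointing `compare()` at an actual checkpoint file, on WSL and on native Linux, should show whether the mapped read is the one that degrades. Keep in mind that the OS page cache makes the second read of the same file faster, so run each strategy on a cold file if possible.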

Does item 2 from here https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11216#issuecomment-1593378136 help? If so, it's definitely a memory map issue, but what's really odd is that I'm never able to reproduce it (I'm using...

Very odd, the version is indeed `11.8` in the Dockerfile for 0.9.4: https://github.com/huggingface/text-generation-inference/blob/v0.9.4/Dockerfile#L44

> huggingface/text-generation-inference:0.9.1

Try actually using `0.9.4`?

The error means that you're trying to load a CUDA kernel that was compiled against a different CUDA version. I'm going to try and confirm this.

Hmm I'm confused. I indeed see:

```
>>> torch.version.cuda
'11.7'
```

However the build script definitely says it's asking for 11.8... I'm going to stop for today, if you...
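A quick way to catch this kind of mismatch early is to compare the CUDA version that PyTorch reports against the one the kernels were built for. The sketch below is an assumption about how one might script that check, not something taken from the text-generation-inference build: `major_minor`, `cuda_versions_match`, and `installed_torch_cuda` are hypothetical helper names, and only `torch.version.cuda` is an actual PyTorch attribute.

```python
def major_minor(version):
    """Return (major, minor) from a version string such as '11.8'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def cuda_versions_match(runtime_version, expected_version):
    """True when both versions share the same major and minor numbers."""
    return major_minor(runtime_version) == major_minor(expected_version)


def installed_torch_cuda():
    """Return torch.version.cuda, or None if torch is missing or CPU-only."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    expected = "11.8"  # assumption: the version the build script asks for
    found = installed_torch_cuda()
    if found is None:
        print("No CUDA-enabled torch found")
    elif not cuda_versions_match(found, expected):
        print(f"Mismatch: torch reports CUDA {found}, expected {expected}")
    else:
        print(f"OK: CUDA {found}")
```

With the values from this thread, `cuda_versions_match("11.7", "11.8")` is false, which is the situation described above.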