Nicolas Patry
@kiszk Can you check the last implementation? I think numpy byteswap can still work, since bf16 byteswap and f16 byteswap are the same. The only thing necessary is those...
@kiszk I didn't say `main` branch, sorry, I was referring to https://github.com/huggingface/safetensors/pull/507
Superseded by https://github.com/huggingface/safetensors/pull/507 Basically the issue is that the previous code was doing `(bf16) -> to -> (f16) -> byteswap -> to -> (bf16)`. `to` is a cast operator which changes...
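
To illustrate why the cast is the problem, here is a minimal sketch (not the PR's actual code; it assumes `torch.Tensor.view(dtype)` bit-reinterpretation): since bf16 and f16 are both 2-byte types, byteswapping their raw storage is identical, so reinterpreting the storage as `int16` lets numpy byteswap it without any value conversion.

```python
import torch

# Minimal sketch: byteswap bf16 without a lossy cast.
# `t.to(torch.float16)` converts *values*, changing the underlying bits;
# `t.view(torch.int16)` reinterprets the same 2-byte storage untouched.
t = torch.randn(4, dtype=torch.bfloat16)

swapped = (
    torch.from_numpy(t.view(torch.int16).numpy().byteswap())
    .view(torch.bfloat16)
)
```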
Fixed in https://github.com/huggingface/safetensors/pull/507
Hi @fpgaminer, thanks for the workaround. I think this bug should be filed upstream in PyTorch: since `torch.zeros((2, 0))` is valid, there is no reason for `torch.frombuffer` not to accept...
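
For context, a minimal sketch of the inconsistency being described (the exact error raised on the empty buffer is an assumption; the behavior gap is the point):

```python
import torch

# Zero-element tensors are perfectly valid in PyTorch...
t = torch.zeros((2, 0))
buf = bytearray(t.numpy().tobytes())  # empty: zero bytes of data

# ...but round-tripping through torch.frombuffer fails, because it
# rejects an empty buffer (assumed failure mode, per the report above).
try:
    torch.frombuffer(buf, dtype=torch.float32)
except Exception as e:
    print(f"frombuffer failed: {e}")
```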
Make sure you're using the command from the readme: https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#docker It contains important flags.
What's the GPU? You're using compute_cap 7.5, so I'll guess a T4. A T4 simply doesn't have enough VRAM to run this model out of the box, you can try...
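
For anyone unsure what their card reports, a quick diagnostic sketch (plain PyTorch, not part of TGI):

```python
import torch

# Print each visible GPU's name and compute capability;
# compute_cap 7.5 corresponds to Turing cards such as the T4.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"GPU {i}: {name} (compute_cap {major}.{minor})")
```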
Thanks, we currently don't support it, because to the best of my knowledge there is no flash attention on Inferentia, which is an important piece of TGI. We have started...
What kind of GPU is it? An H100? I'll look into this and see why it fails on some platforms. I'm guessing the kernels are built against an incompatible...
Could be a duplicate of #739