Nicolas Patry
@kiszk Can you check the last implementation? I think numpy byteswap can still work, since bf16 byteswap and f16 byteswap are the same. The only thing necessary is those...
@kiszk I didn't say `main` branch, sorry, I was referring to https://github.com/huggingface/safetensors/pull/507
Superseded by https://github.com/huggingface/safetensors/pull/507 Basically the issue is that the previous code was doing `(bf16) -> to -> (f16) -> byteswap -> to -> (bf16)`. `to` is a cast operator which changes...
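
To illustrate why the cast is the problem, here is a minimal sketch (not the PR's actual code; it assumes `torch.Tensor.view(dtype)` bit-reinterpretation): since bf16 and f16 are both 2-byte types, byteswapping their raw storage is identical, so reinterpreting the storage as `int16` lets numpy byteswap it without any value conversion.

```python
import torch

# Minimal sketch: byteswap bf16 without a lossy cast.
# `t.to(torch.float16)` converts *values*, changing the underlying bits;
# `t.view(torch.int16)` reinterprets the same 2-byte storage untouched.
t = torch.randn(4, dtype=torch.bfloat16)

swapped = (
    torch.from_numpy(t.view(torch.int16).numpy().byteswap())
    .view(torch.bfloat16)
)
```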
Fixed in https://github.com/huggingface/safetensors/pull/507
Hi @fpgaminer, thanks for the workaround. I think this bug should be filed upstream in PyTorch: since `torch.zeros((2, 0))` is valid, there is no reason for `torch.frombuffer` not to accept...
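
For context, a minimal sketch of the inconsistency being described (the exact error raised on the empty buffer is an assumption; the behavior gap is the point):

```python
import torch

# Zero-element tensors are perfectly valid in PyTorch...
t = torch.zeros((2, 0))
buf = bytearray(t.numpy().tobytes())  # empty: zero bytes of data

# ...but round-tripping through torch.frombuffer fails, because it
# rejects an empty buffer (assumed failure mode, per the report above).
try:
    torch.frombuffer(buf, dtype=torch.float32)
except Exception as e:
    print(f"frombuffer failed: {e}")
```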
Make sure you're using the command from the readme: https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#docker It contains important flags.
What's the GPU? You're using compute_cap 7.5, so I'll guess a T4. A T4 simply doesn't have enough VRAM to run this model out of the box, you can try...
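
For anyone unsure what their card reports, a quick diagnostic sketch (plain PyTorch, not part of TGI):

```python
import torch

# Print each visible GPU's name and compute capability;
# compute_cap 7.5 corresponds to Turing cards such as the T4.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"GPU {i}: {name} (compute_cap {major}.{minor})")
```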
Thanks, we currently don't support it, because to the best of my knowledge there is no flash attention on Inferentia, which is an important piece of TGI. We have started...
What kind of GPU is it? An H100? I'll look into this and see why it fails on some platforms. I'm guessing the kernels are built against an incompatible...
Could be a duplicate of #739