Nicolas Patry
Are the tests run anywhere?
Thanks for this. The code looks like it works, but I think it could be simplified quite a lot. Is there any source/paper for trying to do fixed-size chunking? Before...
Happy to help with the rebase btw.
Closing this as we added FP8 KV cache support in https://github.com/huggingface/text-generation-inference/pull/2603. More support is coming (for pre-scaled FP8 KV cache).
We don't want to copy the Python code here. This is Rust-land; the goal is to stick to the simplest possible thing. For instance, `from_str` is very unrusty. Having real...
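To illustrate the direction I mean, here is a minimal sketch (the `ChunkStrategy` type and its variants are made up purely for illustration): rather than a hand-rolled `from_str` constructor carried over from Python, implementing the standard `FromStr` trait gives callers `str::parse` and a real error type for free.

```rust
use std::str::FromStr;

// Hypothetical type, only to illustrate the pattern.
#[derive(Debug, PartialEq)]
enum ChunkStrategy {
    Fixed,
    Semantic,
}

// Implementing the standard trait instead of an ad-hoc `from_str`
// constructor keeps the API aligned with what Rust users expect.
impl FromStr for ChunkStrategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fixed" => Ok(ChunkStrategy::Fixed),
            "semantic" => Ok(ChunkStrategy::Semantic),
            other => Err(format!("unknown chunk strategy: {other}")),
        }
    }
}

fn main() {
    // Callers now use the idiomatic `parse` instead of a custom API.
    let strategy: ChunkStrategy = "fixed".parse().unwrap();
    assert_eq!(strategy, ChunkStrategy::Fixed);
}
```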
Thanks, but that doesn't apply to the `abi3-pyxx` features, does it? Here it would be something like that, but more a way to switch features in `pyproject.toml` based on the...
Thanks for this
> `#[pyo3(signature = (url, filename, max_files, chunk_size, parallel_failures=0, max_retries=0, headers=None, callback=None))]`

With `max_files` and `chunk_size` you should be able to throttle this. The default is 100 files and a 10MB chunk size.
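For reference, here is a rough Rust-side sketch of that binding (the function name `download`, the body, and the parameter comments are my reading, not the exact `hf_transfer` source):

```rust
use std::collections::HashMap;

use pyo3::prelude::*;

// Sketch of the quoted binding; the real implementation lives in
// `hf_transfer` and is omitted here.
#[pyfunction]
#[pyo3(signature = (url, filename, max_files, chunk_size, parallel_failures=0, max_retries=0, headers=None, callback=None))]
fn download(
    url: String,
    filename: String,
    // Maximum number of chunks downloaded in parallel; callers typically
    // pass 100. Lowering it reduces concurrency and throttles bandwidth.
    max_files: usize,
    // Bytes requested per range request; callers typically pass 10MB.
    // Smaller chunks also slow the transfer down.
    chunk_size: usize,
    parallel_failures: usize,
    max_retries: usize,
    headers: Option<HashMap<String, String>>,
    callback: Option<PyObject>,
) -> PyResult<()> {
    let _ = (url, filename, max_files, chunk_size, parallel_failures, max_retries, headers, callback);
    Ok(())
}
```

Passing smaller values for `max_files` and `chunk_size` from the Python side is what does the throttling.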
`hf_xet` uses a transfer protocol based on `hf_transfer`, so the control should be the same, but yes, please report it to `hf_xet` if you're using it.
As previously suggested, the fix cannot be accepted as-is. It bloats the image way too much (20GB vs 12GB). First we need to reproduce locally, then figure out why the...