Nicolas Patry
Are the tests run anywhere?
Thanks for this. The code looks like it works, but I think it could be simplified quite a lot. Is there any source/paper for trying to do fixed-size chunking? Before...
Happy to help with the rebase btw.
Closing this as we added FP8 KV cache support in https://github.com/huggingface/text-generation-inference/pull/2603. More support is coming (for pre-scaled FP8 KV cache).
We don't want to copy the Python code here. This is Rust-land; the goal is to stick to the simplest possible thing. For instance, `from_str` is very unrusty. Having real...
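To illustrate the direction I mean, here is a minimal sketch (the `ChunkStrategy` type and its variants are made up purely for illustration): rather than a hand-rolled `from_str` constructor carried over from Python, implementing the standard `FromStr` trait gives callers `str::parse` and a real error type for free.

```rust
use std::str::FromStr;

// Hypothetical type, only to illustrate the pattern.
#[derive(Debug, PartialEq)]
enum ChunkStrategy {
    Fixed,
    Semantic,
}

// Implementing the standard trait instead of an ad-hoc `from_str`
// constructor keeps the API aligned with what Rust users expect.
impl FromStr for ChunkStrategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fixed" => Ok(ChunkStrategy::Fixed),
            "semantic" => Ok(ChunkStrategy::Semantic),
            other => Err(format!("unknown chunk strategy: {other}")),
        }
    }
}

fn main() {
    // Callers now use the idiomatic `parse` instead of a custom API.
    let strategy: ChunkStrategy = "fixed".parse().unwrap();
    assert_eq!(strategy, ChunkStrategy::Fixed);
}
```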
Thanks, but that doesn't apply to the `abi3-pyxx` features, does it? Here it would be something like that, but more a way to switch features in `pyproject.toml` based on the...
Thanks for this
> `#[pyo3(signature = (url, filename, max_files, chunk_size, parallel_failures=0, max_retries=0, headers=None, callback=None))]`

With `max_files` and `chunk_size` you should be able to throttle this. The default is 100 files and a 10MB chunk size.
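For reference, here is a rough Rust-side sketch of that binding (the function name `download`, the body, and the parameter comments are my reading, not the exact `hf_transfer` source):

```rust
use std::collections::HashMap;

use pyo3::prelude::*;

// Sketch of the quoted binding; the real implementation lives in
// `hf_transfer` and is omitted here.
#[pyfunction]
#[pyo3(signature = (url, filename, max_files, chunk_size, parallel_failures=0, max_retries=0, headers=None, callback=None))]
fn download(
    url: String,
    filename: String,
    // Maximum number of chunks downloaded in parallel; callers typically
    // pass 100. Lowering it reduces concurrency and throttles bandwidth.
    max_files: usize,
    // Bytes requested per range request; callers typically pass 10MB.
    // Smaller chunks also slow the transfer down.
    chunk_size: usize,
    parallel_failures: usize,
    max_retries: usize,
    headers: Option<HashMap<String, String>>,
    callback: Option<PyObject>,
) -> PyResult<()> {
    let _ = (url, filename, max_files, chunk_size, parallel_failures, max_retries, headers, callback);
    Ok(())
}
```

Passing smaller values for `max_files` and `chunk_size` from the Python side is what does the throttling.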
`hf_xet` uses a transfer protocol based on `hf_transfer`, so the control should be the same, but yes, please report it to `hf_xet` if you're using it.
As previously suggested, the fix cannot be accepted as-is. It bloats the image way too much (20GB vs 12GB). First we need to reproduce locally, then figure out why the...