Nicolas Patry
@RonanKMcGovern embeddings are done through dedicated models; here is a leaderboard we have for these: https://huggingface.co/spaces/mteb/leaderboard (Always take leaderboards and benchmarks with a pinch of salt, your use case is...
@rahermur @OlivierDehaene is finishing it up, but we're seeing quite nice performance at the moment, and we're leveraging `candle` for maximum performance (embedding models tend to be small and therefore CPU bottleneck...
Hi, the first step would likely be to implement it as here: https://github.com/huggingface/transformers/issues/22848 within transformers itself. Then, this library has custom modeling code to speed things up, but it's very much...
https://github.com/huggingface/text-generation-inference/pull/514
This is very odd; segfaults should never happen since everything is in safe Rust. Is it possible it could be linked to a special partitioning,...
Here there's no Rust being called; everything is pure Python. The segfault is extremely weird. In our code, the only suspect I can think of would be the compiled kernels (which can be...
Adding real rustdoc to the clap args here: https://github.com/huggingface/text-generation-inference/blob/main/benchmark/src/main.rs#L16 should be plenty of documentation. It automatically documents the CLI itself (`-h`), the rustdoc, and for the readme we...
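To illustrate the idea: with clap's derive API, the `///` rustdoc comments on the args struct double as both the API documentation and the `-h`/`--help` text. This is only a minimal sketch, assuming the `clap` crate with the `derive` feature; the field names below are hypothetical, not the actual benchmark arguments.

```rust
// Sketch only: assumes `clap` with the `derive` feature enabled.
// The fields here are hypothetical examples, not the real benchmark args.
use clap::Parser;

/// Benchmark a text-generation model.
///
/// This rustdoc comment becomes the long `--help` text, and the
/// per-field comments below become each flag's help line.
#[derive(Parser, Debug)]
struct Args {
    /// Name of the model to benchmark (hypothetical flag).
    #[arg(long)]
    model_id: String,

    /// Number of benchmark runs to average over (hypothetical flag).
    #[arg(long, default_value_t = 10)]
    runs: u32,
}

fn main() {
    // clap parses std::env::args() and renders `-h` from the doc comments.
    let args = Args::parse();
    println!("{args:?}");
}
```

The same comments then serve three audiences at once: rustdoc for readers of the source, `-h` for CLI users, and a single place to keep the readme in sync.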
Are you willing to open a PR for it?
Sorry, I missed your reaction @Blair-Johnson. I created a PR with a first draft; feel free to ask questions so we can make those even clearer.