splade
splade copied to clipboard
Inference Experiments
Hey all,
I'm looking at the Efficiency Study paper and I'd like to replicate the query encoding numbers - could you please provide a pipeline or any other pointers so I can ensure my measurement is correct?
Thanks a lot!
Hey Joel,
So at the time I think I basically tokenized everything (without taking the time into account) and then ran it a query at the time with a single CPU core (set with SLURM). I can try spinning a similar pipeline, but would love to hear your thoughts.
For later papers I started using a benchmarker from huggingface, but I cannot find it right now. I can try digging deeper if needed.
Thanks Carlos! I was looking for the test setup for measuring inference latency for queries so I could replicate your numbers, but I think I can manage without it if you don't have it. I'll have a dig on HF and see if I can find anything. Thanks again!