Inference Experiments

Open JMMackenzie opened this issue 11 months ago • 2 comments

Hey all,

I'm looking at the Efficiency Study paper and I'd like to replicate the query encoding numbers - could you please provide a pipeline or any other pointers so I can ensure my measurement is correct?

Thanks a lot!

JMMackenzie, Mar 11 '24 04:03

Hey Joel,

At the time, I think I basically tokenized everything up front (without counting tokenization time in the measurement) and then ran the encoder one query at a time on a single CPU core (set with SLURM). I can try spinning up a similar pipeline, but would love to hear your thoughts.
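For reference, a minimal sketch of that kind of measurement might look like the following. It is not the original pipeline: `time_queries`, `summarize`, and `dummy_encode` are hypothetical names, `dummy_encode` is a stand-in for the real query encoder's forward pass, and the warm-up count and reported statistics are assumptions. Queries are assumed to be tokenized beforehand so that only encoding time is measured.

```python
import os
import statistics
import time
from typing import Any, Callable, Dict, List, Sequence

# Assumption: cap math-library threads at one so the run stays on a single core.
# These variables only take effect if set before the numeric libraries load.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")


def time_queries(
    encode: Callable[[Any], Any],
    tokenized_queries: Sequence[Any],
    warmup: int = 5,
) -> List[float]:
    """Encode one pre-tokenized query at a time; return latencies in ms."""
    # Warm-up calls are run but not timed, so one-off start-up costs are excluded.
    for query in tokenized_queries[:warmup]:
        encode(query)

    latencies_ms: List[float] = []
    for query in tokenized_queries:
        start = time.perf_counter()
        encode(query)  # tokenization already happened, so only encoding is timed
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Mean, median, and an approximate 99th percentile of the latencies."""
    ordered = sorted(latencies_ms)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
    }


def dummy_encode(token_ids: Sequence[int]) -> int:
    # Hypothetical placeholder for the real model's forward pass.
    return sum(i * i for i in token_ids)


if __name__ == "__main__":
    # Placeholder token ids; real runs would use the pre-tokenized query set.
    queries = [[101, 2054, 2003, 102]] * 100
    print(summarize(time_queries(dummy_encode, queries)))
```

To approximate the single-core setting, one option under SLURM is to request one core for the job (for example `--cpus-per-task=1`), and with PyTorch to call `torch.set_num_threads(1)` before timing. Both are assumptions about the setup, not confirmed details of the original runs.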

For later papers I started using a benchmarker from Hugging Face, but I can't find it right now. I can try digging deeper if needed.

carlos-lassance, Mar 14 '24 16:03

Thanks Carlos! I was looking for the test setup for measuring inference latency for queries so I could replicate your numbers, but I think I can manage without it if you don't have it. I'll have a dig on HF and see if I can find anything. Thanks again!

JMMackenzie, Mar 24 '24 23:03