
Idea: add a parameter to configure number of decimals in JSON output

Open lasttero opened this issue 1 year ago • 3 comments

Please consider adding a parameter to set the number of decimals in the JSON output. This would reduce network bandwidth requirements and the time needed to parse the output. It is relevant for users who do not need or want full accuracy, e.g. if the embedding values are quantized and/or the application is latency critical.
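To illustrate the request, here is a minimal client-side sketch of the effect, not an existing infinity option. It rounds a made-up embedding before serialization and compares payload sizes. The helper name `round_embedding`, the 1024-dimension vector, and the choice of 4 decimals are all assumptions for illustration.

```python
import json
import random


def round_embedding(values, decimals):
    """Return a copy of `values` with each float rounded to `decimals` places."""
    return [round(v, decimals) for v in values]


# Hypothetical embedding: 1024 floats in [-1, 1], seeded for reproducibility.
random.seed(0)
embedding = [random.uniform(-1.0, 1.0) for _ in range(1024)]

# Serialize once at full precision and once rounded to 4 decimals.
full = json.dumps({"embedding": embedding})
short = json.dumps({"embedding": round_embedding(embedding, 4)})

print(f"full precision: {len(full)} characters")
print(f"4 decimals:     {len(short)} characters")
```

The rounded payload is shorter because each value is written with fewer digits; how much that helps in practice depends on whether bandwidth or parsing is actually the bottleneck.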

lasttero avatar Jan 17 '24 02:01 lasttero

Good idea, I assume because the payload is stringified before it is sent.

On the other hand, JSON encoding took around 20% of the CPU time and in some cases was responsible for up to half of the latency. I solved the issue by switching to orjson. I do not think that https://github.com/ijl/orjson supports such a feature.

So pro:

  • no need for more than 4 to 6 digits; might reduce latency and will marginally reduce network usage (if that's a bottleneck)

Con:

  • no implementation available in orjson afaik
  • switching to a different
  • additional source of error
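One way to weigh these points: rounding can happen before the values reach the encoder, so it does not depend on encoder support, and the error it introduces is bounded by half a unit in the last kept decimal. The sketch below assumes a hypothetical helper `dumps_rounded` and uses the standard-library `json.dumps` as the default encoder; a different encoder could be passed in its place (orjson's `dumps`, for example, returns `bytes` rather than `str`).

```python
import json


def dumps_rounded(values, decimals, dumps=json.dumps):
    """Round floats first, then hand them to whichever encoder is supplied.

    `dumps_rounded` is a hypothetical helper, not part of infinity or orjson.
    """
    rounded = [round(v, decimals) for v in values]
    return dumps(rounded)


values = [0.123456789, -0.987654321, 0.5]
encoded = dumps_rounded(values, 4)
decoded = json.loads(encoded)

# Largest absolute difference between original and round-tripped values.
max_err = max(abs(a - b) for a, b in zip(values, decoded))

print(encoded)
print(f"max absolute error: {max_err:.2e}")
```

With 4 decimals the error per value stays at or below 0.00005, which is the trade-off behind the "additional source of error" point.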

michaelfeil avatar Jan 17 '24 13:01 michaelfeil

Thank you for responding quickly. Inspired by the comment above, I realized that my JSON parsing implementation was sub-optimal and replaced it with a hand-coded parser for the fastest processing. This feature would still be beneficial, but it is no longer critical. Background: we run a number of infinity processes locally on the same GPU (as that seems to stochastically interleave GPU usage, maximizing GPU utilization and total throughput). Again, thank you for the convenient application.

lasttero avatar Jan 18 '24 04:01 lasttero

I slightly optimized queueing. I don't think the number of decimals in the JSON would significantly influence the throughput.

michaelfeil avatar Feb 01 '24 23:02 michaelfeil