Adding max token budget per batch

Open michaelfeil opened this issue 1 year ago • 0 comments

The default currently allows up to batch_size=64. With long-context models this can lead to high memory usage, e.g. for jina-8k bert a full batch is 64x8192 = 524288 tokens. It would be better to size batches dynamically against a token budget, e.g. 64*512=32768 tokens per forward pass.
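A minimal sketch of what such a token budget could look like, assuming token counts per input are already known and that a batch is padded to its longest sequence, so its cost is roughly `batch size * longest length`. The function name `batches_by_token_budget` and its parameters are hypothetical and not part of infinity's API.

```python
from typing import Iterator, List


def batches_by_token_budget(
    token_counts: List[int],
    max_tokens_per_batch: int = 64 * 512,  # 32768, the budget suggested above
    max_batch_size: int = 64,              # the current default cap
) -> Iterator[List[int]]:
    """Yield batches of input indices whose padded size stays within the budget.

    Hypothetical helper: padded size is estimated as
    (number of sequences) * (longest sequence in the batch).
    """
    # Sort by length so sequences of similar length share a batch,
    # which keeps padding overhead low.
    order = sorted(range(len(token_counts)), key=lambda i: token_counts[i])

    batch: List[int] = []
    longest = 0
    for idx in order:
        n = token_counts[idx]
        new_longest = max(longest, n)
        over_budget = (len(batch) + 1) * new_longest > max_tokens_per_batch
        full = len(batch) >= max_batch_size
        # Close the current batch before it would exceed either limit.
        # A single sequence longer than the budget still gets its own batch.
        if batch and (over_budget or full):
            yield batch
            batch, new_longest = [], n
        batch.append(idx)
        longest = new_longest

    if batch:
        yield batch


if __name__ == "__main__":
    # 64 inputs of 8192 tokens are split into batches of 4 (4 * 8192 = 32768).
    sizes = [len(b) for b in batches_by_token_budget([8192] * 64)]
    print(len(sizes), set(sizes))
```

With this scheme, short inputs still fill a batch of 64, while 8192-token inputs are limited to 4 per forward pass, so peak memory is bounded by the budget rather than by the sequence length.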

michaelfeil · Feb 05 '24 07:02