infinity
Adding max token budget per batch
The default currently allows up to batch_size=64. This can lead to high memory usage: with the jina-8k BERT model, for example, a full batch can reach 64 x 8192 = 524,288 tokens in a single forward pass. It would be better to adjust the batch size dynamically and set a token budget, e.g. 64 * 512 = 32,768 tokens per forward pass.
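A rough sketch of what such a budget could look like, to make the proposal concrete. This is not existing infinity code: the function name `batch_by_token_budget` and the parameters `max_tokens` and `max_batch_size` are hypothetical, and the cost model assumes every sequence in a batch is padded to the longest one.

```python
from typing import List, Sequence


def batch_by_token_budget(
    lengths: Sequence[int],
    max_tokens: int = 64 * 512,   # hypothetical budget: 32768 padded tokens
    max_batch_size: int = 64,     # keep the existing cap on batch size
) -> List[List[int]]:
    """Group sequence indices so each padded batch stays within max_tokens."""
    # Sort indices by token length so sequences of similar length share a
    # batch, which keeps padding overhead low.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        # Lengths are visited in ascending order, so lengths[idx] is the
        # longest sequence in the candidate batch and sets the padded width.
        padded_cost = (len(current) + 1) * lengths[idx]
        if current and (padded_cost > max_tokens or len(current) >= max_batch_size):
            batches.append(current)
            current = []
        # A single sequence longer than max_tokens still gets its own batch.
        current.append(idx)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # 64 sequences of 8192 tokens are split into batches of 4 (4 * 8192 = 32768).
    print([len(b) for b in batch_by_token_budget([8192] * 64)])
    # 64 sequences of 512 tokens still fit in one batch of 64.
    print([len(b) for b in batch_by_token_budget([512] * 64)])
```

With this approach short inputs keep the full batch_size=64, while 8k-token inputs are automatically cut down to 4 per forward pass. The returned indices refer to positions in the original input, so results can be put back in request order afterwards.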