openlimit
Is the token limit check correct?
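For example, a 40k TPM (tokens per minute) limit is converted to about 666.67 TPS (tokens per second). I have a summarization use case where each incoming request is 800+ tokens, in which case the logic seems to wait indefinitely, since a single request needs more tokens than the per-second budget ever holds. Has anyone run into this, or can anyone confirm whether my reading is correct?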
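To make the concern concrete, here is a minimal simulation of what I think is happening. It is not openlimit's actual code: `seconds_until_admitted` and its parameters are hypothetical names, and it assumes a token bucket whose capacity is capped at one second's worth of tokens. Under that assumption, an 800-token request can never be admitted, while a bucket that can hold a full minute of tokens admits it after a short wait.

```python
from typing import Optional


def seconds_until_admitted(
    request_tokens: float,
    tokens_per_minute: float,
    capacity: float,
    max_seconds: int = 600,
) -> Optional[int]:
    """Simulate a token bucket that is refilled once per simulated second.

    Returns the number of whole seconds waited before the request fits,
    or None if it still does not fit after max_seconds.
    Hypothetical helper for illustration only, not part of openlimit.
    """
    rate_per_second = tokens_per_minute / 60.0
    # Assumption: the bucket starts with one second of budget (capped by capacity).
    level = min(capacity, rate_per_second)
    for waited in range(max_seconds + 1):
        if level >= request_tokens:
            return waited
        # Refill, but never beyond the bucket's capacity.
        level = min(capacity, level + rate_per_second)
    return None


TPM = 40_000
per_second_capacity = TPM / 60.0  # about 666.67 tokens

# Capacity capped at the per-second rate: the level can never reach 800.
stuck = seconds_until_admitted(800, TPM, capacity=per_second_capacity)

# Capacity of a full minute: the request fits after one refill.
ok = seconds_until_admitted(800, TPM, capacity=TPM)

print(stuck, ok)
```

If this matches the real behavior, any request larger than the per-second budget would block forever unless the bucket is allowed to accumulate more than one second of tokens.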