Properly return usage information for BedrockClientV2 for the Reranker model?
It would be very useful to get the number of "used billable units" back in the response. Right now I need to implement some complicated (and CPU-intensive) logic myself: tokenize the query, tokenize each document, and then count how many billable units a given request will consume based on the number and length of the documents.
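A minimal sketch of the kind of client-side estimate described above. The chunk size and documents-per-unit values are placeholder assumptions, not confirmed billing rules, and `estimate_search_units` and `count_tokens` are hypothetical names; check the current pricing documentation before relying on the numbers.

```python
import math
from typing import Callable, Sequence


def estimate_search_units(
    query: str,
    documents: Sequence[str],
    count_tokens: Callable[[str], int],
    max_tokens_per_chunk: int = 512,  # ASSUMPTION: placeholder chunk limit
    docs_per_unit: int = 100,         # ASSUMPTION: placeholder docs per unit
) -> int:
    """Estimate billable units for one rerank request (illustrative only)."""
    # The query is assumed to count against every chunk's token budget.
    budget = max_tokens_per_chunk - count_tokens(query)
    if budget <= 0:
        raise ValueError("query alone fills the per-chunk token budget")

    # A document longer than the budget is assumed to be split into chunks,
    # with each chunk counted as a separate document.
    chunks = sum(
        max(1, math.ceil(count_tokens(doc) / budget)) for doc in documents
    )
    return math.ceil(chunks / docs_per_unit)
```

The expensive part is `count_tokens`, which has to run once for the query and once per document; that is where the CPU and memory cost comes from.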
Hi @billytrend-cohere, if at all possible, could you indicate whether this is planned and perhaps give a rough ETA? We found that doing the tokenization ourselves is indeed quite resource-intensive at scale (it also increased memory usage quite significantly). I need to decide whether it makes sense to try to optimize it on our end or to wait for it to be supported.
Hey @ldorigo, looking into this for you! You can use our platform to do free tokenisation if you need to be unblocked sooner. Just set offline=False on the tokenize call.
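A rough sketch of what that suggestion could look like, assuming `client` is a Cohere SDK client whose `tokenize` method accepts `text`, `model`, and `offline` keyword arguments and returns an object with a `tokens` list. Those names are assumptions to verify against the installed SDK version, and `count_tokens_hosted` is a hypothetical helper.

```python
from typing import Any


def count_tokens_hosted(client: Any, text: str, model: str) -> int:
    """Count tokens via the hosted tokenizer instead of a local one."""
    # offline=False asks the SDK to call the platform's tokenize endpoint
    # rather than loading and running the tokenizer in-process.
    # ASSUMPTION: keyword names and the `.tokens` attribute match the SDK.
    response = client.tokenize(text=text, model=model, offline=False)
    return len(response.tokens)
```

This trades local CPU and memory for one network round-trip per text, so batching or caching the counts for repeated documents may still be worthwhile.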