Wei Wei
This is being worked on by @kinarr et al. It is not hard, but it does require reverting the JAX sharding scheme to the one Gemma v1 used when it first came...
Yeah, I realized that, since we are feeding tokens in one by one. However, for some reason it isn't working as expected. I'll try to provide a repro.
Here is the notebook: https://colab.research.google.com/drive/1kk7xcFSA7KzVQnekfqmdd1Gq_Z4qsLvU#scrollTo=NIOXoY1xgiww
Turning on the KV cache makes it much slower, which doesn't make any sense to me :(
You need to sign the CLA. Could you provide more details on 'tool choice'?
Closed as stale.