open_clip
[WIP] Allow for attention caching during CoCa generation
Currently allows users to pass a caching argument to `model.generate()` to improve efficiency for longer generations. At shorter generation lengths the overhead appears to outweigh the savings from caching, though this may be an implementation problem.
- [ ] Test speed-up for longer generations
- [ ] If possible, add caching for text encoding
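For context, the attention (key/value) caching this PR targets works roughly as follows: at each decoding step, only the newest token's key and value are computed, while past keys/values are reused from a cache. This is a minimal NumPy sketch of the idea, not the open_clip implementation; all names here are illustrative:

```python
import numpy as np

def attention_step(q, k_new, v_new, cache=None):
    """One decoder attention step. With a cache, past keys/values are
    reused and only the current token's key/value are appended."""
    if cache is not None:
        k = np.concatenate([cache["k"], k_new], axis=0)
        v = np.concatenate([cache["v"], v_new], axis=0)
    else:
        k, v = k_new, v_new
    cache = {"k": k, "v": v}
    # scaled dot-product attention over all cached positions
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, cache

# toy generation loop: each step feeds only the newest token through
d = 8
cache = None
rng = np.random.default_rng(0)
for step in range(4):
    q = rng.standard_normal((1, d))      # query for the current token
    k_new = rng.standard_normal((1, d))  # key/value for the current token
    v_new = rng.standard_normal((1, d))
    out, cache = attention_step(q, k_new, v_new, cache)
```

Without the cache, every step would recompute keys/values for the full prefix, so per-step cost grows with sequence length; with it, each step does O(1) new projections plus one attention over the cache, which is why the benefit should show up mainly at longer generation lengths.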