open_clip
[WIP] Allow for attention caching during CoCa generation
Currently allows users to pass a caching argument to `model.generate()` to improve efficiency for longer generations. At shorter generation lengths the overhead appears to outweigh the savings from caching, though this may be an implementation problem.
- [ ] Test speed-up for longer generations
- [ ] If possible, add caching for text encoding
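For context, the attention (key/value) caching this PR targets works roughly as follows: at each decoding step, only the newest token's key and value are computed, while past keys/values are reused from a cache. This is a minimal NumPy sketch of the idea, not the open_clip implementation; all names here are illustrative:

```python
import numpy as np

def attention_step(q, k_new, v_new, cache=None):
    """One decoder attention step. With a cache, past keys/values are
    reused and only the current token's key/value are appended."""
    if cache is not None:
        k = np.concatenate([cache["k"], k_new], axis=0)
        v = np.concatenate([cache["v"], v_new], axis=0)
    else:
        k, v = k_new, v_new
    cache = {"k": k, "v": v}
    # scaled dot-product attention over all cached positions
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, cache

# toy generation loop: each step feeds only the newest token through
d = 8
cache = None
rng = np.random.default_rng(0)
for step in range(4):
    q = rng.standard_normal((1, d))      # query for the current token
    k_new = rng.standard_normal((1, d))  # key/value for the current token
    v_new = rng.standard_normal((1, d))
    out, cache = attention_step(q, k_new, v_new, cache)
```

Without the cache, every step would recompute keys/values for the full prefix, so per-step cost grows with sequence length; with it, each step does O(1) new projections plus one attention over the cache, which is why the benefit should show up mainly at longer generation lengths.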