Save and display per-token attention maps
This pull request adds the backend support needed to display per-token attention maps after generating an image.
Done:
- [x] Collect and return attention maps to generate.py
- [x] Pass attention maps and tokens to webUI - currently pushed as two new fields on the object emitted to the webUI socket with `generationResult`: `attentionMaps` (a base64 image, size `width/8 x 77*height/8`) and `tokens`; see the payload sketch below and the token notes further down.
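For reviewers, a minimal sketch of how the two new fields sit on the emitted object (the helper name `attach_attention_fields` and the exact encoding step are illustrative, not code from this PR):

```python
import base64

def attach_attention_fields(result: dict, attention_map_bytes: bytes, tokens: list[str]) -> dict:
    """Attach the two new fields to the object emitted with generationResult.

    attention_map_bytes is one encoded image of size (width/8) x (77 * height/8),
    i.e. presumably one (width/8 x height/8) map per CLIP token slot, stacked vertically.
    """
    result['attentionMaps'] = base64.b64encode(attention_map_bytes).decode('ascii')
    result['tokens'] = tokens  # e.g. ['a</w>', 'fluffy</w>', 'mi', 'yaz', 'aki</w>', 'dog']
    return result
```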
Todo:
- [ ] Display maps - I'll need help from @psychedelicious and/or @blessedcoolant on this part
Typical content of the `tokens` array, e.g. for the prompt `a fluffy miyazaki dog`, is `['a</w>', 'fluffy</w>', 'mi', 'yaz', 'aki</w>', 'dog']`. The `</w>` suffix marks end-of-word. With this implementation, to match tokens to fragments of the prompt text in the input box, the frontend will have to crawl through the prompt and do a best-fit match of these tokens against the prompt string; a sketch of that matching follows.
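To make that concrete, here is a minimal sketch of the crawl-and-match approach, written in Python for illustration (the webUI would implement the equivalent in its own code; the function name and span convention are mine):

```python
def match_tokens_to_prompt(prompt: str, tokens: list[str]) -> list[tuple[int, int]]:
    """Best-fit match of tokenizer fragments to character spans in the prompt.

    Returns one (start, end) span per token, or (-1, -1) when a fragment
    cannot be located. The tokenizer lowercases, so compare case-insensitively.
    """
    haystack = prompt.lower()
    spans = []
    cursor = 0
    for token in tokens:
        fragment = token.replace('</w>', '')  # '</w>' only marks end-of-word
        start = haystack.find(fragment, cursor)
        if start == -1:
            spans.append((-1, -1))
            continue
        end = start + len(fragment)
        spans.append((start, end))
        cursor = end  # keep matching left-to-right past this fragment
    return spans

# For 'a fluffy miyazaki dog' this yields spans covering
# 'a', 'fluffy', 'mi', 'yaz', 'aki', 'dog' in order.
```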
Known generation error: `AttributeError: 'CrossAttention' object has no attribute 'cached_mem_free_total'`
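Not a confirmed fix, but this class of error usually means the cache attribute is read before it is ever assigned. A defensive pattern (class and attribute names taken from the traceback; the memory query is my assumption) would look like:

```python
import torch

class CrossAttention:
    # Illustration only; the real class has many more members.
    def mem_free_total(self) -> int:
        # getattr with a default avoids the AttributeError when the
        # cache attribute has never been assigned on this instance.
        cached = getattr(self, 'cached_mem_free_total', None)
        if cached is None:
            free, _total = torch.cuda.mem_get_info() if torch.cuda.is_available() else (0, 0)
            cached = free
            self.cached_mem_free_total = cached
        return cached
```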
This should be merged into `main` ASAP, even if the frontend isn't using it yet.
Attention map collection is always on in this PR, but the memory/performance impact is negligible.