Save and display per-token attention maps
This pull request adds the backend support needed to display per-token attention maps after generating an image.
Done:
- [x] Collect and return attention maps to generate.py
- [x] Pass attention maps and tokens to webUI - currently pushed as two new fields on the object emitted to the webUI socket with `generationResult`: `attentionMaps` (a base64 image, size `width/8 x 77*height/8`) and `tokens`; see the payload sketch below and the token notes further down.
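For reviewers, a minimal sketch of how the two new fields sit on the emitted object (the helper name `attach_attention_fields` and the exact encoding step are illustrative, not code from this PR):

```python
import base64

def attach_attention_fields(result: dict, attention_map_bytes: bytes, tokens: list[str]) -> dict:
    """Attach the two new fields to the object emitted with generationResult.

    attention_map_bytes is one encoded image of size (width/8) x (77 * height/8),
    i.e. presumably one (width/8 x height/8) map per CLIP token slot, stacked vertically.
    """
    result['attentionMaps'] = base64.b64encode(attention_map_bytes).decode('ascii')
    result['tokens'] = tokens  # e.g. ['a</w>', 'fluffy</w>', 'mi', 'yaz', 'aki</w>', 'dog']
    return result
```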
Todo:
- [ ] Display maps - I'll need help from @psychedelicious and/or @blessedcoolant on this part
Typical content of the `tokens` array, e.g. for the prompt `a fluffy miyazaki dog`, is `['a</w>', 'fluffy</w>', 'mi', 'yaz', 'aki</w>', 'dog']`. The `</w>` suffix marks end-of-word. With this implementation, to match tokens to fragments of the prompt text in the input box, the frontend will have to crawl through the prompt and do a best-fit match of these tokens against the prompt string; a sketch of that matching follows.
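To make that concrete, here is a minimal sketch of the crawl-and-match approach, written in Python for illustration (the webUI would implement the equivalent in its own code; the function name and span convention are mine):

```python
def match_tokens_to_prompt(prompt: str, tokens: list[str]) -> list[tuple[int, int]]:
    """Best-fit match of tokenizer fragments to character spans in the prompt.

    Returns one (start, end) span per token, or (-1, -1) when a fragment
    cannot be located. The tokenizer lowercases, so compare case-insensitively.
    """
    haystack = prompt.lower()
    spans = []
    cursor = 0
    for token in tokens:
        fragment = token.replace('</w>', '')  # '</w>' only marks end-of-word
        start = haystack.find(fragment, cursor)
        if start == -1:
            spans.append((-1, -1))
            continue
        end = start + len(fragment)
        spans.append((start, end))
        cursor = end  # keep matching left-to-right past this fragment
    return spans

# For 'a fluffy miyazaki dog' this yields spans covering
# 'a', 'fluffy', 'mi', 'yaz', 'aki', 'dog' in order.
```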
Known generation error: `AttributeError: 'CrossAttention' object has no attribute 'cached_mem_free_total'`
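Not a confirmed fix, but this class of error usually means the cache attribute is read before it is ever assigned. A defensive pattern (class and attribute names taken from the traceback; the memory query is my assumption) would look like:

```python
import torch

class CrossAttention:
    # Illustration only; the real class has many more members.
    def mem_free_total(self) -> int:
        # getattr with a default avoids the AttributeError when the
        # cache attribute has never been assigned on this instance.
        cached = getattr(self, 'cached_mem_free_total', None)
        if cached is None:
            free, _total = torch.cuda.mem_get_info() if torch.cuda.is_available() else (0, 0)
            cached = free
            self.cached_mem_free_total = cached
        return cached
```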
This should be merged into `main` ASAP, even if the frontend isn't using it yet.
Attention map collection is always on in this PR, but the memory/performance impact is negligible.