
[bug]: PyTorch "index out of bounds" with long prompts

Open ebr opened this issue 3 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

OS

Windows

GPU

cuda

VRAM

No response

What happened?

Reported in Discord: https://discord.com/channels/1020123559063990373/1052782005508649030

File "C:\Users\NeO\invokeai\.venv\lib\site-packages\ldm\models\diffusion\cross_attention_map_saving.py", line 39, in add_attention_maps
    self.collated_maps[key_and_size] += maps.cpu()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [47,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
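
For context, the assert itself is easy to reproduce outside InvokeAI. A minimal sketch (not InvokeAI code; assumes a CUDA-capable machine) that indexes a tensor out of range:

```python
# Minimal sketch: reproduce the device-side assert by indexing a CUDA tensor
# out of range, the same assert reported in IndexKernel.cu above.
import torch

t = torch.zeros(10, device="cuda")
idx = torch.tensor([77], device="cuda")  # valid indices are -10..9

try:
    out = t[idx]              # launches the CUDA index kernel asynchronously
    torch.cuda.synchronize()  # the device-side assert surfaces at the next sync
except RuntimeError as e:
    print(e)  # "CUDA error: device-side assert triggered"
```

Because the kernel launch is asynchronous, the error often surfaces at an unrelated later call, which is why the traceback above points at `add_attention_maps` and why PyTorch suggests `CUDA_LAUNCH_BLOCKING=1`.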

Another user responds:

i have this issue as well it seems to only happen when i put in lengthy prompts

Screenshots

No response

Additional context

No response

Contact Details

No response

ebr avatar Dec 18 '22 21:12 ebr

If this is related to long prompts, I believe it should be fixed by #1999

hipsterusername avatar Dec 20 '22 01:12 hipsterusername

In version 2.2.5 it doesn't crash, but still prints:

>> Prompt is 4 token(s) too long and has been truncated
>> Prompt is 3 token(s) too long and has been truncated
>> Prompt is 2 token(s) too long and has been truncated
>> Prompt is 3 token(s) too long and has been truncated
>> Prompt is 2 token(s) too long and has been truncated
>> Prompt is 2 token(s) too long and has been truncated
>> Prompt is 2 token(s) too long and has been truncated
>> Prompt is 3 token(s) too long and has been truncated
>> Prompt is 353 token(s) too long and has been truncated

That's what happens if the prompt is too long. 353 tokens is an extremely long negative prompt :) And, according to that warning, it looks like it just silently ignores some of the prompt words?
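
For anyone who wants to check a prompt before hitting this, the count comes from the CLIP tokenizer. A quick sketch using the Hugging Face tokenizer (which is what the linked `ldm` module wraps; the checkpoint named here is the standard Stable Diffusion text encoder):

```python
# Sketch: count CLIP tokens for a prompt to see whether it will be truncated.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "your very long negative prompt here ..."
ids = tokenizer(prompt)["input_ids"]  # includes <|startoftext|> and <|endoftext|>
limit = tokenizer.model_max_length    # 77 for Stable Diffusion's CLIP encoder

print(f"{len(ids)} token(s), limit {limit}")
if len(ids) > limit:
    print(f"Prompt is {len(ids) - limit} token(s) too long and will be truncated")
```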

berkut1 avatar Dec 26 '22 18:12 berkut1

Working as intended.

hipsterusername avatar Dec 26 '22 19:12 hipsterusername

~~Okay, I did some tests. There can be a maximum of 50 words in brackets (square brackets, though probably any kind); anything beyond that is ignored and you get this "too long and has been truncated" warning.~~

P.S. It seems there is simply a limit on the overall prompt length :) https://github.com/invoke-ai/InvokeAI/blob/7d8d4bcafb46b7e8309f7568815233c4b7a170e9/ldm/modules/encoders/modules.py#L243

berkut1 avatar Dec 26 '22 19:12 berkut1

https://github.com/invoke-ai/InvokeAI/blob/7d8d4bcafb46b7e8309f7568815233c4b7a170e9/ldm/modules/encoders/modules.py#L243

max_length=77,

I wonder whether this is an artificial limitation for all models, or just the default for the standard Stable Diffusion models? In Automatic1111's web UI the limit expands automatically, and nothing is ever said about truncation, with scattered comments claiming that custom models can allegedly handle much longer prompts.

A clarification, and an option to extend the prompt length for custom models, would be great. Thank you!
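
For what it's worth, the 77 does not appear to be an InvokeAI choice: it matches the size of the CLIP text encoder's positional embedding, which is fixed when the model is trained. You can confirm it directly (a sketch using the Hugging Face checkpoint of the encoder Stable Diffusion uses):

```python
# Sketch: the 77-token limit is a property of the CLIP text encoder itself.
from transformers import CLIPTextModel

text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
print(text_encoder.config.max_position_embeddings)  # prints 77
```

Two of those 77 positions are taken by the begin/end markers, leaving 75 for actual prompt tokens.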

Testertime avatar Jan 06 '23 06:01 Testertime

> https://github.com/invoke-ai/InvokeAI/blob/7d8d4bcafb46b7e8309f7568815233c4b7a170e9/ldm/modules/encoders/modules.py#L243
>
> max_length=77,
>
> I wonder whether this is an artificial limitation for all models, or just the default for the standard Stable Diffusion models? In Automatic1111's web UI the limit expands automatically, and nothing is ever said about truncation, with scattered comments claiming that custom models can allegedly handle much longer prompts.
>
> A clarification, and an option to extend the prompt length for custom models, would be great. Thank you!

@Testertime there was a discussion about this: InvokeAI can't expand the limit, because the original code for that feature was stolen. And I don't understand why contributors can't use this method instead: https://en.wikipedia.org/wiki/Clean_room_design

berkut1 avatar Jan 07 '23 05:01 berkut1

To clarify:

  • A1111 has the same limit as InvokeAI. This is a fundamental limit baked into the underlying generation process, not an arbitrary limit that Invoke has imposed.
  • The reason users do not believe there is a limit in A1111 is that prompts are blended without user guidance, to keep each portion of the prompt under the limit.
  • In effect, this is similar to doing `("75 tokens", "75 tokens").blend(1,1)` with our blending syntax (see the sketch below). Manually controlling that process is better, in our view.
  • The allegedly stolen code is related to hypernetworks, not to this.
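
To make the chunking idea concrete, here is a conceptual sketch (hypothetical code, not A1111's or Invoke's actual implementation): split the token stream into consecutive 75-token chunks, encode each chunk separately, and concatenate the embeddings. The splits fall wherever the boundary happens to land, with no user control.

```python
# Conceptual sketch of A1111-style chunking (hypothetical helper): encode a
# long prompt as consecutive 75-token chunks and concatenate the embeddings.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def encode_long_prompt(prompt: str, chunk_size: int = 75) -> torch.Tensor:
    ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    chunks = []
    for i in range(0, len(ids), chunk_size):
        # Each chunk is wrapped in its own begin/end tokens, as if it were a
        # standalone prompt; the split point is arbitrary.
        chunk = [tokenizer.bos_token_id] + ids[i:i + chunk_size] + [tokenizer.eos_token_id]
        with torch.no_grad():
            chunks.append(encoder(torch.tensor([chunk])).last_hidden_state)
    return torch.cat(chunks, dim=1)  # one long embedding sequence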

hipsterusername avatar Jan 07 '23 12:01 hipsterusername

> The reason users do not believe there is a limit in A1111 is that prompts are blended without user guidance, to keep each portion of the prompt under the limit.

A1111 makes this noticeable by showing the token limit increasing (the prompt dividing into groups) once you go past 75 tokens; that blending is what I would call UX, and it is what users expect. Your implementation shows nothing in the UI, only a hard-to-understand warning in the terminal, with no information on how to work around it.

> In effect, this is similar to doing `("75 tokens", "75 tokens").blend(1,1)` with our blending syntax. Manually controlling that process is better, in our view.

It's called bad UX. :)

berkut1 avatar Jan 07 '23 21:01 berkut1

> > The reason users do not believe there is a limit in A1111 is that prompts are blended without user guidance, to keep each portion of the prompt under the limit.
>
> A1111 makes this noticeable by showing the token limit increasing (the prompt dividing into groups) once you go past 75 tokens; that blending is what I would call UX, and it is what users expect. Your implementation shows nothing in the UI, only a hard-to-understand warning in the terminal, with no information on how to work around it.
>
> > In effect, this is similar to doing `("75 tokens", "75 tokens").blend(1,1)` with our blending syntax. Manually controlling that process is better, in our view.
>
> It's called bad UX. :)

You’re welcome to your opinion!

We’ll let users tell us which UX they prefer once our prompting UI is built.

hipsterusername avatar Jan 07 '23 22:01 hipsterusername

> We’ll let users tell us which UX they prefer once our prompting UI is built.

That works when there is no alternative, but unfortunately there is one :). Still, I like how you implemented the canvas and inpainting modes; their UX is better than A1111's. Most users probably use the apps in the same way: generate images in A1111, then work on them in depth with your features.

berkut1 avatar Jan 07 '23 23:01 berkut1