Kwan Kin Chan
Hi, I tried to load the model with dual 4090s and still faced the same error after applying the changes. I looked into the debugger and realized that it is because...
I added `use_cache = False` at https://github.com/OpenGVLab/Ask-Anything/blob/078540aaebfbe1ad9a109020a73b0ce173b355ef/video_chat2/conversation.py#L64-L75 and got a new error message:

```
Exception has occurred: RuntimeError
shape '[-1, 125]' is invalid for input of size 126...
```
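For context, this is roughly the kind of change I mean; a minimal sketch, assuming the linked lines wrap a standard Hugging Face-style `generate()` call (the model name and generation arguments here are placeholders, not the actual conversation.py code):

```python
# Sketch only: where a use_cache=False flag would go in a generate() call.
# "your/model" and the sampling arguments below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your/model")      # placeholder
model = AutoModelForCausalLM.from_pretrained("your/model")   # placeholder

inputs = tokenizer("example prompt", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        use_cache=False,  # the flag added above; the default is True
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```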
Yes, I was able to run the model without `flash_attn`. However, I am trying flash attention because I want faster and more memory-efficient inference when using long prompts. Apart...
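For reference, this is the kind of setup I have in mind; a sketch assuming the model is loaded through `transformers`, where FlashAttention-2 is enabled at load time (the model name is a placeholder, and the flag requires the `flash_attn` package plus fp16/bf16 weights):

```python
# Sketch: enabling FlashAttention-2 when loading a model with transformers.
# "your/model" is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your/model",                             # placeholder
    torch_dtype=torch.bfloat16,               # flash_attn needs fp16 or bf16
    attn_implementation="flash_attention_2",  # use "sdpa" or "eager" if flash_attn is not installed
    device_map="auto",                        # split the weights across the two 4090s
)
```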