sun1092469590
> Will download the model and try and reproduce this, but I'm noticing that `trust_remote_code=True` is not added in the `AutoModelForCausalLM.from_pretrained`, which means that the model should not be loaded...
Thank you very much. My current transformers version is also 4.34.0, and I can run Qwen-14B normally when attention_sink is not added.
Thank you very much for your detailed answer. I will first try your method, and if it does not work I will stop using Flash Attention and test again.
1) I stopped using Flash Attention by adding the parameter `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained()`, and the result is normal, as you showed me (a fuller hedged sketch follows my question below). That is: `model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16, attention_sink_size=4, ...`
Also, I want to know whether a chat model such as Qwen-14B-Chat can use attention_sink, and how to use it with a chat model.
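For reference, a minimal sketch of what a sink-enabled load of the Chat model might look like, assuming the `attention_sinks` drop-in `AutoModelForCausalLM`, that Qwen-14B-Chat accepts the same kwargs as the base model, and an illustrative `attention_sink_window_size`; the single-turn generation at the end is only an assumption of how it could be run, not a confirmed recipe:

```python
import torch
from attention_sinks import AutoModelForCausalLM  # sink-enabled drop-in for the transformers class
from transformers import AutoTokenizer

model_id = "Qwen/Qwen-14B-Chat"  # assumption: the Chat variant takes the same kwargs as Qwen-14B

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,           # needed for Qwen's custom modeling code
    use_flash_attn=False,             # disable Flash Attention, as in point 1)
    attention_sink_size=4,            # number of initial "sink" tokens kept in the cache
    attention_sink_window_size=1020,  # assumption: illustrative sliding-window size
)

# Illustrative single-turn generation on a raw prompt; whether Qwen-Chat's own
# model.chat() helper also works unchanged with attention sinks is an assumption to verify.
inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0]))
```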
OK, thank you very much, I will try it. I tried another method and it does produce output; this is my code, is this method right or wrong? `import torch from transformers...`
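As a point of comparison, whether the sinks are actually applied usually comes down to which module `AutoModelForCausalLM` is imported from; a minimal sketch of the distinction, with the model id assumed from the earlier comments:

```python
import torch

# Stock transformers import: the model loads and generates output,
# but no attention-sink cache is injected.
# from transformers import AutoModelForCausalLM

# attention_sinks drop-in: same from_pretrained signature, but a
# sink-aware KV cache is patched into the loaded model.
from attention_sinks import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B",          # assumption: the base model discussed above
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,   # required for Qwen's custom modeling code
    attention_sink_size=4,
)
```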
I see. I will try your method, thank you for the quick reply.
I use Qwen-14B-Chat and the script in demo/streaming.py to get results, but it very easily runs OOM. Here max_new_tokens=256, which is not very large, and my GPUs are 4*80G...