hsb1995
Even if I manage to run this successfully, I still get the error above. Could the author please explain? Many thanks!
I have solved everything else. The only thing left is my last question to you: how do I set up port forwarding?
Is this a code error? Why does the downloaded code use modeling_mistral and MistralMLP?
```
Package                  Version
------------------------ ------------
accelerate               0.29.1
aiohttp                  3.9.3
aiosignal                1.3.1
appdirs                  1.4.4
asttokens                2.4.1
async-timeout            4.0.3
attrs                    23.2.0
bitsandbytes             0.43.0
black                    24.3.0
Brotli                   1.1.0
certifi                  2022.12.7
charset-normalizer       2.1.1
click                    8.1.7
...
```
```
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
|        ID   ID                                                             Usage       |
|=======================================================================================|
|    0   N/A  N/A      3630      G   /usr/lib/xorg/Xorg                            4MiB  |
...
```
The small weights can be computed, but when it comes to the large weights, a problem appears.
My code runs on dual RTX 3090s. Could the author please take a look?
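In case the failure on the larger weights is an out-of-memory issue (an assumption; the checkpoint path below is a placeholder), a minimal sketch of sharding a model across both 3090s with transformers and the accelerate package already shown in the pip list:

```python
import torch
from transformers import AutoModelForCausalLM

# Shard the checkpoint across both 24 GB cards (spilling to CPU if
# needed) instead of loading everything onto a single GPU.
# "path/to/large-model" is a placeholder, not a path from this repo.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/large-model",
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate to be installed
)
```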
Have you implemented this yet? Could you share it?
With w=16, a=16 I can obtain the uncompressed values. But once a compression setting is applied (w=6, a=6), problems arise.
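For context, a minimal sketch of symmetric uniform fake-quantization (a generic illustration, not this repo's implementation; the function name and per-tensor granularity are assumptions) shows why w=16 is effectively lossless while w=6 introduces real rounding error:

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    # Symmetric uniform "fake" quantization: snap x onto a grid of
    # 2**n_bits levels, then map back to float to expose the error.
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

w = torch.randn(4096, 4096)
for bits in (16, 6):
    err = (fake_quantize(w, bits) - w).abs().max().item()
    print(f"w={bits}: max abs error = {err:.6f}")
# At 16 bits the grid is fine enough that the error is negligible;
# at 6 bits (only 64 levels) the rounding error becomes significant.
```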
> @hsb1995 LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current `let`.

Professor, thank you for your thorough work. I really don't know how **GQA** is handled...
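For reference, a minimal sketch of what GQA changes (the shapes are LLaMA-3-8B's; the per-head scaling framing is my reading of why the learnable equivalent transformation breaks, not this repo's code): LLaMA-3-8B has 32 query heads but only 8 key/value heads, so each K/V head is shared by a group of 4 query heads, and a scale defined per query head no longer has a one-to-one counterpart on K:

```python
import torch

n_q_heads, n_kv_heads, head_dim, seq = 32, 8, 128, 16
group = n_q_heads // n_kv_heads  # 4 query heads share each K/V head

q = torch.randn(seq, n_q_heads, head_dim)
k = torch.randn(seq, n_kv_heads, head_dim)

# In plain multi-head attention, a learnable per-head scale s can be
# folded in as q * s and k / s, leaving the attention scores unchanged.
# With GQA, K has only n_kv_heads heads, so a 32-way scale on Q has no
# matching 32-way scale on K; the K/V heads must first be expanded to
# the query head count:
k_expanded = k.repeat_interleave(group, dim=1)  # (seq, 32, head_dim)
print(q.shape, k.shape, k_expanded.shape)
```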