wang jiahao
Hello, author. While reading your code I ran into something I find confusing. Across the Encoder and Decoder there are three attention layers in total. Assuming they all use your FullAttention module, why does the first attention layer in the Decoder set mix to True? As far as I can tell, this parameter is mainly used later, when the multiple heads are merged back into one: it swaps two of the tensor's dimensions before the merge. I don't understand why that is done here. After I changed it to False, the results got worse. Could you explain the reason for this?
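For context, here is a rough sketch of the merge step the question refers to: mix decides whether the attention output is transposed before the heads are flattened back into the model dimension. The shapes, the exact axes being swapped, and the variable names below are assumptions for illustration, not the repository's actual AttentionLayer code.

```python
import torch

# Illustrative only; shapes and the axes being swapped are assumptions.
# Suppose the attention output arrives as [batch, length, heads, d_head].
B, L, H, D = 2, 96, 8, 64
out = torch.randn(B, L, H, D)

mix = True
if mix:
    # Transpose before merging the heads, so the flatten below reads the
    # tensor in a different memory order than it would with mix=False.
    out = out.transpose(2, 1).contiguous()  # -> [batch, heads, length, d_head]

# Merge the heads back into a single dimension: [batch, length, heads * d_head]
out = out.reshape(B, L, -1)
print(out.shape)  # torch.Size([2, 96, 512])
```

With mix=False the flatten keeps each position's heads together; with mix=True the same flatten reads the data in a different order, which is the behavior the question is asking about.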
I have tried all of the CloudSuite benchmarks for arm64. There are a lot of problems: almost none of the benchmarks can be run with the official code. Web-Serving has a php-fpm problem...
### Name and Version

/build/bin/llama-quantize /mnt/data/model/Moonlight-16B-A3B-Instruct/Moonlight-16B-A3B-Instruct-BF16.gguf /mnt/data/model/Moonlight-16B-A3B-Instruct/Moonlight-16B-A3B-Instruct-Q4_K_M.gguf Q4_K_M

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

RTX 4090D

### Models

Moonlight-16B-A3B-Instruct

### Problem description & steps to reproduce

...