Zhihua.Liu

Results: 7 issues by Zhihua.Liu

I got 'value accu=40.0' and found that the model uses 'bert-base-uncased' as the encoder by default. Could the reason be that I was not using a Chinese BERT for math23k?...

Thanks for sharing this excellent work. We want to use PyTorch models to try the effect of ring attention. Are there any plans to develop a ring attention implementation under...

I found that in the data preprocessing code, the training and validation sets are split using a single random.random() call
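The split style described above can be sketched as follows (a hypothetical reconstruction, not the repository's actual code): each sample lands in train or validation based on a fresh random.random() draw, so without seeding the split differs on every run.

```python
import random

def random_split(samples, valid_ratio=0.2, seed=None):
    """Assign each sample to train or validation by a random draw.

    Passing a seed makes the split reproducible across runs;
    with seed=None the split changes every time, which is the
    concern raised in the issue.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for s in samples:
        (valid if rng.random() < valid_ratio else train).append(s)
    return train, valid

train, valid = random_split(list(range(1000)), valid_ratio=0.2, seed=42)
```

Seeding (or persisting the split to disk once) avoids train/validation leakage between runs while keeping the randomized assignment.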

I noticed that the evaluation method "RUBER" only provides code for testing/training. Could you please share the weights of the RUBER model or the data used for training? I would be...

I noticed that the memory retrieval and update happen before 'apply_rotary_pos_emb'. I wonder whether memory that lacks positional information would confuse the model's perception of the order of historical information.
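The ordering concern can be illustrated with a deliberately simplified rotary-embedding sketch (one 2-D rotation per token, angle equal to position; the real apply_rotary_pos_emb uses many frequencies): vectors saved to memory before the rotation carry no positional signal, while vectors rotated afterwards differ by position.

```python
import math

def rope_pair(pair, pos):
    # Rotate one 2-D feature pair by an angle equal to its position.
    # This is a toy, single-frequency stand-in for rotary embeddings.
    x0, x1 = pair
    c, s = math.cos(pos), math.sin(pos)
    return (x0 * c - x1 * s, x0 * s + x1 * c)

tokens = [(1.0, 1.0)] * 4            # four identical token vectors

# Memory captured BEFORE the rotary step: all entries are identical,
# so their original order cannot be recovered from the vectors alone.
memory_pre_rope = list(tokens)

# The same vectors AFTER the rotary step: each is distinct by position.
rotated = [rope_pair(t, p) for p, t in enumerate(tokens)]
```

Under this toy model, pre-rotation memory entries are indistinguishable, which is exactly the loss of ordering information the issue asks about.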

![image](https://github.com/zhuzilin/ring-flash-attention/assets/17453999/57e20774-f48c-47a6-8208-b97a23928b17) Thanks for sharing this excellent implementation of ring attention. Here are my test results on 2*A100 (with NVLink). Judging from the results, the memory usage of ring attention (ring_flash_attn_qkvpacked_func) seems...

After running the provided test script, I got the following results: ![image](https://github.com/RulinShao/FastCkpt/assets/17453999/30d6d83b-f517-4bd2-9b5d-bc3ceab3ad4e) Judging from the test results, the speed becomes slower after using ckpt, and the outputs differ. I...