Zhihua.Liu

Results: 7 issues by Zhihua.Liu

I got 'value accu=40.0' and found that the model uses 'bert-base-uncased' as the encoder by default. Could the reason be that I was not using a Chinese BERT for math23k?...

Thanks for sharing this excellent work. We want to use PyTorch models to try the effect of ring attention. Are there any plans to develop a ring attention implementation under...

I found that in the data preprocessing code, the training and validation sets are split using a single random.random() call
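The split style described above can be sketched as follows (a hypothetical reconstruction, not the repository's actual code): each sample lands in train or validation based on a fresh random.random() draw, so without seeding the split differs on every run.

```python
import random

def random_split(samples, valid_ratio=0.2, seed=None):
    """Assign each sample to train or validation by a random draw.

    Passing a seed makes the split reproducible across runs;
    with seed=None the split changes every time, which is the
    concern raised in the issue.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for s in samples:
        (valid if rng.random() < valid_ratio else train).append(s)
    return train, valid

train, valid = random_split(list(range(1000)), valid_ratio=0.2, seed=42)
```

Seeding (or persisting the split to disk once) avoids train/validation leakage between runs while keeping the randomized assignment.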

I noticed that the evaluation method "RUBER" only provides code for testing/training. Could you please share the weights of the RUBER model or the data used for training? I would be...

I noticed that the memory retrieval and update happen before 'apply_rotary_pos_emb'. I wonder whether memory that lacks positional information would confuse the model's perception of the order of historical information.
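The ordering concern can be illustrated with a deliberately simplified rotary-embedding sketch (one 2-D rotation per token, angle equal to position; the real apply_rotary_pos_emb uses many frequencies): vectors saved to memory before the rotation carry no positional signal, while vectors rotated afterwards differ by position.

```python
import math

def rope_pair(pair, pos):
    # Rotate one 2-D feature pair by an angle equal to its position.
    # This is a toy, single-frequency stand-in for rotary embeddings.
    x0, x1 = pair
    c, s = math.cos(pos), math.sin(pos)
    return (x0 * c - x1 * s, x0 * s + x1 * c)

tokens = [(1.0, 1.0)] * 4            # four identical token vectors

# Memory captured BEFORE the rotary step: all entries are identical,
# so their original order cannot be recovered from the vectors alone.
memory_pre_rope = list(tokens)

# The same vectors AFTER the rotary step: each is distinct by position.
rotated = [rope_pair(t, p) for p, t in enumerate(tokens)]
```

Under this toy model, pre-rotation memory entries are indistinguishable, which is exactly the loss of ordering information the issue asks about.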

![image](https://github.com/zhuzilin/ring-flash-attention/assets/17453999/57e20774-f48c-47a6-8208-b97a23928b17) Thanks for sharing this excellent implementation of ring attention. Here are my test results on 2*A100 (with NVLink). Judging from the results, the memory usage of ring attention (ring_flash_attn_qkvpacked_func) seems...

After running the provided test script, I got the following results: ![image](https://github.com/RulinShao/FastCkpt/assets/17453999/30d6d83b-f517-4bd2-9b5d-bc3ceab3ad4e) Judging from the test results, the speed becomes slower after using ckpt, and the outputs differ. I...