Yunsheng Ni

8 comments by Yunsheng Ni

Same question.

I have implemented all the parts and experiments. The strange thing is that the loss of this X-ray model is infinity; I don't know what the problem is.
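If it helps narrow this down, here is a minimal PyTorch sketch of the kind of fail-fast check one could add around the loss; all names (`criterion`, `logits`, `targets`) are illustrative, not from the original project.

```python
import torch

def check_finite(loss: torch.Tensor, logits: torch.Tensor) -> None:
    """Fail fast when the loss turns non-finite and print likely culprits."""
    if torch.isfinite(loss):
        return
    print("non-finite loss detected")
    print("logits contain inf/nan:", (~torch.isfinite(logits)).any().item())
    print("logits max/min:", logits.max().item(), logits.min().item())
    raise RuntimeError("stopping early to inspect this batch")

# Inside the training loop (names illustrative):
#   loss = criterion(logits, targets)
#   check_finite(loss, logits)
```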

This code of mine is from a long time ago; I haven't managed to reproduce the results, and I probably still need to clean it up.

> On Dec 27, 2020, at 8:58 PM, ZuoyanL wrote:
>
> I have implemented all the parts and experiments, the strange thing is that the loss...

Why do you think that `FMHA_ENABLE` stands for FlashAttention?

I don't think `FMHA_ENABLE` stands for FlashAttention; it stands for "fused multi-head attention". ![image](https://github.com/NVIDIA/FasterTransformer/assets/62385270/e37e0766-9f89-42f4-a827-2a56951701f5) See [gpt_guide.md](https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md) for more information.
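To make this concrete, here is a minimal sketch of enabling the fused multi-head attention path via the environment variable described in gpt_guide.md; the example binary path is an assumption for illustration.

```python
import os
import subprocess

# FasterTransformer gates the fused multi-head attention kernels on the
# FMHA_ENABLE environment variable (see gpt_guide.md). The binary path below
# is illustrative; point it at whichever FT example you actually built.
env = os.environ.copy()
env["FMHA_ENABLE"] = "ON"
subprocess.run(["./bin/multi_gpu_gpt_example"], env=env, check=True)
```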

Is the learning rate in the annealing stage linear or exponential? If it is exponential, what is the relationship between T and N? ![](https://shengdinghu.notion.site/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F30c36155-a603-469f-957f-b0854b6e2372%2Ff70abacd-85ca-423c-b107-875cf6c96707%2FUntitled.png?table=block&id=ddb7a6fd-594e-4110-9e9a-491586d113f4&spaceId=30c36155-a603-469f-957f-b0854b6e2372&width=770&userId=&cache=v2)
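For concreteness, here is a hedged sketch of the two annealing shapes the question contrasts; the half-life parameterization of the exponential variant is an illustrative assumption, not the formula from the referenced post.

```python
def linear_anneal(s: int, S: int, N: int, lr_max: float, lr_min: float) -> float:
    """Linear decay from lr_max to lr_min over the N annealing steps after S."""
    frac = min(max((s - S) / N, 0.0), 1.0)
    return lr_max + (lr_min - lr_max) * frac

def exp_anneal(s: int, S: int, T: float, lr_max: float) -> float:
    """Exponential decay after step S: the LR halves every T steps.
    How T relates to the total annealing length N is exactly the question."""
    return lr_max * 0.5 ** (max(s - S, 0) / T)

# Illustrative comparison over a 1000-step annealing window starting at S=9000.
S, N = 9000, 1000
for s in (9000, 9500, 10000):
    print(s, linear_anneal(s, S, N, 1e-3, 1e-5),
          exp_anneal(s, S, T=200, lr_max=1e-3))
```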

> I published some at https://huggingface.co/datasets/malaysia-ai/Flash-Attention3-wheel,
>
> ## Flash-Attention3-wheel
>
> Flash Attention 3 wheels on commit [0e60e39473e8df549a20fb5353760f7a65b30e2d](https://github.com/Dao-AILab/flash-attention/commit/0e60e39473e8df549a20fb5353760f7a65b30e2d).
>
> ### Build using H100
>
> For PyTorch 2.6.0 12.6, 2.7.0...
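A minimal sanity check for one of these wheels might look like the sketch below; the module name `flash_attn_interface` and the tuple-return handling are assumptions based on the hopper build of flash-attention, so verify them against the wheel actually installed (an H100-class GPU is required).

```python
import torch
import flash_attn_interface  # assumption: module name used by the FA3 hopper build

# Tiny forward pass to confirm the wheel's kernel actually runs.
q = torch.randn(1, 128, 8, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_interface.flash_attn_func(q, k, v, causal=True)
if isinstance(out, tuple):  # some revisions return (out, softmax_lse)
    out = out[0]
print(out.shape)  # expected: torch.Size([1, 128, 8, 64])
```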