Cao E

59 comments by Cao E

> Instead of creating a new flash attention function, do you think it is possible to reorganize the code to share the main structure of flash attention while invoking either...

This error is expected on the CPU. CPU autocast treats layernorm as a fallthrough op. With mixed data types, the weight should be fp32 and the input should...

> Looks like tests are failing?

The failures appear to be flaky: nothing failed after retesting.