Yu Zhang

Results: 89 comments by Yu Zhang

@Niccolo-Ajroldi Hi, before using autocast, you should first cast the model to bfloat16 with `model.bfloat16()`.
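A minimal sketch of the order suggested above: cast first, then enter autocast. It assumes a toy `nn.Linear` stands in for the real model; the layer, tensor shapes, and device selection are illustrative assumptions, not taken from the thread.

```python
import torch
import torch.nn as nn

# Pick a device; autocast supports bfloat16 on both CUDA and CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in for the real model (assumption for illustration only).
model = nn.Linear(8, 4).to(device)

# Step 1: cast the parameters to bfloat16 before entering autocast.
model = model.bfloat16()

# Step 2: run the forward pass under autocast with the same dtype.
x = torch.randn(2, 8, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)
```

Casting up front means the parameters themselves are stored in bfloat16, rather than relying on autocast to cast them per operation.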

Ah, thank you for the reminder. I just fixed this autocast bug.

TTT-linear is almost done: https://github.com/fla-org/flash-linear-attention/pull/151 cc @Pan-Yuqi. It still needs some checks for T

@penghui-yang Hi, reducing the number of chunks might help. A friend told me that this can improve stability. Check out my adapted code, which...

@penghui-yang How about doing the matmuls in tf32? I think this would reduce the accumulation errors.
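As a rough illustration of why operand precision matters here, the sketch below emulates rounding float32 values to 7 explicit mantissa bits (as in bf16) and to 10 (as in tf32) and compares the total rounding error. It assumes the baseline being compared against is bf16 operands, which the comment does not state. It is pure-Python emulation with a hypothetical helper `round_to_mantissa_bits`, not what a GPU kernel actually executes.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_mantissa_bits(x: float, kept_bits: int) -> float:
    """Round a float32 value to `kept_bits` explicit mantissa bits (to nearest).

    Hypothetical helper for illustration; float32 has 23 explicit mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - kept_bits
    # Add half of the dropped range, then clear the dropped bits.
    bits = (bits + (1 << (drop - 1))) & ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Positive float32 test values (arbitrary choice for the demo).
values = [to_f32(0.1 * i) for i in range(1, 257)]

# Total absolute rounding error when operands are stored in each format.
err_bf16 = sum(abs(round_to_mantissa_bits(v, 7) - v) for v in values)
err_tf32 = sum(abs(round_to_mantissa_bits(v, 10) - v) for v in values)

print(f"bf16-like operand error: {err_bf16:.6f}")
print(f"tf32-like operand error: {err_tf32:.6f}")
```

Because the 10-bit grid contains every point of the 7-bit grid, each value's rounding error with 10 bits can never exceed its error with 7 bits, so the operands fed into each product carry less error.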

@fegin @tianyu-l I’d like to share a few scenarios where skipping gradient updates might be beneficial. For instance, when working with data from multiple fields or datasets that aren’t perfectly...

@andrewkho Hi, I'm wondering whether this arg affects final state loading. For example, if `snapshot_every_n_steps=4`, what happens if I want to call `load_state_dict` or save the `state_dict` at the 5th...

@andrewkho Thank you for the nice response! That is a very clever tradeoff.

@andrewkho It would be better to add an explanation of “fast forward” to the docs for anyone curious about it.
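For readers curious about the idea, here is a toy model of snapshotting every n steps and then fast-forwarding on resume. `ToySnapshotLoader` and its state keys are hypothetical names chosen for illustration, and the sketch is not the library's actual implementation. It assumes the loader only records a full snapshot every `snapshot_every_n_steps` steps and otherwise counts the steps taken since the last one.

```python
class ToySnapshotLoader:
    """Hypothetical stand-in that snapshots its position every n steps."""

    def __init__(self, data, snapshot_every_n_steps=1):
        self.data = list(data)
        self.n = snapshot_every_n_steps
        self.step = 0           # items yielded so far
        self.snapshot_step = 0  # step at which the last snapshot was taken

    def __iter__(self):
        return self

    def __next__(self):
        if self.step >= len(self.data):
            raise StopIteration
        item = self.data[self.step]
        self.step += 1
        if self.step % self.n == 0:
            self.snapshot_step = self.step  # take a (cheap, toy) snapshot
        return item

    def state_dict(self):
        # Between snapshots, only the distance from the last one is recorded.
        return {
            "snapshot_step": self.snapshot_step,
            "steps_since_snapshot": self.step - self.snapshot_step,
        }

    def load_state_dict(self, state):
        # Restore the last snapshot, then "fast forward" by replaying
        # and discarding the steps taken after it.
        self.step = self.snapshot_step = state["snapshot_step"]
        for _ in range(state["steps_since_snapshot"]):
            next(self)


# Save at the 5th step with snapshot_every_n_steps=4.
loader = ToySnapshotLoader(range(10), snapshot_every_n_steps=4)
for _ in range(5):
    next(loader)
state = loader.state_dict()

# Resume in a fresh loader: restore step 4, then fast forward 1 step.
resumed = ToySnapshotLoader(range(10), snapshot_every_n_steps=4)
resumed.load_state_dict(state)
next_item = next(resumed)
print(state, next_item)
```

In this toy model, a state saved off the snapshot boundary still resumes at the right item. The cost is replaying up to `n - 1` steps on load, which is the tradeoff against snapshotting on every step.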