OhadRubin
It seems that this Triton implementation supports attention bias, so there is nothing in the algorithm that prevents supporting it. https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/flash_attn_triton.py Additionally, this implementation of block-parallel attention (faster than FA-1)...
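For reference, this is the unfused computation the bias support corresponds to: the bias tensor is simply added to the pre-softmax scores, so a tiled kernel only needs to load the matching bias block per tile. A minimal PyTorch sketch (not the Triton kernel itself; `attention_with_bias` is a name made up here):

```python
import torch
import torch.nn.functional as F

def attention_with_bias(q, k, v, bias=None):
    # q, k, v: (batch, heads, seq, head_dim); bias broadcastable to (batch, heads, seq, seq)
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / q.shape[-1] ** 0.5
    if bias is not None:
        # The bias is added to the pre-softmax scores, which is why a fused
        # kernel only needs one extra load per tile to support it.
        scores = scores + bias
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)
```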
I also had this mysterious crash, with no errors. I'm adding additional partition rules to see if it helps. Examining `tpu_driver.ERROR.txt` gives "You probably want to enrich the sharding annotations to prevent...
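A minimal sketch of what "enriching the sharding annotations" can look like with current JAX APIs; the mesh shape, axis names, and array sizes here are placeholders, and the original code may have used named partition rules (t5x/flax style) instead:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Placeholder 1x1 mesh for illustration; on a real TPU slice the axes would
# span an actual device grid, e.g. ('data', 'model').
devices = np.array(jax.devices()[:1]).reshape(1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

def layer(x, w):
    y = x @ w
    # Pin the intermediate activation to an explicit layout instead of letting
    # the partitioner guess -- the kind of hint the error message asks for.
    y = jax.lax.with_sharding_constraint(y, NamedSharding(mesh, P("data", "model")))
    return y

y = jax.jit(layer)(jnp.ones((8, 16)), jnp.ones((16, 32)))
```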
Any update?
I actually found this at some point too; this should really be added to the documentation!
Hey, if I understand the example they posted [here](https://gist.github.com/crazyoscarchang/c9a11b67c420202da1f26e0d20786750) (and I'm not sure I do):
```
def hyperfanoutWi_init(i):
    def hyperfanout_init(Wi):
        fan_out, fan_in = Wi.size(0), Wi.size(1)
        bound = math.sqrt(3*2 / (fan_in...
```
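To make the closure structure of that snippet concrete without guessing at the part of the formula that is cut off, here is a generic fan-based uniform initializer in the same style; `extra_factor` is a placeholder for the elided denominator term, not the actual hyperfan-out rule:

```python
import math
import torch

def make_fan_based_init(extra_factor=1.0):
    # Generic fan-in based uniform init in the same closure style as the gist.
    # NOTE: not the exact hyperfan-out rule; the real bound's denominator has an
    # extra term that is truncated in the quoted snippet (modeled here as
    # `extra_factor`, a placeholder).
    def init_(Wi):
        fan_out, fan_in = Wi.size(0), Wi.size(1)
        # Uniform on [-b, b] has variance b^2 / 3, so b = sqrt(3 * var);
        # the "2" plays the role of a ReLU gain, as in He initialization.
        bound = math.sqrt(3 * 2 / (fan_in * extra_factor))
        with torch.no_grad():
            return Wi.uniform_(-bound, bound)
    return init_
```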
I think for LN it won't work because it is a multiplicative factor.
https://colab.research.google.com/drive/1-YCU9ps4gNuROJ3_8MLjSpbICGHaySxh?usp=sharing
Not yet
It's been a while since I've worked on this code and this error doesn't seem familiar... You could check whether something in `transformers>=4.1,
> Hi @OhadRubin,
>
> I'm adding the following assertion below line [682](https://github.com/OhadRubin/SmBop/blob/e7a6fce7af5aa5545bd3cfca6c4c4dbef610cd6b/smbop/models/smbop.py#L682):
>
> ```
> check_is_levelorder_list = is_levelorder_list.sum(-1)
> check_is_levelorder_list = (check_is_levelorder_list > 0).float()
> check_is_levelorder_list =...
> ```