OhadRubin
It seems that this Triton implementation supports attention bias, so there is nothing in the algorithm that prevents supporting it. https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/flash_attn_triton.py Additionally, this implementation of block-parallel attention (faster than FA-1)...
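For reference, this is the unfused computation the bias support corresponds to: the bias tensor is simply added to the pre-softmax scores, so a tiled kernel only needs to load the matching bias block per tile. A minimal PyTorch sketch (not the Triton kernel itself; `attention_with_bias` is a name made up here):

```python
import torch
import torch.nn.functional as F

def attention_with_bias(q, k, v, bias=None):
    # q, k, v: (batch, heads, seq, head_dim); bias broadcastable to (batch, heads, seq, seq)
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / q.shape[-1] ** 0.5
    if bias is not None:
        # The bias is added to the pre-softmax scores, which is why a fused
        # kernel only needs one extra load per tile to support it.
        scores = scores + bias
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)
```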
I also had this mysterious crash, with no errors. I'm adding additional partition rules to see if it helps. Examining `tpu_driver.ERROR.txt` gives "You probably want to enrich the sharding annotations to prevent...
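A minimal sketch of what "enriching the sharding annotations" can look like with current JAX APIs; the mesh shape, axis names, and array sizes here are placeholders, and the original code may have used named partition rules (t5x/flax style) instead:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Placeholder 1x1 mesh for illustration; on a real TPU slice the axes would
# span an actual device grid, e.g. ('data', 'model').
devices = np.array(jax.devices()[:1]).reshape(1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

def layer(x, w):
    y = x @ w
    # Pin the intermediate activation to an explicit layout instead of letting
    # the partitioner guess -- the kind of hint the error message asks for.
    y = jax.lax.with_sharding_constraint(y, NamedSharding(mesh, P("data", "model")))
    return y

y = jax.jit(layer)(jnp.ones((8, 16)), jnp.ones((16, 32)))
```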
Any update?
I actually found this at some point too; this should really be added to the documentation!
Hey, if I understand the example they posted [here](https://gist.github.com/crazyoscarchang/c9a11b67c420202da1f26e0d20786750) (and I'm not sure I do):
```
def hyperfanoutWi_init(i):
    def hyperfanout_init(Wi):
        fan_out, fan_in = Wi.size(0), Wi.size(1)
        bound = math.sqrt(3*2 / (fan_in...
```
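To make the closure structure of that snippet concrete without guessing at the part of the formula that is cut off, here is a generic fan-based uniform initializer in the same style; `extra_factor` is a placeholder for the elided denominator term, not the actual hyperfan-out rule:

```python
import math
import torch

def make_fan_based_init(extra_factor=1.0):
    # Generic fan-in based uniform init in the same closure style as the gist.
    # NOTE: not the exact hyperfan-out rule; the real bound's denominator has an
    # extra term that is truncated in the quoted snippet (modeled here as
    # `extra_factor`, a placeholder).
    def init_(Wi):
        fan_out, fan_in = Wi.size(0), Wi.size(1)
        # Uniform on [-b, b] has variance b^2 / 3, so b = sqrt(3 * var);
        # the "2" plays the role of a ReLU gain, as in He initialization.
        bound = math.sqrt(3 * 2 / (fan_in * extra_factor))
        with torch.no_grad():
            return Wi.uniform_(-bound, bound)
    return init_
```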
I think for LN it won't work because it is a multiplicative factor.
https://colab.research.google.com/drive/1-YCU9ps4gNuROJ3_8MLjSpbICGHaySxh?usp=sharing
Not yet
It's been a while since I've worked on this code and this error doesn't seem familiar... You could check whether something in `transformers>=4.1,
> Hi @OhadRubin,
>
> I'm adding the following assertion below line [682](https://github.com/OhadRubin/SmBop/blob/e7a6fce7af5aa5545bd3cfca6c4c4dbef610cd6b/smbop/models/smbop.py#L682):
>
> ```
> check_is_levelorder_list = is_levelorder_list.sum(-1)
> check_is_levelorder_list = (check_is_levelorder_list > 0).float()
> check_is_levelorder_list =...
> ```