Varuna Jayasiri comments

Results: 56 comments by Varuna Jayasiri

ppo code running error

Looks like it's not compatible with some newer versions of OpenCV.

Question about RoPE code

The ordering is different. So it won't affect training from scratch, but you can't load a model trained with a different ordering.
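
The comment does not say which orderings are meant. One plausible reading, assumed here, is the choice of which features are rotated together: adjacent pairs (2i, 2i+1) versus pairs half a vector apart (i, i + d/2). The sketch below is pure Python with hypothetical function names, not the repository's code. It suggests why the two layouts are interchangeable when training from scratch (they differ only by a fixed permutation of features) but not when loading weights trained with the other layout.

```python
import math


def _angles(d, pos, base=10000.0):
    # One rotation angle per feature pair: pos * base^(-2i/d).
    return [pos * base ** (-2.0 * i / d) for i in range(d // 2)]


def rope_interleaved(x, pos):
    # Rotates adjacent pairs (x[2i], x[2i+1]).
    d = len(x)
    out = [0.0] * d
    for i, theta in enumerate(_angles(d, pos)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_half_split(x, pos):
    # Rotates pairs (x[i], x[i + d/2]).
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i, theta in enumerate(_angles(d, pos)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i] = a * c - b * s
        out[i + h] = a * s + b * c
    return out


def to_half_split_order(x):
    # Fixed permutation: even-indexed features first, then odd-indexed ones.
    return x[0::2] + x[1::2]
```

Under these assumptions, `rope_half_split(to_half_split_order(x), pos)` matches `to_half_split_order(rope_interleaved(x, pos))`, while applying the two functions to the same unpermuted `x` gives different vectors. A model trained with one layout has learned projections for that layout, so its weights would need the same permutation applied before they could be used with the other.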

Question about RoPE code

It's easier to code

Question about value_pe

Thanks. It's a bug

Question about value_pe

Fixed it here https://github.com/labmlai/annotated_deep_learning_paper_implementations/commit/09d09379c2169eac06662e17cb9969dc6e48e36a

Website Code

All the code is in this repo.

[EXAMPLE] In the flash attention example, keep the max of all blocks seen in scores_max for numerical stability

I think, without it, the current approach can lead to overflow in the accumulator and logsum when a block's max is significantly lower than the previous global max, as it...
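
The comment above is truncated, so the following is only a sketch of the general idea it seems to describe: the usual blockwise ("online") softmax, where the running maximum over all blocks seen so far is kept and the accumulator and the sum are rescaled whenever that maximum grows. The function and variable names are hypothetical and scalar values are used for brevity; this is not the repository's flash attention code.

```python
import math


def blockwise_softmax_average(score_blocks, value_blocks):
    """Softmax-weighted average of values, processed one block at a time."""
    running_max = float("-inf")  # max over every score seen so far
    denom = 0.0                  # sum of exp(score - running_max)
    acc = 0.0                    # sum of exp(score - running_max) * value
    for scores, values in zip(score_blocks, value_blocks):
        new_max = max(running_max, max(scores))
        # new_max >= running_max, so the rescale factor is always <= 1.
        # On the first block running_max is -inf and the factor is 0.0.
        scale = math.exp(running_max - new_max)
        denom *= scale
        acc *= scale
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)  # exponent <= 0, so this cannot overflow
            denom += w
            acc += w * v
        running_max = new_max
    return acc / denom


def reference_softmax_average(scores, values):
    """Same quantity computed in one pass with the global max, for comparison."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

With score blocks such as `[[1000.0, 999.0], [-1000.0, -1001.0]]`, rescaling by the second block's own max instead of the running max would require `math.exp(2000)`, which raises `OverflowError` in Python. Keeping the running max keeps every exponent at or below zero.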

mha.py array shapes

Our implementation is sequence-first. PyTorch's LSTM used that layout, and our initial implementations used C B H, so we just continued with it. B C D is more commonly...
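
To make the two layouts concrete, here is a small pure-Python sketch using nested lists in place of tensors. It assumes that C in the comment stands for sequence (context) length, B for batch size, and H or D for the feature dimension; the helper names are hypothetical.

```python
def swap_seq_and_batch(x):
    """Swap the first two axes of a nested list: [A][B][D] -> [B][A][D]."""
    return [list(row) for row in zip(*x)]


def shape(x):
    """Return the dimensions of a regular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


seq_len, batch, d_model = 3, 2, 4

# Sequence-first layout: indexed as [time step][batch item][feature].
seq_first = [
    [[float(100 * t + 10 * b + k) for k in range(d_model)] for b in range(batch)]
    for t in range(seq_len)
]

# Batch-first layout: indexed as [batch item][time step][feature].
batch_first = swap_seq_and_batch(seq_first)
```

Here `batch_first[b][t]` holds the same feature vector as `seq_first[t][b]`. With PyTorch tensors the equivalent conversion would be `x.transpose(0, 1)`, and `torch.nn.LSTM` expects the sequence-first layout unless `batch_first=True` is passed.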

How to Contribute to This Repository

You can create pull requests with new contributions. We've had a few people contributing new paper implementations and improving/fixing existing implementations.

How to Contribute to This Repository

Sorry for the really late reply. Been really busy. Hoping to spend a little time on this project in the next few weeks (mostly cleaning up code and fixing)