Wing Lian comments

Results: 103 comments by Wing Lian

Efficiently compute total number of steps

I think this might not be necessary as checking the length of the sampler is already what happens under the hood. Also, inspecting the torch `DataLoader` class, the `__len__()` method...
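As a rough illustration of the length arithmetic this comment alludes to, here is a minimal pure-Python sketch. It assumes a map-style dataset and mirrors the usual batch-sampler rounding (floor when the last partial batch is dropped, ceiling otherwise). The function names `batches_per_epoch` and `total_training_steps`, and the treatment of gradient accumulation, are hypothetical and not taken from the thread or from any specific trainer.

```python
def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches one pass over the sampler yields."""
    if drop_last:
        # The trailing partial batch is discarded.
        return num_samples // batch_size
    # The trailing partial batch counts as one more batch (integer ceiling).
    return (num_samples + batch_size - 1) // batch_size


def total_training_steps(
    num_samples: int,
    batch_size: int,
    grad_accum_steps: int = 1,
    num_epochs: int = 1,
    drop_last: bool = False,
) -> int:
    """Optimizer steps over the whole run (assumed rounding: ceiling per epoch)."""
    per_epoch = batches_per_epoch(num_samples, batch_size, drop_last)
    # Assumption: a final partial accumulation window still triggers a step.
    steps_per_epoch = (per_epoch + grad_accum_steps - 1) // grad_accum_steps
    return steps_per_epoch * num_epochs


if __name__ == "__main__":
    print(batches_per_epoch(10, 4))
    print(total_training_steps(10, 4, grad_accum_steps=2, num_epochs=3))
```

Because the count only needs the sampler length and the batch size, nothing has to be iterated to obtain it, which is presumably why a separate "efficient" computation may be unnecessary.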

CUDA device error with llama2_chat strategy

One thing you can try is the branch in PR https://github.com/OpenAccess-AI-Collective/axolotl/pull/578

Simply set:

```yaml
type: sharegpt
conversation: llama-2
```

CUDA device error with llama2_chat strategy

> I see similar behaviour @kaldeberger saw, I was getting loss trend down from 0.9 to 0.2 after an epoch on my dataset, however switching to new prompt strategy I...

Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`

> > This is also relevant: https://github.com/mgmalek/efficient_cross_entropy/ > > i just tested this with llama3 8B by monkey patching huggingface transformers' model class > and this is for `[B, T]...

Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`

> This kernel doesn't support `ignore_index`, currently. You can tweak [this line]

this line isn't sufficient to support `ignore_index`? https://github.com/mgmalek/efficient_cross_entropy/blob/049d44460051a82f58f7ff49a2ad0653ecf026d8/modules.py#L56
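For readers unfamiliar with what `ignore_index` is expected to do, the following is a minimal pure-Python sketch of the semantics only: positions whose target equals the ignore value contribute nothing to the loss and are excluded from the mean. It is not the fused kernel from the linked repository and says nothing about whether the referenced line is sufficient. The function name `cross_entropy_with_ignore` is hypothetical, and `-100` is used because it is PyTorch's default `ignore_index`.

```python
import math


def cross_entropy_with_ignore(logits, targets, ignore_index=-100):
    """Mean cross-entropy over rows whose target is not ignore_index.

    logits:  list of rows, each a list of floats (one score per class)
    targets: list of ints (class indices, or ignore_index)
    """
    total = 0.0
    counted = 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            # Ignored positions add no loss and are left out of the average.
            continue
        # Numerically stable log-sum-exp over the row.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[target]
        counted += 1
    # Simplification in this sketch: return 0.0 when every position is ignored.
    return total / counted if counted else 0.0


if __name__ == "__main__":
    logits = [[0.0, 0.0], [5.0, 1.0]]
    print(cross_entropy_with_ignore(logits, [0, -100]))
```

The key point is that the denominator is the number of non-ignored positions, so masking only the numerator would change the loss scale.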

AdaLomo optimizer step method

Thanks @KaiLv69. Would you be able to share the LRs you used for the AdaLomo and AdamW experiments in the paper? I think I might have the LR off...

schedulefree optimizers

@pacman100 @muellerzr @younesbelkada Can we get a new review to get this merged? Since the last check, I rebased, added some fixes and docs.

schedulefree optimizers

@muellerzr I ran `make quality`/lint and also added a smoke test to the test suite for schedule-free Adam.

schedulefree optimizers

Will get back to this soon. Not stale 😅

schedulefree optimizers

thanks for the fixes @tmm1 !