Less Wright comments

Results: 81 comments by Less Wright

SLS and parameter groups for larger datasets?

Hi @IssamLaradji - Monday works great. FastAI does not have lbfgs...I've had some discussions with Jeremy about how FastAI v2 can support optimizers like SLS, AliG, etc. that require passing...

[fused_rmsnorm] Register as a custom operator for tracing

> What is "IMA" short for? Illegal Memory Access - the generic CUDA error that something has exceeded its memory index.

[fused_rmsnorm] Register as a custom operator for tracing

> Thanks @lessw2020 . Do you think the IMA relates to the triton kernel? Can you help fix it? PP needs this fix to land. Would appreciate your help. Hi...

[fused_rmsnorm] Register as a custom operator for tracing

Hi @kwen2501 - sure, here's the specific line that has the issue. https://github.com/pytorch/torchtitan/blob/f72a2a0da0bdfc394faaab9b3c0f35d0b6f5be50/torchtitan/models/norms.py#L198 That is loading the inputs and masking off any values past the known col length and should...

Documentation for AnyPrecisionOptimizer

Happy to work on this (actually had started a doc this weekend). Thanks for adding the tracking issue @rohan-varma and for the feedback @stas00!

Documentation for AnyPrecisionOptimizer

@lxuechen - thanks for the reminder here. We have been using AnyPrecision, so let's see about getting it into TorchMM for a home and then can also add documentation. Will...

selective compilation - norm layers only

> Is 2 saying that in order to have "full" compile you need to set both compile=true and compile_rmsnorm = true? I updated the text to be more specific, but...

[export] Failed to trace HF Llama2 model

Hit the same issue (627 nightlies) and talked with @kwen2501 about it. This is becoming more urgent. This issue has been open for 3 weeks...would anyone be able to address...

[export] Failed to trace HF Llama2 model

Updating - the same error blocks tracing of Llama3-8B, so it's continuing to block on more models. Tested with 2.5.0.dev20240630+cu121 ``` [rank0]:[rank0]: Traceback (most recent call last): [rank0]:[rank0]: File "/home/less/local/miniconda3/envs/inference/lib/python3.10/site-packages/torch/distributed/pipelining/_IR.py",...

Improve training UX with TPS, GPU peak mem %, cudaMalloc retries, and do it in color for pizzazz

> @lessw2020 thanks so much for making these changes! I think our logging to console can definitely be better and these metrics generally make a lot of sense to me....