Tom Fogal
98% sure Masaki's recent PR implemented this. Assigning to Masaki to either close or comment on status.
> cudnn executor is overly optimistic when claiming sdpa.

ahh; yes, we should fix that first. Thanks for remembering this!

> sdpa checker function can throw errors with a pseudo...
Quick update: I talked to the NeMo team and Eric had the (reasonable) concern that swapping in-place for out-of-place might increase memory consumption. The onus is on me to give...
would like to discuss at triage review: can we just say `cuda` for everything? what about tensors from the outside (i.e. input tensors) that are on `cpu` or even things...
triage team: looking to understand if this is high-effort or low-effort (honestly, looking for that on all NeMo things, this one's just particularly out of my depth).
> model may need to be revised to target thunder

I'm not sure "revise the model" is going to be a reasonable solution in the general case; already seen this...
triage team: what guarantees do we need to provide w.r.t. the generated random numbers? i.e. do we need to match torch exactly, match the same distribution, merely respect the `low`/`high`...
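
To make the options concrete for triage, here is a rough sketch of the three guarantees, strictest to loosest. It uses Python's stdlib `random` as a stand-in for the torch and thunder generators; the helper names, the bounds, and the tolerance are all made up for illustration and are not taken from either library.

```python
import random

# Hypothetical bounds, standing in for the `low`/`high` arguments.
LOW, HIGH = 0.0, 10.0


def sample(seed, n=10_000, low=LOW, high=HIGH):
    """Draw n uniform samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def matches_exactly(a, b):
    """Strictest guarantee: identical values in identical order."""
    return a == b


def matches_distribution(a, b, tol=0.05):
    """Looser guarantee: crude check that the sample means agree.

    `tol` is a fraction of the range width. A real test would use
    something stronger, e.g. a Kolmogorov-Smirnov test.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return abs(mean_a - mean_b) <= tol * (HIGH - LOW)


def respects_bounds(a, low=LOW, high=HIGH):
    """Loosest guarantee: every value lies within the requested range."""
    return all(low <= x <= high for x in a)


reference = sample(seed=0)   # plays the role of the reference output
same_seed = sample(seed=0)   # same generator, same seed
other_seed = sample(seed=1)  # plays the role of a different RNG stream

print(matches_exactly(reference, same_seed))        # exact match holds
print(matches_exactly(reference, other_seed))       # exact match fails
print(matches_distribution(reference, other_seed))  # distribution still agrees
print(respects_bounds(other_seed))                  # bounds still respected
```

Each weaker guarantee follows from the stronger one, so the question is really which level we are willing to test for and document.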
Closing based on discussion above.
FWIW it looks like [NeMo always sets `beta` to `0.0`](https://github.com/NVIDIA/NeMo/blob/8e65042d15062ce3fbe639f9d428c639510d894c/nemo/collections/nlp/modules/common/megatron/attention.py#L947).
> Can you share the script for the `examine` call?

@athitten when you have a minute