Ryan Tremblay

Results: 40 issues by Ryan Tremblay

The instructions say we should just cast the model weights to BF16, but wouldn't that discard a lot of useful precision when resuming from an existing checkpoint (e.g., for continued...
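For context on the precision concern: BF16 keeps only the top 16 bits of an FP32 value (1 sign bit, 8 exponent bits, 7 mantissa bits), so a cast discards the low 16 mantissa bits. A minimal pure-Python sketch of that loss (the helper name is mine; real hardware casts typically round to nearest rather than truncate as shown here):

```python
import struct

def to_bf16(x: float) -> float:
    """Simulate an FP32 -> BF16 cast by truncating to the top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # BF16 keeps the sign, the full 8-bit exponent, and 7 mantissa bits;
    # the low 16 mantissa bits of the FP32 encoding are dropped.
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

w = 0.1234567
print(w, to_bf16(w))  # the cast keeps only ~2-3 significant decimal digits
```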

**Is your feature request related to a problem? Please describe.**

Checkpointing is significantly faster with Torch Distributed's async checkpoint feature: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict_saver.async_save

Blog post: https://pytorch.org/blog/reducing-checkpointing-times/

We want to checkpoint frequently, but...
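The core idea behind async checkpointing — snapshot on the training thread, write to disk in the background — can be sketched in plain Python. This is a simplified single-process illustration of the pattern, not the `torch.distributed.checkpoint` API:

```python
import copy
import json
import threading

def async_save(state_dict: dict, path: str) -> threading.Thread:
    # Take a cheap in-memory snapshot while training is paused...
    snapshot = copy.deepcopy(state_dict)

    # ...then do the slow disk write on a background thread, so the
    # training loop can resume mutating `state_dict` immediately.
    def _write() -> None:
        with open(path, "w") as f:
            json.dump(snapshot, f)

    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t  # join() before starting the next checkpoint

state = {"step": 100, "loss": 0.25}
handle = async_save(state, "ckpt.json")
state["step"] = 101  # training continues; the snapshot is unaffected
handle.join()
```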

enhancement

**Describe the bug**

Many modern transformer components (e.g., RoPE, certain Layer Norm setups) need to be stored and run in FP32. Most of the time, we can accomplish this by...
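One way to see why RoPE in particular wants FP32: the rotation phase is `position * inv_freq`, so any rounding error in a low-precision `inv_freq` is multiplied by the position and grows large at long context. A pure-Python illustration (the dimension indices and position are made-up values; `to_bf16` simulates a BF16 cast by truncating the FP32 encoding):

```python
import struct

def to_bf16(x: float) -> float:
    # Simulate FP32 -> BF16 by keeping only the top 16 bits (7 mantissa bits).
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# One RoPE frequency: 1 / 10000^(2i/d), here with i=32, d=128 (hypothetical).
inv_freq = 1.0 / 10000 ** (64 / 128)
pos = 8192

# The per-value rounding error is tiny, but the phase error scales with pos.
phase_err = pos * (inv_freq - to_bf16(inv_freq))
print(phase_err)  # a sizable fraction of a radian at this position
```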

bug
training

https://skypilot.readthedocs.io/en/latest/reservations/reservations.html

The docs say:

> If you have a capacity block with a starting time in the future, you can run `sky jobs launch --region us-east-1 --gpus H100:8 task.yaml` to let...

[This paper](https://www.yichenggu.com/TFR-Discriminators/) mentions that its implementation is in this repo, but I can't find an implementation of the aforementioned CWT discriminator. Is it here? Thanks!

bug

Which model are you referring to in this commit? https://github.com/lucidrains/soundstorm-pytorch/commit/797afa0828f3bc690c5f43ccccb8008d7f04337c Thanks

Hi, can you share which models here are the smallest (in both compute and parameter count) with acceptable quality for vocals only? Thanks!

I tried to implement this for flow models as described in the appendix, but the results collapse completely (exploding images). Did I make a mistake, or is this technique...

VQPytorch's FSQ with symmetry on and noise dropping set to 0.5 seems to perform significantly better than the [reference implementation](https://github.com/Stability-AI/stable-codec/blob/main/stable_codec/fsq.py) in reconstruction loss with the same settings, so I set...
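For readers unfamiliar with FSQ: each latent dimension is bounded and rounded to a small fixed set of levels, and "symmetric" level placement centers those levels around zero. A minimal per-dimension sketch (the function and its signature are my own naming for illustration, not vq-pytorch's or stable-codec's API):

```python
import math

def fsq_quantize(z: float, levels: int) -> float:
    # Bound the latent to (-1, 1) with tanh, then round to one of `levels`
    # symmetric values in [-1, 1], e.g. levels=5 -> {-1, -0.5, 0, 0.5, 1}.
    half = (levels - 1) / 2
    bounded = math.tanh(z)
    return round(bounded * half) / half

print([fsq_quantize(z, 5) for z in (-10.0, -0.3, 0.0, 0.4, 10.0)])
# -> [-1.0, -0.5, 0.0, 0.5, 1.0]
```

In training code the rounding is typically paired with a straight-through estimator so gradients flow through the non-differentiable `round`.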

@BoyuanFeng

## Summary

The `KVCache.update()` method returns the entire cache buffer, including uninitialized (zero) positions, which causes significant numerical errors when using flex_attention. While this doesn't visibly affect discrete token...
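A stripped-down illustration of the failure mode described above (a hypothetical toy class, not the actual KVCache from the issue): the cache is a preallocated buffer, so returning it wholesale exposes zero-filled positions that attention then mixes in unless they are sliced or masked away.

```python
class ToyKVCache:
    """Preallocated cache; only the first `length` slots hold real data."""

    def __init__(self, max_len: int):
        self.buf = [0.0] * max_len
        self.length = 0

    def update(self, vals: list) -> list:
        for v in vals:
            self.buf[self.length] = v
            self.length += 1
        # Returning the full buffer (as the issue describes) leaks
        # uninitialized zero positions to the attention kernel.
        return self.buf

    def valid(self) -> list:
        # Safer: expose only the filled prefix.
        return self.buf[: self.length]

cache = ToyKVCache(max_len=8)
full = cache.update([1.0, 2.0, 3.0])
print(full)           # [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(cache.valid())  # [1.0, 2.0, 3.0]
```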