Ryan Tremblay

Results: 40 issues by Ryan Tremblay

The instructions say we should just cast the model weights to BF16, but wouldn't that discard a lot of useful precision when resuming from an existing checkpoint (e.g., for continued...
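For context on the precision concern: BF16 keeps only the top 16 bits of an FP32 value (1 sign bit, 8 exponent bits, 7 mantissa bits), so a cast discards the low 16 mantissa bits. A minimal pure-Python sketch of that loss (the helper name is mine; real hardware casts typically round to nearest rather than truncate as shown here):

```python
import struct

def to_bf16(x: float) -> float:
    """Simulate an FP32 -> BF16 cast by truncating to the top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # BF16 keeps the sign, the full 8-bit exponent, and 7 mantissa bits;
    # the low 16 mantissa bits of the FP32 encoding are dropped.
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

w = 0.1234567
print(w, to_bf16(w))  # the cast keeps only ~2-3 significant decimal digits
```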

**Is your feature request related to a problem? Please describe.**

Checkpointing is significantly faster with Torch Distributed's async checkpoint feature: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict_saver.async_save

Blog post: https://pytorch.org/blog/reducing-checkpointing-times/

We want to checkpoint frequently, but...
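The core idea behind async checkpointing — snapshot on the training thread, write to disk in the background — can be sketched in plain Python. This is a simplified single-process illustration of the pattern, not the `torch.distributed.checkpoint` API:

```python
import copy
import json
import threading

def async_save(state_dict: dict, path: str) -> threading.Thread:
    # Take a cheap in-memory snapshot while training is paused...
    snapshot = copy.deepcopy(state_dict)

    # ...then do the slow disk write on a background thread, so the
    # training loop can resume mutating `state_dict` immediately.
    def _write() -> None:
        with open(path, "w") as f:
            json.dump(snapshot, f)

    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t  # join() before starting the next checkpoint

state = {"step": 100, "loss": 0.25}
handle = async_save(state, "ckpt.json")
state["step"] = 101  # training continues; the snapshot is unaffected
handle.join()
```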

enhancement

**Describe the bug**

Many modern transformer components (e.g., RoPE, certain Layer Norm setups) need to be stored and run in FP32. Most of the time, we can accomplish this by...
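One way to see why RoPE in particular wants FP32: the rotation phase is `position * inv_freq`, so any rounding error in a low-precision `inv_freq` is multiplied by the position and grows large at long context. A pure-Python illustration (the dimension indices and position are made-up values; `to_bf16` simulates a BF16 cast by truncating the FP32 encoding):

```python
import struct

def to_bf16(x: float) -> float:
    # Simulate FP32 -> BF16 by keeping only the top 16 bits (7 mantissa bits).
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# One RoPE frequency: 1 / 10000^(2i/d), here with i=32, d=128 (hypothetical).
inv_freq = 1.0 / 10000 ** (64 / 128)
pos = 8192

# The per-value rounding error is tiny, but the phase error scales with pos.
phase_err = pos * (inv_freq - to_bf16(inv_freq))
print(phase_err)  # a sizable fraction of a radian at this position
```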

bug
training

https://skypilot.readthedocs.io/en/latest/reservations/reservations.html

The docs say:

> If you have a capacity block with a starting time in the future, you can run `sky jobs launch --region us-east-1 --gpus H100:8 task.yaml` to let...

[This paper](https://www.yichenggu.com/TFR-Discriminators/) mentions that its implementation is in this repo, but I can't find an implementation of the aforementioned CWT discriminator. Is it here? Thanks!

bug

Which model are you referring to in this commit? https://github.com/lucidrains/soundstorm-pytorch/commit/797afa0828f3bc690c5f43ccccb8008d7f04337c Thanks

Hi, can you share which models here are the smallest (in both compute and parameter count) with acceptable quality for vocals only? Thanks!

I tried to implement this for flow models as described in the appendix, but the results collapse completely (exploding images). Did I make a mistake, or is this technique...

VQPytorch's FSQ with symmetry on and noise dropping set to 0.5 seems to perform significantly better than the [reference implementation](https://github.com/Stability-AI/stable-codec/blob/main/stable_codec/fsq.py) in reconstruction loss with the same settings, so I set...
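For readers unfamiliar with FSQ: each latent dimension is bounded and rounded to a small fixed set of levels, and "symmetric" level placement centers those levels around zero. A minimal per-dimension sketch (the function and its signature are my own naming for illustration, not vq-pytorch's or stable-codec's API):

```python
import math

def fsq_quantize(z: float, levels: int) -> float:
    # Bound the latent to (-1, 1) with tanh, then round to one of `levels`
    # symmetric values in [-1, 1], e.g. levels=5 -> {-1, -0.5, 0, 0.5, 1}.
    half = (levels - 1) / 2
    bounded = math.tanh(z)
    return round(bounded * half) / half

print([fsq_quantize(z, 5) for z in (-10.0, -0.3, 0.0, 0.4, 10.0)])
# -> [-1.0, -0.5, 0.0, 0.5, 1.0]
```

In training code the rounding is typically paired with a straight-through estimator so gradients flow through the non-differentiable `round`.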

@BoyuanFeng

## Summary

The `KVCache.update()` method returns the entire cache buffer, including uninitialized (zero) positions, which causes significant numerical errors when using flex_attention. While this doesn't visibly affect discrete token...
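A stripped-down illustration of the failure mode described above (a hypothetical toy class, not the actual KVCache from the issue): the cache is a preallocated buffer, so returning it wholesale exposes zero-filled positions that attention then mixes in unless they are sliced or masked away.

```python
class ToyKVCache:
    """Preallocated cache; only the first `length` slots hold real data."""

    def __init__(self, max_len: int):
        self.buf = [0.0] * max_len
        self.length = 0

    def update(self, vals: list) -> list:
        for v in vals:
            self.buf[self.length] = v
            self.length += 1
        # Returning the full buffer (as the issue describes) leaks
        # uninitialized zero positions to the attention kernel.
        return self.buf

    def valid(self) -> list:
        # Safer: expose only the filled prefix.
        return self.buf[: self.length]

cache = ToyKVCache(max_len=8)
full = cache.update([1.0, 2.0, 3.0])
print(full)           # [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(cache.valid())  # [1.0, 2.0, 3.0]
```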