Susan Zhang

45 issues opened by Susan Zhang

* Added GroupNormFp32 back from fairseq
* Rename Fp32LayerNorm -> LayerNormFp32
* Added quant_noise back from fairseq
* Added SequenceGeneratorWithAlignment from fairseq

cla signed

After https://github.com/facebookresearch/metaseq/pull/459 and https://github.com/facebookresearch/metaseq/pull/556, we can now release updated checkpoints that are consolidated from FSDP shards with different model parallelism as well. We should update all of our checkpoints as...

enhancement
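A simplified sketch of what consolidating model-parallel shards involves: replicated parameters are taken from a single rank, while partitioned parameters are concatenated along the dimension they were split on. The `partition_dims` mapping and the function name are hypothetical illustrations; metaseq's actual logic lives in the resharding/consolidation scripts touched by the PRs above.

```python
import torch

def consolidate_mp_shards(shard_paths, partition_dims):
    # partition_dims: hypothetical map of param name -> dim it was split on
    # across model-parallel ranks (None = replicated on every rank).
    shards = [torch.load(p, map_location="cpu") for p in shard_paths]
    merged = {}
    for name in shards[0]:
        dim = partition_dims.get(name)
        if dim is None:
            # Replicated parameter: identical on all ranks, take one copy.
            merged[name] = shards[0][name]
        else:
            # Partitioned parameter: stitch the rank slices back together.
            merged[name] = torch.cat([s[name] for s in shards], dim=dim)
    return merged
```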

See https://arxiv.org/abs/2201.07520

enhancement

This is to look into whether or not we can remove our Megatron dependency and rely entirely on our Fairscale dependency (model parallelism implementation seems to be identical between the...

good first issue
better-eng
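For context on the issue above, fairscale ships the same Megatron-style model-parallel layers, so a first sanity check could exercise them directly. A minimal sketch, assuming a single-process model-parallel group of size 1:

```python
import torch
import torch.distributed as dist
from fairscale.nn.model_parallel.initialize import initialize_model_parallel
from fairscale.nn.model_parallel.layers import ColumnParallelLinear, RowParallelLinear

# Single-process "model parallel" group of size 1, just to exercise the API.
dist.init_process_group(
    "gloo", init_method="tcp://localhost:29500", rank=0, world_size=1
)
initialize_model_parallel(1)

# Column-parallel followed by row-parallel, the usual megatron FFN pattern.
col = ColumnParallelLinear(1024, 4096, bias=False, gather_output=False)
row = RowParallelLinear(4096, 1024, bias=False, input_is_parallel=True)
y = row(col(torch.randn(2, 1024)))
print(y.shape)  # torch.Size([2, 1024])
```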

See https://github.com/lucidrains/rotary-embedding-torch/blob/main/rotary_embedding_torch/rotary_embedding_torch.py

And from the PaLM paper:

> We use RoPE embeddings (Su et al., 2021) rather than absolute or relative position embeddings, since RoPE embeddings have been shown to have...

enhancement
good first issue
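For reference, a minimal RoPE sketch in plain PyTorch, loosely following the linked rotary-embedding-torch implementation; shapes and function names are illustrative, not metaseq's actual API:

```python
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # Frequencies for each dimension pair: theta_i = base^(-2i / head_dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)  # (seq_len, head_dim / 2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor):
    # x: (batch, seq_len, n_heads, head_dim); rotate each even/odd pair.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos = cos[None, :, None, :]  # broadcast over batch and heads
    sin = sin[None, :, None, :]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate queries and keys before computing attention scores.
cos, sin = build_rope_cache(seq_len=128, head_dim=64)
q = torch.randn(2, 128, 8, 64)
q_rot = apply_rope(q, cos, sin)
```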

This is to look into whether we still need apex for speedups, or whether out-of-the-box PyTorch 2.0 may "just work". Will require benchmarking at a few different...

good first issue
better-eng
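A rough benchmarking sketch for this comparison: time `torch.nn.LayerNorm` (eager and `torch.compile`'d) against apex's `FusedLayerNorm` on one arbitrary shape. A real comparison would sweep shapes, dtypes, and batch sizes:

```python
import time
import torch

def bench(fn, x, iters=100):
    # Warm up, then time; synchronize around CUDA kernels for fairness.
    for _ in range(10):
        fn(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

x = torch.randn(64, 2048, 1024, device="cuda", dtype=torch.float16)
native = torch.nn.LayerNorm(1024, device="cuda", dtype=torch.float16)
print("torch.nn.LayerNorm :", bench(native, x))
print("torch.compile'd    :", bench(torch.compile(native), x))

try:
    from apex.normalization import FusedLayerNorm
    fused = FusedLayerNorm(1024).to(device="cuda", dtype=torch.float16)
    print("apex FusedLayerNorm:", bench(fused, x))
except ImportError:
    print("apex not installed; skipping")
```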

Right now, we have the option to use sequence parallel via the `--sequence-parallel` flag: https://github.com/facebookresearch/metaseq/blob/a6ef598cc7b4dac394ba2eab5d0e75ca27a9e8c0/metaseq/modules/transformer_decoder_layer.py#L210-L223 Now that https://github.com/facebookresearch/metaseq/issues/578 is completed, we should add a test here to check rough equivalence...

test-coverage
better-eng
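An illustrative shape for such a test: build the layer twice with sequence parallelism on and off, share weights, and assert rough output equivalence. The `build_layer` factory below is a stand-in (a plain PyTorch layer) so the harness itself runs; metaseq's real test would construct its `TransformerDecoderLayer` under a model-parallel group with `--sequence-parallel` toggled:

```python
import torch
import torch.nn as nn

def build_layer(sequence_parallel: bool) -> nn.Module:
    # Stand-in factory: in metaseq this would build TransformerDecoderLayer
    # with sequence parallelism toggled by the flag.
    return nn.TransformerEncoderLayer(d_model=512, nhead=8, dropout=0.0)

def test_sequence_parallel_rough_equivalence():
    torch.manual_seed(0)
    x = torch.randn(16, 4, 512)  # (seq, batch, embed)
    baseline = build_layer(sequence_parallel=False).eval()
    parallel = build_layer(sequence_parallel=True).eval()
    parallel.load_state_dict(baseline.state_dict())
    with torch.no_grad():
        y0, y1 = baseline(x), parallel(x)
    # "Rough" equivalence: tolerate fp16 / communication-ordering noise.
    torch.testing.assert_close(y0, y1, rtol=1e-3, atol=1e-3)
```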

We currently start training from scratch if `--restore-file` is not found. This is not ideal, since passing a restore file indicates an intention to resume from a previous checkpoint and requires additional...

enhancement
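A minimal sketch of the proposed fix: fail fast when a restore file was explicitly requested but is missing. The function name is illustrative; `restore_file` is assumed to mirror the existing flag:

```python
import os

def validate_restore_file(restore_file: str) -> None:
    # Illustrative check: refuse to silently fall back to training from
    # scratch when a resume was explicitly requested.
    if restore_file and not os.path.exists(restore_file):
        raise FileNotFoundError(
            f"--restore-file {restore_file} not found; refusing to start "
            "from scratch when a resume was explicitly requested."
        )
```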

Right now, we have:

* `streaming_language_modeling` (which we use mainly for pre-training - requires data to be streamed in as text / tokenized-on-the-fly as opposed to being tokenized ahead of...

cleanup
better-eng
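To illustrate the distinction: `streaming_language_modeling` consumes raw text and tokenizes on the fly, roughly like the hypothetical dataset below (names are illustrative, not metaseq's actual classes), rather than reading pre-tokenized binary data:

```python
import torch
from torch.utils.data import IterableDataset

class StreamingTextDataset(IterableDataset):
    # Hypothetical on-the-fly tokenizing dataset: text comes in as lines
    # and is tokenized at iteration time, not preprocessed ahead of time.
    def __init__(self, text_iter, tokenizer):
        self.text_iter = text_iter
        self.tokenizer = tokenizer  # any callable: str -> list[int]

    def __iter__(self):
        for line in self.text_iter:
            yield torch.tensor(self.tokenizer(line), dtype=torch.long)

# Usage with a toy tokenizer and an illustrative corpus file.
ds = StreamingTextDataset(open("corpus.txt"), tokenizer=lambda s: [ord(c) for c in s])
loader = torch.utils.data.DataLoader(ds, batch_size=None)
```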