paxml
Pax is a JAX-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry-leading...
[Question] Very low MFU (30%~35%) when training bf16 Llama2 and GPT models on a single SXM4 A100 machine.
I don't know what happened. Are the computation precision and parameter precision not set correctly? DeepSpeed or Megatron can easily achieve 55% MFU on the same machine. Here is my bash...
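For readers comparing the MFU figures quoted in this question, here is a minimal sketch of how MFU is commonly estimated. It assumes a dense decoder-only model costing roughly 6 FLOPs per parameter per token (forward plus backward, attention FLOPs ignored) and the A100's dense bf16 peak of 312 TFLOPS per GPU. The function name, the 8-GPU count, and the throughput number are hypothetical and are not taken from the issue.

```python
# Rough MFU estimate (a sketch, not paxml's own metric).
# Assumption: ~6 FLOPs per parameter per token for forward + backward,
# ignoring attention FLOPs and any activation recomputation.

A100_BF16_PEAK_FLOPS = 312e12  # NVIDIA A100 dense bf16 peak, per GPU


def model_flops_utilization(num_params, tokens_per_second, num_gpus,
                            peak_flops_per_gpu=A100_BF16_PEAK_FLOPS):
    """Return achieved model FLOPs/s divided by aggregate peak FLOPs/s."""
    achieved_flops_per_second = 6.0 * num_params * tokens_per_second
    return achieved_flops_per_second / (num_gpus * peak_flops_per_gpu)


# Hypothetical numbers: a 13B-parameter model on 8 GPUs
# processing 11,200 tokens per second across the whole machine.
mfu = model_flops_utilization(num_params=13e9,
                              tokens_per_second=11_200,
                              num_gpus=8)
print(f"MFU: {mfu:.1%}")
```

Because the estimate ignores attention and recomputation, numbers reported by different frameworks are only comparable when they use the same FLOPs accounting.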
For example, on a single A100 machine, Llama2 13B training with TP2 DP4 + ZeRO-1 is faster than FSDP.
Will the "coming soon" tutorials, such as the one on sharding, be released soon? https://github.com/google/paxml/blob/main/paxml/docs/hands-on-tutorials.md#sharding-in-pax
I am trying to continue training my model from a checkpoint, using paxml. Does paxml not support `restore_checkpoint_dir` or `restore_checkpoint_step` for train mode?
Hi, is there a way to save a quantized int8 checkpoint? Looks like right now the checkpoint is in fp32.
Refactoring to allow gradient clipping to be performed on full batch rather than subbatches when using `ShardedStaticAccumulator`. Note that this refactor allows us to maintain support for `enable_skip_step_on_gradient_anomalies` and requires...
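To illustrate why the point of clipping matters, here is a minimal pure-Python sketch comparing clipping each sub-batch gradient before averaging with clipping the averaged full-batch gradient once. It is not the `ShardedStaticAccumulator` implementation; the helper names and toy gradients are hypothetical.

```python
import math


def global_norm(grad):
    """L2 norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in grad))


def clip_by_global_norm(grad, max_norm):
    """Scale grad down so its L2 norm does not exceed max_norm."""
    norm = global_norm(grad)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in grad]


def mean_gradient(grads):
    """Element-wise mean over a list of equally sized gradient vectors."""
    return [sum(column) / len(grads) for column in zip(*grads)]


# Toy sub-batch gradients (hypothetical): one large, one zero.
subbatch_grads = [[3.0, 4.0], [0.0, 0.0]]
max_norm = 1.0

# Clip every sub-batch gradient, then average.
per_subbatch = mean_gradient(
    [clip_by_global_norm(g, max_norm) for g in subbatch_grads])

# Average first, then clip the full-batch gradient once.
full_batch = clip_by_global_norm(mean_gradient(subbatch_grads), max_norm)

print(global_norm(per_subbatch), global_norm(full_batch))
```

In this toy case the two orderings produce updates of different magnitude, which is why clipping on the full batch is not equivalent to clipping on sub-batches.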
Bumps [transformers](https://github.com/huggingface/transformers) from 4.29.2 to 4.30.0. Release notes Sourced from transformers's releases. v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone 100k Transformers has just reached 100k stars...
Bumps [transformers](https://github.com/huggingface/transformers) from 4.27.4 to 4.30.0. Release notes Sourced from transformers's releases. v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone 100k Transformers has just reached 100k stars...
I've been trying to install PAXML on Ubuntu 22.04 ARM64, but I seem to be stuck getting lingvo (a mandatory dependency?) running there: I've been struggling to find a recipe for...