paxml issues

Results 24 paxml issues

update grok model param

Corrected Grok model parameters to match OSS Grok model

Add Transformer Engine support to Paxml

Adds support for NVIDIA's [Transformer Engine](https://github.com/NVIDIA/TransformerEngine). TE can be enabled by setting the environment variable `ENABLE_TE=1`. For more details about running Pax with Transformer Engine, refer to the [JAX Toolbox...

ashors1

Bump tensorflow from 2.9.3 to 2.11.1 in /paxml/pip_package in the pip group across 1 directory

Bumps the pip group with 1 update in the /paxml/pip_package directory: [tensorflow](https://github.com/tensorflow/tensorflow). Updates `tensorflow` from 2.9.3 to 2.11.1. Release notes, sourced from tensorflow's releases: TensorFlow 2.11.1, Release 2.11.1. Note: TensorFlow...

dependabot[bot]

dependencies

DEADLINE_EXCEEDED on 1024 GPUs.

> Additional GRPC error information from remote target unknown_target_for_coordination_leader while calling /tensorflow.CoordinationService/RegisterTask:
> :{"created":"@1712965181.656280441","description":"Deadline Exceeded","file":"external/com_github_grpc_grpc/src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":4}
> 2024-04-12 23:39:41.656900: E external/xla/xla/pjrt/distributed/client.cc:96] Coordination service agent in error status: DEADLINE_EXCEEDED: Deadline Exceeded...

mhugues

Set offload checkpoint policy

NOT FOR COMMIT. Depends on https://github.com/google/praxis/pull/51

jaro-sevcik

[NVIDIA] Add config option to use cudnn flash attention

This PR allows users to enable cuDNN flash attention. It depends on https://github.com/google/praxis/pull/53. In preliminary results for GPT3-5B, we observe a ~30% performance improvement on...

kaixih

Jax + tpu and AQT int8 train model loss is abnormal

I used the aqt_einsum function in the code to quantize only the qk score, and then trained the model. However, I found that the loss dropped very slowly after training...

Lisennlp

Update README.md: fix missing -r argument to pip

Pip complains without -r before passing in a requirements file.

joker-eph

pull ready

support for fractional per core batch size

@zhangqiaorjc please review this change to support per core batch size < 1 with the synthetic dataset.

abhinavgoel95

Use bfloat16 for eval

I'm running paxml on an Intel Xeon CPU server using the paxml/main.py program. I'm trying to create a model that creates weights in bfloat16, and uses that datatype during eval....

tbaker2
