maxtext issues

Pin nvidia-cudnn-cu12==8.9.7.29

cuDNN 9 is not uploaded to PyPI due to PyPI restrictions
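A pin like this can be verified at runtime before a GPU job starts. The following is a minimal sketch, not part of MaxText: the helper name `pin_status` is hypothetical, and it only compares the installed distribution version string against the pinned one.

```python
from importlib import metadata


def pin_status(package, pinned):
    """Return 'ok', 'mismatch', or 'missing' for an exact-version pin.

    Hypothetical helper for illustration; it is not a MaxText function.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return "missing"
    return "ok" if installed == pinned else "mismatch"


# Check the pin named in the issue title.
print(pin_status("nvidia-cudnn-cu12", "8.9.7.29"))
```

The printed value depends on the local environment, so a launch script would typically fail fast on anything other than `ok`.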

It looks like you already support Mistral, though maybe missing sliding window attention. Would be great to:
* add a section about it in https://github.com/google/maxtext#supported-open-models
* how to do inference...

borisdayma

Cut a release branch for stable GPU runs WIP

chajath

`attend_dtype` not used

1

[Here](https://github.com/google/maxtext/blob/5353a957594bd6cf316747cd5a327c163caca74f/MaxText/layers/embeddings.py#L102-L103) it seems that the hard-coded `bfloat16` is used instead of `attend_dtype`. Also `query` is not cast. I guess the correct behavior should be casting both `query` and `self.embedding` to...

zhixuan-lin

bug
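The behavior proposed in the `attend_dtype` report can be sketched in JAX as follows. This is an illustration under assumptions, not the MaxText implementation: the report is truncated, so the cast target is assumed to be `attend_dtype`, and the standalone function `attend` and the array shapes are hypothetical stand-ins for the method in `embeddings.py`.

```python
import jax.numpy as jnp


def attend(query, embedding, attend_dtype=jnp.bfloat16):
    """Hypothetical stand-in for the embedding layer's attend step.

    Both operands are cast to attend_dtype (assumed intent of the report)
    instead of casting only the embedding table to a hard-coded bfloat16.
    """
    query = query.astype(attend_dtype)
    embedding = embedding.astype(attend_dtype)
    # Contract the feature axis: (batch, features) x (vocab, features)^T.
    return jnp.dot(query, embedding.T)


query = jnp.ones((2, 4), dtype=jnp.float32)      # (batch, features)
embedding = jnp.ones((8, 4), dtype=jnp.float32)  # (vocab, features)

logits_fp32 = attend(query, embedding, attend_dtype=jnp.float32)
logits_bf16 = attend(query, embedding)  # default: bfloat16
```

With both operands cast, the output dtype follows `attend_dtype` rather than a mix of the query dtype and a fixed `bfloat16`.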

Create a user friendly inference demo

This is a feature request. I like `maxtext` because it is very customizable and efficient for training. The main issue I’m having is hacking together an inference function. The code...

borisdayma

inference

Flip env sensing so we can launch gpu with checkpointing

A better fix would be to fully embrace `ClusterEnv`'s visitor pattern and write a subclass that could work with ENVs that we set from the launch script, and it is...

chajath
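The subclass-per-environment idea from the env sensing entry can be sketched as below. This is a self-contained imitation of the pattern and does not import or extend JAX's actual `ClusterEnv`; the class names and the `LAUNCH_*` environment variable names are hypothetical placeholders for whatever a launch script would export.

```python
import os


class EnvSensor:
    """Hypothetical base class; subclasses register themselves on definition."""

    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        EnvSensor.registry.append(cls)

    @classmethod
    def is_present(cls, env):
        raise NotImplementedError

    @classmethod
    def settings(cls, env):
        raise NotImplementedError


class LaunchScriptEnv(EnvSensor):
    """Reads settings from variables a launch script is assumed to set."""

    @classmethod
    def is_present(cls, env):
        return "LAUNCH_COORDINATOR_ADDRESS" in env

    @classmethod
    def settings(cls, env):
        return {
            "coordinator_address": env["LAUNCH_COORDINATOR_ADDRESS"],
            "num_processes": int(env["LAUNCH_NUM_PROCESSES"]),
            "process_id": int(env["LAUNCH_PROCESS_ID"]),
        }


def detect(env=None):
    """Return settings from the first registered sensor that matches, else None."""
    env = os.environ if env is None else env
    for sensor in EnvSensor.registry:
        if sensor.is_present(env):
            return sensor.settings(env)
    return None


example_env = {
    "LAUNCH_COORDINATOR_ADDRESS": "10.0.0.1:1234",
    "LAUNCH_NUM_PROCESSES": "4",
    "LAUNCH_PROCESS_ID": "0",
}
detected = detect(example_env)
```

Supporting a new launcher then means adding one subclass rather than flipping conditionals in the detection code.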

[NOT FOR MERGE] Rwitten host offload demo

1

```
[rwitten@t1v-n-621261c1-w-0 2024-03-19 23:26:33] ~/maxtext (rwitten_shmap_collective_matmul_finalized) python3 pedagogical_examples/host_offload.py
F0319 23:26:38.216995 1800494 llo_decomposer.cc:893] Unexpected opcode: dma-vmem-to-host-ram
*** Check failure stack trace: ***
    @ 0x7f2ff084ec24 (unknown)
    @ 0x7f2ff084e744 (unknown)
    @ 0x7f2ff084ef89 (unknown)
...
```

rwitten

pull ready

Cut a release branch for GPU runs + flash attention WIP

chajath

Compatibility issue with tensorflow>=2.15.1 on GPU

Hi team, I'm having an issue launching the pretraining job with TensorFlow 2.15 or above. TensorFlow 2.15 segfaults immediately. With the latest TensorFlow 2.16.1 I see there is an unbound...

chajath

Add llama2-13b tests

Document use of Mistral
