maxtext

A simple, performant and scalable Jax LLM!

Results 159 maxtext issues

cuDNN 9 is not uploaded to PyPI due to PyPI restrictions

It looks like you already support Mistral, though maybe missing sliding window attention. Would be great to:
* add a section about it in https://github.com/google/maxtext#supported-open-models
* how to do inference...
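For readers unfamiliar with the term, sliding window attention can be illustrated with a small mask computation. This is a minimal sketch, not maxtext code: the function name `sliding_window_causal_mask` and the convention that each token sees at most `window` positions including itself are assumptions made for illustration.

```python
import jax.numpy as jnp


def sliding_window_causal_mask(seq_len: int, window: int) -> jnp.ndarray:
    """Hypothetical helper: boolean [seq_len, seq_len] mask, True = may attend.

    Assumed convention: query position q may attend to key position k
    when k <= q (causal) and q - k < window (inside the window).
    """
    q = jnp.arange(seq_len)[:, None]  # query positions as a column
    k = jnp.arange(seq_len)[None, :]  # key positions as a row
    return (k <= q) & (q - k < window)


mask = sliding_window_causal_mask(seq_len=4, window=2)
```

With `window` equal to `seq_len`, the mask reduces to an ordinary causal mask, which is one way to check that a model without the window behaves the same on short sequences.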

[Here](https://github.com/google/maxtext/blob/5353a957594bd6cf316747cd5a327c163caca74f/MaxText/layers/embeddings.py#L102-L103) it seems that the hard-coded `bfloat16` is used instead of `attend_dtype`. Also `query` is not cast. I guess the correct behavior should be casting both `query` and `self.embedding` to...
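The casting behavior this report seems to ask for can be sketched as follows. This is only an illustration under assumptions: the standalone `attend` function and its signature are hypothetical stand-ins for the layer method, and the target dtype is assumed to be `attend_dtype` because the report is truncated before naming it.

```python
import jax.numpy as jnp


def attend(query, embedding, attend_dtype=jnp.float32):
    """Hypothetical stand-in for an embedding layer's attend step.

    Both operands are cast to the configurable attend_dtype (an assumption
    about the intended fix) rather than to a hard-coded bfloat16.
    """
    query = query.astype(attend_dtype)
    embedding = embedding.astype(attend_dtype)
    # [batch, features] x [vocab, features]^T -> [batch, vocab] logits
    return jnp.dot(query, embedding.T)


query = jnp.ones((2, 4), dtype=jnp.bfloat16)
embedding = jnp.ones((8, 4), dtype=jnp.float32)
logits = attend(query, embedding, attend_dtype=jnp.float32)
```

Casting both inputs up front means the dtype of the logits follows the configured value regardless of the dtypes the caller passes in.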

bug

This is a feature request. I like `maxtext` because it is very customizable and efficient for training. The main issue I’m having is hacking away an inference function. The code...

inference

A better fix would be to fully embrace `ClusterEnv`'s visitor pattern and write a subclass that could work with ENVs that we set from the launch script, and it is...

```
[rwitten@t1v-n-621261c1-w-0 2024-03-19 23:26:33] ~/maxtext (rwitten_shmap_collective_matmul_finalized) python3 pedagogical_examples/host_offload.py
F0319 23:26:38.216995 1800494 llo_decomposer.cc:893] Unexpected opcode: dma-vmem-to-host-ram
*** Check failure stack trace: ***
    @ 0x7f2ff084ec24 (unknown)
    @ 0x7f2ff084e744 (unknown)
    @ 0x7f2ff084ef89 (unknown)
...
```

pull ready

Hi team, I'm having an issue launching the pretraining job with TensorFlow 2.15 or above. TensorFlow 2.15 immediately segfaults. With the latest TensorFlow 2.16.1 I see there is an unbound...