flux [QUESTION] Is pretraining possible in Megatron using this method?

I've tried several approaches, but due to compatibility issues between the Transformer Engine (TE) version and the PyTorch version, I had difficulty getting Flux to work properly.

May 20 '25 10:05 yeontaek

You have to use TE, and TE requires a specific torch version. That torch version is not compatible with FLUX, right?

You can compile FLUX from source with the torch version you want.

May 29 '25 03:05 houqi

@houqi Thank you for your response. I'm currently trying to run pretraining using the repository below. Would it be possible for you to share a Dockerfile that works with Megatron-LM? https://github.com/ZSL98/Megatron-LM/

Jul 10 '25 06:07 yeontaek

Additionally, based on what I’ve found, Flux works under the following conditions: torch (2.4.0, 2.5.0, 2.6.0), python (3.10, 3.11), and cuda (12.4). I’m currently building a Dockerfile and attempting pretraining using the nvcr.io/nvidia/pytorch:24.05-py3 image, which meets these requirements. Do you happen to know if there is a version of Transformer Engine (TE) that is compatible with these versions?
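
For reference, a minimal sketch of how a container could be checked against the versions listed above. The supported set is only what is reported in this thread, not an official Flux compatibility matrix, and the helper names (`major_minor`, `check_versions`, `check_current_environment`) are hypothetical. Comparison is on major.minor only, on the assumption that NGC images report torch versions with a build suffix.

```python
import re
import sys

# Versions reported in this thread as working with Flux.
# Assumption: this is not an official compatibility matrix.
SUPPORTED = {
    "torch": {(2, 4), (2, 5), (2, 6)},
    "python": {(3, 10), (3, 11)},
    "cuda": {(12, 4)},
}


def major_minor(version):
    """Extract (major, minor) from a version string, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)", version or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_versions(torch_version, python_version, cuda_version):
    """Return a list of mismatch messages; an empty list means all checks pass."""
    found = {"torch": torch_version, "python": python_version, "cuda": cuda_version}
    problems = []
    for name, raw in found.items():
        if major_minor(raw) not in SUPPORTED[name]:
            problems.append(f"{name} {raw!r} is outside the versions listed in the thread")
    return problems


def check_current_environment():
    """Check the running interpreter; torch is imported lazily so the helpers work without it."""
    import torch

    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    # torch.version.cuda is None on CPU-only builds, which is reported as a mismatch.
    return check_versions(torch.__version__, python_version, torch.version.cuda)
```

Running `print(check_current_environment())` inside the image shows whether torch, Python, and CUDA fall in the listed ranges. It says nothing about which TE release matches, which still has to be checked separately.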

Jul 10 '25 06:07 yeontaek

Additionally, based on what I’ve found, Flux works under the following conditions: torch (2.4.0, 2.5.0, 2.6.0), python (3.10, 3.11), and cuda (12.4). I’m currently building a Dockerfile and attempting pretraining using the nvcr.io/nvidia/pytorch:24.05-py3 image, which meets these requirements. Do you happen to know if there is a version of Transformer Engine (TE) that is compatible with these versions?

Sorry, I'm not very familiar with TE. You will have to find that out yourself.

Jul 24 '25 08:07 houqi

@houqi Thank you for your response. I'm currently trying to run pretraining using the repository below. Would it be possible for you to share a Dockerfile that works with Megatron-LM? https://github.com/ZSL98/Megatron-LM/

I will try to make sure that some nvcr.io pytorch versions are supported.

Jul 24 '25 08:07 houqi