Add Transformer Engine memory-efficient initialization to convert_model for large models
When converting large models (e.g., Llama-405B) to Transformer Engine layers via the convert_model function, I run into out-of-memory (OOM) errors. This appears to happen because the current implementation keeps both the original and the converted modules in memory while copying weights.
A mechanism to defer weight initialization until after convert_model completes would significantly improve memory efficiency when working with large-scale models.
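One possible mechanism, sketched below purely as an illustration and not what convert_model does today, is to build the converted modules on PyTorch's meta device (so their parameters have no backing storage) and only materialize and fill the weights after the module swap has finished:

```python
import torch
import torch.nn as nn

# Hypothetical deferred-initialization flow (not Accelerate's current API):
# create the converted module skeleton on the meta device, where parameters
# have no allocated storage, then materialize and fill weights afterwards.
with torch.device("meta"):
    # Even a 405B-parameter skeleton fits in memory, since nothing is allocated.
    converted_layer = nn.Linear(16384, 16384, bias=False)

# Later, e.g. right before loading a checkpoint shard, allocate real storage.
converted_layer = converted_layer.to_empty(device="cpu")
with torch.no_grad():
    converted_layer.weight.zero_()  # stand-in for copying the real weights in
```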
Sample accelerate config that OOMs while converting large models:
```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: "no"
enable_cpu_affinity: false
fsdp_config:
  fsdp_activation_checkpointing: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_process_ip: ****
main_process_port: 29603
main_training_function: main
mixed_precision: bf16
num_machines: 5
num_processes: 40
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
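For context, a rough sketch of the conversion path that hits the OOM, assuming the model is loaded with transformers and then converted with Accelerate's convert_model utility (the model id and exact call sites are illustrative, not the actual training script):

```python
import torch
from transformers import AutoModelForCausalLM
from accelerate.utils import convert_model

# Loading the full model in bf16 already uses a lot of host RAM at 405B scale.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-405B", torch_dtype=torch.bfloat16
)

# convert_model swaps nn.Linear / LayerNorm modules for their Transformer Engine
# equivalents and copies the weights over. While it runs, both the original and
# the converted parameters are resident, which is where the OOM shows up.
convert_model(model, to_transformer_engine=True)
```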
Hi, is #3646 what you're looking for? I'm not entirely familiar with TE.
OK, I don't think I understand your query or issue properly. You're talking about NVIDIA's transformer_engine, right? First, how do you convert an Accelerate LLM to transformer_engine? I searched Google but didn't find anything useful to help me understand this. Can you help me please? @S1ro1 @mayukh-stackav
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.