[NeVa] [rank0]: TypeError: matmul(): argument 'input' (position 1) must be Tensor, not TensorProxy
While preparing the eager and dynamo benchmarks using the code from the fork https://github.com/tfogal/NeMo, I get errors in the dynamo case.
🐛 Bug
After fixing #1187, NeMo NeVa with the dynamo backend throws a new error. With
model.model = torch.compile(backend=thunder_backend, dynamic=False)(model.model)
it throws:
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/graph_module.py", line 359, in __call__
[rank0]: raise e.with_traceback(None) # noqa: B904
[rank0]: TypeError: matmul(): argument 'input' (position 1) must be Tensor, not TensorProxy
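For reference, thunder_backend here is the Thunder dynamo backend; below is a minimal sketch of how it is presumably constructed (the exact setup in the fork may differ, and ThunderCompiler is my assumption rather than code quoted from the fork):

import torch
import torch.nn as nn
from thunder.dynamo import ThunderCompiler  # assumed entry point for the dynamo backend

thunder_backend = ThunderCompiler()
module = nn.Linear(8, 8)  # stand-in for model.model in neva_pretrain.py
module = torch.compile(backend=thunder_backend, dynamic=False)(module)
module(torch.randn(2, 8))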
To Reproduce
Steps to reproduce the behavior:
- Clone https://github.com/tfogal/NeMo
- Use the latest lightning-thunder container
- Install additionally:
python3 -m pip install --no-deps huggingface-hub==0.23.2
python3 -m pip install --no-deps transformers==4.40.2
python3 -m pip install -e .
python3 -m pip install git+https://github.com/NVIDIA/Megatron-LM.git@6dd3a1afa4e26d4d27e58d1e83aaa6ee6e36b477
- Execute:
rm -f /tmp/graph*.log.txt
export HYDRA_FULL_ERROR=1
export THUNDER_ANNOTATE_TRACES=1
export NEMO_THUNDER_NEVA=dynamo
python3 \
./examples/multimodal/multimodal_llm/neva/neva_pretrain.py \
trainer.precision=bf16-mixed \
model.megatron_amp_O2=True \
model.mcore_gpt=False \
trainer.num_nodes=1 \
trainer.devices=1 \
trainer.val_check_interval=10 \
trainer.limit_val_batches=5 \
trainer.log_every_n_steps=1 \
++exp_manager.max_time_per_run=00:00:03:00 \
trainer.max_steps=20 \
model.micro_batch_size=2 \
model.global_batch_size=4 \
model.tensor_model_parallel_size=1 \
model.pipeline_model_parallel_size=1 \
exp_manager.create_checkpoint_callback=False \
model.data.data_path=./data/multimodal/tiny-neva/dummy.json \
model.data.image_folder=./data/multimodal/tiny-neva/images \
model.tokenizer.library=sentencepiece \
model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model \
model.num_layers=2 \
model.hidden_size=5120 \
model.ffn_hidden_size=13824 \
model.num_attention_heads=40 \
model.normalization=rmsnorm \
model.data.num_workers=0 \
model.data.conv_template=llama_2 \
model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14 \
model.mm_cfg.llm.from_pretrained=null \
model.use_flash_attention=false \
exp_manager.exp_dir=./nemo_neva
Expected behavior
The pretraining should run smoothly.
Environment
As in the container
Additional context
Attaching the full log of the error: dynamo_error_2nd_Oct.txt
This comes from torch.ops.higher_order.autograd_function_apply, which wants #1134.
(As mentioned elsewhere, I think a more timely fix would be to follow the torch.autograd.Function lookaside pattern and acquire the fw and bw via _interpret_call at tracing time.)
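For illustration, here is a hedged, self-contained sketch of the pattern that appears to trigger the error (not the actual NeVa code; ThunderCompiler is an assumption about how thunder_backend is built). A custom torch.autograd.Function whose forward calls torch.matmul is compiled through the dynamo backend; dynamo lowers the .apply call into torch.ops.higher_order.autograd_function_apply, and if the captured fwd callable is then invoked directly with TensorProxy inputs instead of being interpreted, the plain torch.matmul call fails as reported above:

import torch
from thunder.dynamo import ThunderCompiler  # assumed backend, see the setup sketch above

class MatmulFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        return torch.matmul(x, w)

    @staticmethod
    def backward(ctx, grad):
        x, w = ctx.saved_tensors
        return torch.matmul(grad, w.t()), torch.matmul(x.t(), grad)

def fn(x, w):
    # dynamo lowers this .apply into torch.ops.higher_order.autograd_function_apply
    return MatmulFn.apply(x, w)

cfn = torch.compile(fn, backend=ThunderCompiler(), dynamic=False)
x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(8, 8, requires_grad=True)
cfn(x, w)
# expected on affected versions:
# TypeError: matmul(): argument 'input' (position 1) must be Tensor, not TensorProxy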
I noticed that if we change the test case here:
https://github.com/Lightning-AI/lightning-thunder/blob/d6455982d5ca6815efd3d7dc0341b4f945f99be5/thunder/tests/test_jit_general.py#L1206-L1217 to use torch.sin(x), it gives a similar error, so it could serve as a repro:
"TypeError: sin(): argument 'input' (position 1) must be Tensor, not TensorProxy"