
[Bug] [Crash][FFI/LLVM] Segfault when importing TVM after PyTorch/Transformers — LLVM static init in `COFFDirectiveParser` (`OptTable::buildPrefixChars`)

tinywisdom opened this issue 3 months ago

Summary

In a Python process where PyTorch and Transformers are imported first, importing TVM immediately segfaults before any TVM API is used. The backtrace points to an LLVM static initializer in `COFFDirectiveParser.cpp` (the `COFFOptTable` constructor calling `llvm::opt::OptTable::buildPrefixChars`), triggered during `dlopen` of the TVM/LLVM shared library. This looks like an in-process LLVM conflict / ODR violation: two different LLVM copies, or an incompatible C++ runtime/ABI, pulled in by the earlier imports.

I’m not calling any TVM APIs; the crash happens during `import tvm` itself, while the FFI bootstrap is still loading the native library.
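To sanity-check the two-LLVMs hypothesis, a diagnostic sketch along these lines could be run just before the TVM import (Linux-only; `dump_llvm_mappings` is an illustrative helper name I made up, not part of any project):

import torch          # load PyTorch's native libs first, as in the failing scenario
import transformers   # transformers pulls in its own native deps

def dump_llvm_mappings():
    # Scan /proc/self/maps for shared objects whose path mentions LLVM,
    # to see how many distinct LLVM copies are already in the process.
    seen = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split()
            if len(fields) >= 6 and "llvm" in fields[5].lower():
                seen.add(fields[5])
    for path in sorted(seen):
        print(path)

dump_llvm_mappings()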

Actual behavior

--------Start load model-----------
!!!!!!! TVM FFI encountered a Segfault !!!!!!! 
  File "./dlfcn/dlopen.c", line 81, in ___dlopen
  ...
  File "<unknown>", in _GLOBAL__sub_I_COFFDirectiveParser.cpp
  File "<unknown>", in COFFOptTable::COFFOptTable()
  File "<unknown>", in llvm::opt::OptTable::buildPrefixChars()
  ...
  File "<unknown>", in tvm::ffi::(anonymous namespace)::backtrace_handler(int)
  File "<unknown>", in tvm::ffi::(anonymous namespace)::Traceback()

Segmentation fault (core dumped)

So the failure happens during dlopen → static initialization inside LLVM, not in TVM Python code I control.
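If whichever LLVM is loaded first wins symbol resolution, reversing the import order might sidestep the conflicting static initializer. A workaround sketch (an untested assumption on my side, not a confirmed fix):

import tvm                      # load TVM (and the LLVM it links) first
import torch
from transformers import BertModel

print(tvm.__version__)          # sanity check that TVM initialized cleanly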

Environment

  • OS: Ubuntu 22.04.4 LTS (x86_64)
  • TVM version: release v0.21.0
  • Python: 3.10.16
  • LLVM: 17.0.6

Steps to reproduce

import torch
import torch.nn as nn
from transformers import BertModel

class BaseModelTemplate(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        for key, value in kwargs.items():
            setattr(self, key, value)

class MyModel(BaseModelTemplate):
    def __init__(self, bert_model, num_labels):
        super().__init__(bert=bert_model, num_labels=num_labels)
        self.dropout = nn.Dropout(0.25)
        self.classifier = nn.Linear(768, num_labels)

    def process_input(self, input_ids, attention_mask, token_type_ids):
        return {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "token_type_ids": token_type_ids,
        }

    def forward_pass(self, processed_input):
        outputs = self.bert(**processed_input)
        sequence_output = outputs[0]
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)
        return logits

    def generate_output(self, logits):
        return logits

    def forward(self, inputs):
        input_ids, attention_mask, token_type_ids = inputs
        processed_input = self.process_input(input_ids, attention_mask, token_type_ids)
        intermediate = self.forward_pass(processed_input)
        return self.generate_output(intermediate)

def my_model_function():
    bert = BertModel.from_pretrained("bert-base-uncased")
    return MyModel(bert, num_labels=2)

def GetInput():
    batch_size = 1
    seq_length = 128
    input_ids = torch.randint(0, 100, (batch_size, seq_length), dtype=torch.long)
    attention_mask = torch.ones((batch_size, seq_length), dtype=torch.long)
    token_type_ids = torch.zeros((batch_size, seq_length), dtype=torch.long)
    return (input_ids, attention_mask, token_type_ids)

def trigger_known_bugs(model=None):
    import numpy as np
    import torch

    # This function only loads the model; the crash is triggered later by the
    # `import tvm` at the bottom of the script, after torch/transformers are
    # already loaded and before any TVM API is called:
    print("--------Start load model-----------")
    if model is None:
        model = my_model_function()
    print("-------------------")
    torch.manual_seed(42)
    np.random.seed(42)
    model.eval()

if __name__ == "__main__":
    import os
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", "6,7")
    trigger_known_bugs()

    # The crash happens on the *import* line below in practice:
    import tvm  # <-- segfault occurs upon TVM FFI/LLVM init
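If the model code above turns out to be irrelevant and only the import order matters (my assumption, not yet verified), the repro may reduce to three imports:

import torch          # anything that loads PyTorch's native libs first
import transformers   # likewise pulls in its native deps
import tvm            # expected to segfault here during FFI/LLVM static init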

Triage

  • needs-triage
  • bug

tinywisdom avatar Sep 15 '25 07:09 tinywisdom

cc @tqchen @MasterJH5574

tlopex avatar Sep 15 '25 22:09 tlopex

@tinywisdom thanks for reporting this. I just tried the latest main branch on my end and couldn't reproduce this issue. Given that v0.21.0 was released in July and it has been a while, would you mind testing on the latest main commit?

MasterJH5574 avatar Sep 16 '25 03:09 MasterJH5574

@MasterJH5574 Thanks for your quick check! It’s great to know the issue doesn’t reproduce on the latest main branch. Unfortunately I don’t currently have an environment built directly from TVM-main to verify it myself. I’ll be happy to test again once a new stable release comes out, and I’ll make sure to report back if I encounter any further problems.

tinywisdom avatar Sep 16 '25 10:09 tinywisdom