CodeLLDB crashes with SIGSEGV when debugging code that contains an enum variant sharing its name with the struct contained in that variant
OS: Ubuntu 24.04
VSCode version: 1.106.2
CodeLLDB version: 1.11.8
Compiler: rustc 1.91.1 (ed61e7d7e 2025-11-07)
Debuggee: Rust application
Description: CodeLLDB crashes with SIGSEGV when stepping through code that loads an EmbeddingModel using the tokenizers crate. The tokenizer loads a ~9MB JSON file (tokenizer.json). The crash occurs when LLDB attempts to render variables in the Variables pane.
Steps to Reproduce:
- Create a Rust project using the tokenizers crate (or any crate that loads large JSON into structs)
- Load a tokenizer with a large vocabulary (~9MB JSON)
- Set a breakpoint after the model/tokenizer is loaded
- Start debugging with CodeLLDB
- Debugger crashes with SIGSEGV
Error message: `EOF while parsing a value`, followed by an LLDB crash (SIGSEGV).
Expected Behavior: Debugger should either:
- Display the variable with truncated/summarized content
- Skip rendering overly large structures
- Show a placeholder like
Actual Behavior: CodeLLDB crashes when the Variables pane attempts to render the tokenizer struct.
Workaround: Collapse the Variables pane before hitting the breakpoint. The debugger works fine if it doesn't attempt to render the large struct. Individual variables can still be inspected via hover or Watch pane.
Log after hitting breakpoint:
[DEBUG codelldb::debug_session::breakpoints] Callback for breakpoint location 1.1: where = subsync_core-d4246f0106cd808e`subsync_core::embedding::tests::test_czech_english_semantic_matching + 7 at embedding.rs:321:13, address = 0x0000555555cb2127, resolved, hit count = 1
[DEBUG codelldb::debug_session] Debug event: 0x7730dc001f10 Event: broadcaster = 0x5aa9030662d8 (lldb.process), type = 0x00000001 (state-changed), data = { process = 0x5aa9030662a0 (pid = 410729), state = stopped}
[DEBUG codelldb::dap_codec] <-- {"seq":28,"type":"event","event":"stopped","body":{"allThreadsStopped":true,"hitBreakpointIds":[1],"reason":"breakpoint","threadId":410731}}
[DEBUG codelldb::dap_codec] --> {"command":"threads","type":"request","seq":12}
[DEBUG codelldb::dap_codec] <-- {"seq":29,"type":"response","request_seq":12,"success":true,"command":"threads","body":{"threads":[{"id":410729,"name":"1: tid=410729 \"subsync_core-d4\""},{"id":410731,"name":"2: tid=410731 \"embedding::test\""}]}}
[DEBUG codelldb::dap_codec] --> {"command":"stackTrace","arguments":{"threadId":410731,"startFrame":0,"levels":1},"type":"request","seq":13}
[DEBUG codelldb::dap_codec] <-- {"seq":30,"type":"response","request_seq":13,"success":true,"command":"stackTrace","body":{"stackFrames":[{"column":13,"id":1001,"instructionPointerReference":"0x555555CB2127","line":321,"moduleId":"555555554000","name":"subsync_core::embedding::tests::test_czech_english_semantic_matching","source":{"name":"embedding.rs","path":"/home/daniel/projects/subsync/subsync-core/src/embedding.rs"}}]}}
[DEBUG codelldb::dap_codec] --> {"command":"scopes","arguments":{"frameId":1001},"type":"request","seq":14}
[DEBUG codelldb::dap_codec] <-- {"seq":31,"type":"response","request_seq":14,"success":true,"command":"scopes","body":{"scopes":[{"expensive":false,"name":"Local","variablesReference":1002},{"expensive":false,"name":"Static","variablesReference":1003},{"expensive":false,"name":"Global","variablesReference":1004},{"expensive":false,"name":"Registers","variablesReference":1005}]}}
[DEBUG codelldb::dap_codec] --> {"command":"variables","arguments":{"variablesReference":1002},"type":"request","seq":15}
Debug adapter exit code=null, signal=SIGSEGV.
Can you provide a code example that reproduces this?
Hello, here is the code:
use tokenizers::Tokenizer;

fn main() {
    println!("Loading tokenizer...");

    // Load tokenizer from file (~9MB JSON structure)
    // You need to provide a tokenizer.json file - download one from HuggingFace:
    // https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2/blob/main/tokenizer.json
    let tokenizer = Tokenizer::from_file("tokenizer.json")
        .expect("Failed to load tokenizer - download tokenizer.json from HuggingFace");

    // Set breakpoint here - debugger crashes when Variables pane tries to render `tokenizer`
    println!("Tokenizer loaded successfully!");

    // Use the tokenizer so it's not optimized away
    let encoding = tokenizer.encode("Hello world", false).unwrap();
    println!("Encoded {} tokens", encoding.get_ids().len());
}
Crate dependency for Cargo.toml:
[dependencies]
tokenizers = "0.22.1"
Put a breakpoint on the `println!("Tokenizer loaded successfully!")` line with the Variables pane open; once it stops on the breakpoint, CodeLLDB crashes. I have also provided the URL for tokenizer.json.
If you need anything else, let me know. Thank you!
Actually, this doesn't seem to be related to file size. This code is sufficient to repro:
use tokenizers::models::wordlevel::WordLevel;
use tokenizers::Tokenizer;

let tokenizer = Tokenizer::new(
    WordLevel::builder().vocab(ahash::AHashMap::new()).build().unwrap(),
);
The crash happens due to a stack overflow in LLDB:
clang::Redeclarable<clang::TagDecl>::getNextRedeclaration() const (/home/vadimcn/llvm-project/clang/include/clang/AST/Redeclarable.h:186)
clang::Redeclarable<clang::TagDecl>::getMostRecentDecl() (/home/vadimcn/llvm-project/clang/include/clang/AST/Redeclarable.h:224)
clang::RecordDecl::getMostRecentDecl() (/home/vadimcn/llvm-project/clang/include/clang/AST/Decl.h:4286)
clang::RecordDecl::getMostRecentDecl() const (/home/vadimcn/llvm-project/clang/include/clang/AST/Decl.h:4289)
clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3378)
(anonymous namespace)::EmptySubobjectMap::ComputeEmptySubobjectSizes() (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:216)
(anonymous namespace)::EmptySubobjectMap::EmptySubobjectMap(clang::ASTContext const&, clang::CXXRecordDecl const*) (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:171)
clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3417)
(anonymous namespace)::EmptySubobjectMap::ComputeEmptySubobjectSizes() (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:216)
(anonymous namespace)::EmptySubobjectMap::EmptySubobjectMap(clang::ASTContext const&, clang::CXXRecordDecl const*) (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:171)
clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3417)
(anonymous namespace)::EmptySubobjectMap::ComputeEmptySubobjectSizes() (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:216)
(anonymous namespace)::EmptySubobjectMap::EmptySubobjectMap(clang::ASTContext const&, clang::CXXRecordDecl const*) (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:171)
clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3417)
(anonymous namespace)::EmptySubobjectMap::ComputeEmptySubobjectSizes() (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:216)
(anonymous namespace)::EmptySubobjectMap::EmptySubobjectMap(clang::ASTContext const&, clang::CXXRecordDecl const*) (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:171)
clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3417)
(anonymous namespace)::EmptySubobjectMap::ComputeEmptySubobjectSizes() (/home/vadimcn/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:216)
...
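The repeating getASTRecordLayout / EmptySubobjectMap / ComputeEmptySubobjectSizes frames are what you would expect if the record graph Clang walks contains a cycle. One hypothesis (not confirmed in this thread) is that the name collision makes the variant's record resolve to itself. The toy Rust model below illustrates that idea; the `Records` table and `layout` function are hypothetical and are not LLDB or Clang code. The `in_progress` set stands in for a cycle guard: without one, the walk over the collided table would recurse until the stack overflowed, which would surface as the SIGSEGV seen above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical "type table": record name -> record types of its fields.
type Records = HashMap<&'static str, Vec<&'static str>>;

/// Walks a record's fields recursively, loosely mirroring how computing one
/// record's layout asks for the layouts of the records it contains.
/// Returns the nesting depth, or an error if a record is re-entered.
fn layout(
    records: &Records,
    name: &'static str,
    in_progress: &mut HashSet<&'static str>,
) -> Result<usize, String> {
    // Cycle guard: re-entering a record that is still being laid out.
    if !in_progress.insert(name) {
        return Err(format!("cycle detected at record `{name}`"));
    }
    let mut depth = 0;
    for &field in records.get(name).into_iter().flatten() {
        depth = depth.max(1 + layout(records, field, in_progress)?);
    }
    in_progress.remove(name);
    Ok(depth)
}

fn main() {
    // Keyed by qualified name: the variant record `Foo::Empty` holds a field
    // of the standalone struct type `Empty`, which has no fields.
    let qualified: Records = HashMap::from([
        ("Foo", vec!["Foo::Empty"]),
        ("Foo::Empty", vec!["Empty"]),
        ("Empty", vec![]),
    ]);
    // Hypothetical collision: if both records were looked up by the bare
    // name `Empty`, the variant record would appear to contain itself.
    let collided: Records = HashMap::from([
        ("Foo", vec!["Empty"]),
        ("Empty", vec!["Empty"]),
    ]);

    println!("{:?}", layout(&qualified, "Foo", &mut HashSet::new())); // Ok(2)
    println!("{:?}", layout(&collided, "Foo", &mut HashSet::new())); // Err(...)
}
```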
The type in question was `StripAccents`, which is a zero-sized type (ZST).
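For readers unfamiliar with the term, a ZST is a type such as a fieldless unit struct whose `size_of` is 0. The sketch below uses hypothetical local stand-ins, not the tokenizers crate's own types: `StripAccents` is a unit struct, and `NormalizerLike` assumes (from the type name alone, not verified here) that the crate wraps it in an enum variant of the same name, which is the pattern the later repro isolates.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a unit struct like `StripAccents`:
/// it has no fields, so it occupies zero bytes.
struct StripAccents;

/// Hypothetical stand-in for an enum whose variant shares its name
/// with the struct it wraps.
#[allow(dead_code)]
enum NormalizerLike {
    StripAccents(StripAccents),
    Lowercase,
}

fn main() {
    println!("size_of::<StripAccents>() = {}", size_of::<StripAccents>());

    let n = NormalizerLike::StripAccents(StripAccents);
    match n {
        NormalizerLike::StripAccents(_) => println!("variant StripAccents(StripAccents)"),
        NormalizerLike::Lowercase => println!("variant Lowercase"),
    }
}
```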
After some experimentation, I was able to come up with this repro:
struct Empty;

enum Foo {
    Empty(Empty),
}

fn main() {
    let foo = Foo::Empty(Empty);
    println!("Boo!");
}
The critical bit is that the enum variant and the ZST struct have the same name.
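If the shared name is the trigger, a natural control when bisecting is the same program with only the variant renamed. The sketch below keeps everything else identical; the name `Wrapped` is an arbitrary choice, and whether this version avoids the crash under CodeLLDB has not been verified here.

```rust
/// Same unit struct as in the repro above.
struct Empty;

enum Foo {
    // Renamed from `Empty` so it no longer collides with `struct Empty`.
    Wrapped(Empty),
}

fn main() {
    let foo = Foo::Wrapped(Empty);
    // Break here with the Variables pane open and compare with the original repro.
    match foo {
        Foo::Wrapped(_) => println!("Boo!"),
    }
}
```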
Might be another manifestation of https://github.com/llvm/llvm-project/issues/43604
Edit: being a ZST seems to be irrelevant as well; a struct with the same name as the enum variant is sufficient.
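Following that edit, a non-zero-sized variant of the repro might look like the sketch below. The names `Payload` and `value` are illustrative; the only property it keeps from the original is that the struct and the enum variant wrapping it share a name. This exact program has not been run under CodeLLDB here.

```rust
/// Non-zero-sized struct that shares its name with the variant wrapping it.
struct Payload {
    value: u32,
}

enum Foo {
    Payload(Payload),
}

fn main() {
    let foo = Foo::Payload(Payload { value: 42 });
    // Set a breakpoint on the next line with the Variables pane open.
    // A single-variant enum pattern is irrefutable, so `let` works here.
    let Foo::Payload(inner) = &foo;
    println!("Boo! {}", inner.value);
}
```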
Alright, so nothing that can be easily fixed. Thank you for your investigation.