str(module) generates invalid LLVM IR with empty PHI instructions in complex loop cases

Open moe-charm opened this issue 4 months ago • 2 comments

I had an AI find it and write a report.

Summary

str(module) generates invalid LLVM IR when multiple PHI instructions are present in loop structures. The PHI instructions are printed without their incoming value pairs (e.g., %phi_17 = phi i64 instead of %phi_17 = phi i64 [%val1, %bb1], [%val2, %bb2]).

Environment

  • llvmlite version: 0.43.0 (also reproduced on 0.44.0)
  • Python version: 3.12.3
  • LLVM triple: x86_64-unknown-linux-gnu
  • OS: Ubuntu 24.04.2 LTS (WSL2)

Expected Behavior

All PHI instructions should include their incoming value pairs when printed via str(module).

Actual Behavior

Some PHI instructions are printed without incoming pairs:

bb7:
  %"phi_17" = phi  i64          ; ← Missing incoming pairs
  %"phi_18" = phi  i64          ; ← Missing incoming pairs
  %"add_19" = add i64 %"phi_18", %"phi_17"
  br label %"bb9"

Minimal Reproduction

A minimal test case with 2 PHIs in a simple if-else structure works correctly:
# See attached: repro_phi_str_print_min.py
# Output: PHIs correctly include [%val1, %bb1], [%val2, %bb2]

However, the issue appears in more complex cases with loops and multiple PHI nodes.

Reproduction Steps

1. Create complex loop structure with multiple PHI nodes
2. Call str(module) to generate IR string
3. Observe that some PHI instructions lack incoming value pairs

Attached Files

- Full IR showing the issue (lines 36, 37, 42, 45, 46):
  - See nyash_harness.ll (attached)
- Minimal reproduction script (works correctly):
  - See repro_phi_str_print_min.py (attached)
- Environment details:
  - See env.txt (attached)

Impact

The generated IR is invalid and cannot be verified or used by LLVM tools.

Workaround

We've implemented a post-processing step to detect and handle empty PHI lines, but this is not a proper solution.
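
The post-processing step itself is not included in this report. As a rough illustration only, a detector for such lines could look like the sketch below. The function name `find_empty_phis` and the regular expression are hypothetical, and the sketch assumes one instruction per line, value names without spaces, and a non-aggregate PHI type (a type containing `[`, such as an array type, would not be recognised).

```python
import re

# Matches a PHI line whose type is followed by no "[value, block]" pair.
# A trailing "; comment" is tolerated. Assumption: one instruction per line.
_EMPTY_PHI = re.compile(r'^\s*(%\S+)\s*=\s*phi\b[^\[;]*(?:;.*)?$')


def find_empty_phis(ir_text):
    """Return (line_number, value_name) for each PHI printed without incoming pairs."""
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        match = _EMPTY_PHI.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits
```

Running it over the output of str(module) before handing the text to other tools gives the line numbers and names of the affected PHIs. It only detects the symptom and does not restore the missing pairs.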

[make_bundle.sh](https://github.com/user-attachments/files/23111829/make_bundle.sh)
[README.md](https://github.com/user-attachments/files/23111830/README.md)
[repro_from_hako_builder.sh](https://github.com/user-attachments/files/23111828/repro_from_hako_builder.sh)
[repro_phi_str_print_min.py](https://github.com/user-attachments/files/23111832/repro_phi_str_print_min.py)
[gather_env.sh](https://github.com/user-attachments/files/23111831/gather_env.sh)

moe-charm, Oct 24 '25 03:10

@moe-charm Have you been able to reproduce this yourself?

gmarkall, Oct 24 '25 13:10

Thanks for your reply. I'm enjoying using llvmlite.

I can reproduce this bug. The root cause appears to be that add_incoming() does not invalidate the PHI string cache.

Here is a minimal reproduction script that demonstrates the issue:

Minimal Reproduction (50 lines)

Python

#!/usr/bin/env python3
"""
Minimal reproduction for llvmlite issue #1337
Bug: add_incoming() does not invalidate PHI string cache

pip install llvmlite
python3 llvmlite_issue1337_simple.py
"""
import llvmlite.ir as ir

# Create module
module = ir.Module()
i64 = ir.IntType(64)
func = ir.Function(module, ir.FunctionType(i64, []), name="test")

# Create blocks
entry = func.append_basic_block("entry")
loop = func.append_basic_block("loop")

# Entry block
builder = ir.IRBuilder(entry)
zero = ir.Constant(i64, 0)
builder.branch(loop)

# Loop block with PHI
builder = ir.IRBuilder(loop)
builder.position_at_start(loop)
phi = builder.phi(i64, name="counter")

builder.position_at_end(loop)
one = ir.Constant(i64, 1)
next_val = builder.add(phi, one)
builder.branch(loop)

# BUG: Call str() BEFORE add_incoming() - this caches empty string
print("Before add_incoming():")
print(f"  str(phi) = {str(phi)}")

# Add incoming edges
phi.add_incoming(zero, entry)
phi.add_incoming(next_val, loop)

# BUG: str() returns OLD cached value (empty)
print("\nAfter add_incoming():")
print(f"  str(phi) = {str(phi)}")
print(f"  Expected: phi i64 [ 0, %entry ], [ %next_val, %loop ]")

# PROOF: Clear cache fixes it
phi._clear_string_cache()
print("\nAfter _clear_string_cache():")
print(f"  str(phi) = {str(phi)}")

Output

Before add_incoming():
  str(phi) = %"counter" = phi  i64

After add_incoming():
  str(phi) = %"counter" = phi  i64
  Expected: phi i64 [ 0, %entry ], [ %next_val, %loop ]

After _clear_string_cache():
  str(phi) = %"counter" = phi  i64 [0, %"entry"], [%".3", %"loop"]

Explanation

1. The PHI is created with no incoming pairs.
2. str(phi) is called, so the string without incoming pairs is cached.
3. add_incoming() is called, but the cache is NOT invalidated.
4. str(phi) returns the stale cached string, still without incoming pairs.
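
The same sequence can be shown without llvmlite at all. The class below is a toy stand-in written for this explanation, not llvmlite's actual implementation. The names `CachedRepr` and `clear_cache` are made up, and the printed format is simplified.

```python
class CachedRepr:
    """Toy stand-in for an IR node that caches its printed form."""

    def __init__(self, name):
        self.name = name
        self.incomings = []
        self._cached = None  # filled on the first str() call

    def add_incoming(self, value, block):
        # Mutates state but leaves the cache alone: the stale-cache pattern.
        self.incomings.append((value, block))

    def clear_cache(self):
        self._cached = None

    def __str__(self):
        if self._cached is None:
            pairs = ", ".join(f"[{v}, %{b}]" for v, b in self.incomings)
            self._cached = f"%{self.name} = phi i64 {pairs}".rstrip()
        return self._cached


phi = CachedRepr("counter")
before = str(phi)               # rendered with no pairs, and cached
phi.add_incoming("0", "entry")
stale = str(phi)                # still the cached text
phi.clear_cache()
fresh = str(phi)                # re-rendered, now with the pair
print(before, stale, fresh, sep="\n")
```

Any mutation that changes what the node would print has to drop the cache, otherwise the first rendering stays in place for every later str() call.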

Suggested Fix

The add_incoming() method should call self._clear_string_cache() to invalidate the cache.
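
Until that is changed in llvmlite itself, the same effect can be approximated from user code. This is a minimal sketch, assuming the private _clear_string_cache() method behaves as in the reproduction script above. The helper name `add_incoming_fresh` is hypothetical, and because it relies on a private method it may break in other llvmlite versions.

```python
def add_incoming_fresh(phi, value, block):
    """Add an incoming pair, then drop the PHI's cached string.

    Hypothetical helper: calls the public add_incoming() and then the
    private _clear_string_cache() so the next str(phi) is re-rendered.
    """
    phi.add_incoming(value, block)
    phi._clear_string_cache()
```

Usage would be `add_incoming_fresh(phi, zero, entry)` in place of `phi.add_incoming(zero, entry)`. This only clears the cache on the PHI itself. Whether an enclosing block or function that was printed earlier needs the same treatment is something I have not checked.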

moe-charm, Oct 24 '25 17:10