llvm-bitcode Add tests that integrate with `clang -emit-llvm` output

Add tests that call `clang -emit-llvm -c somefile.c` to produce a `.bc` bitcode file, then parse that file with this library. This test can be orchestrated by `build.zig`.
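Before handing the file to the parser, the test could first confirm that clang's output actually looks like bitcode. Below is a minimal sketch in C that assumes nothing about this library's API; `bc_kind` and `classify_bitcode` are hypothetical names. Raw bitcode starts with the bytes 'B' 'C' 0xC0 0xDE, while the wrapper format (which, as far as I know, is emitted for Darwin targets) starts with the little-endian word 0x0B17C0DE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: what kind of header a buffer starts with. */
enum bc_kind { BC_NONE = 0, BC_RAW = 1, BC_WRAPPED = 2 };

/* Classifies the first four bytes of a file produced by
   `clang -emit-llvm -c`. Only the magic number is inspected;
   nothing after it is validated. */
enum bc_kind classify_bitcode(const uint8_t *buf, size_t len)
{
    /* Raw bitcode magic: 'B' 'C' 0xC0 0xDE. */
    static const uint8_t raw[4] = { 0x42, 0x43, 0xC0, 0xDE };
    /* Wrapper magic 0x0B17C0DE, stored little-endian. */
    static const uint8_t wrapped[4] = { 0xDE, 0xC0, 0x17, 0x0B };

    if (len < 4)
        return BC_NONE;
    if (memcmp(buf, raw, 4) == 0)
        return BC_RAW;
    if (memcmp(buf, wrapped, 4) == 0)
        return BC_WRAPPED;
    return BC_NONE;
}
```

A test harness could read the first four bytes of the `.bc` file, fail early on `BC_NONE`, and only then run the real parser.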

It would be cool to do some fuzz testing by generating several C files of various sizes using https://github.com/travisstaloch/markov-chains-zig.

Dec 19 '22 04:12 hryx

sounds like you're trying to generate valid C code? if that is the case, the markov chain lib will most likely not be able to do so. one thing you could try is using a large `-Dblock-len`. the block len controls how closely the generated text follows the input.

so maybe try running it like so:

zig build run -Dblock-len=24 -- --start-block "int main" --maxlen 1000 $(find path/to/lots/of/c/files -name "*.c")

this will spit out 1000 characters of c-looking text starting with "int main". the default block len is 8 so 24 might be closer to valid - although i am doubtful it could produce anything actually valid.

Dec 19 '22 14:12 travisstaloch

That's insightful, thank you. I wasn't sure if the idea was valid, and it sounds like it might not be the right tool for the job (though I'm still looking forward to trying out your library!)

I think there are two aspects I would like to cover:

1. Using a wide variety of language features, which in turn produces a wide variety of bitcode structures/values
2. Testing unusually large files

For the latter, I could just do more straightforward procedural code generation. For the former, probably a single hand-written C/C++ file that uses as many language features as I can think of.
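For the large-file case, the procedural generator could be as simple as the sketch below. It is only an illustration: `emit_large_c` is a hypothetical helper, and the shape of the generated functions is arbitrary. The generated code uses unsigned arithmetic so it stays well-defined however many functions are emitted.

```c
#include <stdio.h>

/* Writes a C translation unit with n_funcs small functions plus a
   main() that calls each of them once.
   Returns 0 on success, -1 if the stream reported a write error. */
int emit_large_c(FILE *out, unsigned n_funcs)
{
    /* One trivial function per index, with slightly varying constants. */
    for (unsigned i = 0; i < n_funcs; i++) {
        fprintf(out, "unsigned f%u(unsigned x) { return x * %uu + %uu; }\n",
                i, i % 7u + 1u, i);
    }

    /* main() chains the calls so no function is unused. */
    fprintf(out, "int main(void) {\n");
    fprintf(out, "    unsigned acc = 0;\n");
    for (unsigned i = 0; i < n_funcs; i++) {
        fprintf(out, "    acc += f%u(acc & 0xffu);\n", i);
    }
    fprintf(out, "    return (int)(acc & 0x7fu);\n");
    fprintf(out, "}\n");

    return ferror(out) ? -1 : 0;
}
```

Calling `emit_large_c(stdout, 100000)` from a small driver and redirecting the output to a `.c` file would give an input of roughly 200,000 lines to feed through `clang -emit-llvm -c`.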

Dec 19 '22 21:12 hryx

let me know how it goes. if nothing else the markov chain might be helpful for point 1, for coming up with random inputs that you'd then need to edit. though i'm sure there are better alternatives out there for automatically generating valid C code.

Dec 19 '22 22:12 travisstaloch