[WIP] Adding support for opt pass plugins.
This commit adds the --iree-hal-target-pass-plugins flag That allows to add plugins during executable code generation and serialization.
This is interesting for adding instrumentation via external passes (e.g., https://github.com/CRobeck/instrument-amdgpu-kernels)
I am creating this for two reasons. First, to see if there is interest for this. Second, to get help on an error I have. Currently, I am having some issues with my tests where there is a segfault during dlopen. If anyone has some clue what may be happening, that would be awesome.
The PR for getting this into Triton is here:
https://github.com/triton-lang/triton/pull/3953
I would try adding a similar “Hello world” style test like in that commit to make sure everything is hooked up right.
@CRobeck I have added the hello world example. However, I have the following changes.
- If I link the plugin against LLVMCore, I get the segfault I was getting. It also throws an error saying I have registered a CLI option twice. IDK what the deal with that is.
- I had to add an option to print the info without debugInfo. It does not seem to be included in the IREE compilation pipeline, so it segfaults with the original code in Triton
Thanks Ben. I’ll address those comments.
@benvanik, I made the changes you requested.
There is one thing I am not too happy about. The output of the lit.site.cfg.py is going to the source folder rather than the build folder. Any suggestions?
@kuhar, @CRobeck mentioned you were interested in this.
yeah we can't be polluting the source folder - that'll have to change. I'll take a look at the rest in the morning - a quick glance there's some style issues that need to be fixed (lack of end of file lines, lack of proper punctuation in comments, mismatch of cases in variable names, etc) that I can leave comments on if you don't get to them first.
I’ll get the format things. I just saw the lint and will be following that. No worries about commenting these for now unless there’s something not covered by lint.
This looks very useful, @josemonsalve2 and @CRobeck! Left some comments, mostly coding style.
That's really useful comments, thanks @kuhar! I will address those.
I'm trying to reproduce the CI - Linux x64 bazel / linux_x64_bazel (pull_request), but I cannot find the scripts.
./build_tools/bazel/install_bazelisk.sh 1.21.0
cp ./build_tools/scripts/fetch_cuda_deps.sh /usr/local/bin
/usr/local/bin/fetch_cuda_deps.sh ${IREE_CUDA_DEPS_DIR}
./build_tools/bazel/build_test_all.sh
Any hints on how to go about this
I'm trying to reproduce the CI - Linux x64 bazel / linux_x64_bazel (pull_request), but I cannot find the scripts.
./build_tools/bazel/install_bazelisk.sh 1.21.0 cp ./build_tools/scripts/fetch_cuda_deps.sh /usr/local/bin /usr/local/bin/fetch_cuda_deps.sh ${IREE_CUDA_DEPS_DIR} ./build_tools/bazel/build_test_all.shAny hints on how to go about this
You shouldn't have to do any of this (except for the last command), I think. Can you reproduce this with the usual bazel setup? https://iree.dev/developers/building/bazel/
Thank you very much @kuhar for all those great comments
@CRobeck @ScottTodd I have changed the name of the library.
The issue with the Asan CI is that the library's output name is not libGPUHello.so but libiree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.so. Thus, the file is not found during the lit run. I do not know how to either force the name not to use the whole path or get the prefix inside of the lit test.
My second issue is that the output of lit.site.cfg.py.in goes to the source tree. I tried changing the output to go to the build tree, but it does not get loaded by lit (since the lit.cfg.py is in a different location). @benvanik does not want this behavior. But I need to learn how to point it to the right place.
@ScottTodd, thank you so much for the suggestion on the current environment variables. I did not realize these existed. This also solves the problem of creating files in the source tree.
I am still having problems with the name of the library file changes when building the CI asan.
Any suggestions on how to force the library name always to be libGPUHello$PREFIX?
I am still having problems with the name of the library file changes when building the CI asan. Any suggestions on how to force the library name always to be
libGPUHello$PREFIX?
Check the CMake for the sample I shared: https://github.com/iree-org/iree/blob/d5839584547ec2a258b12399f56ea3d006668dbb/samples/custom_module/dynamic/CMakeLists.txt#L25-L32
The PkgCI / Build Packages / Linux Release (x86_64) error is odd to me:
compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o
FAILED: compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o
ccache /usr/bin/clang++ -I/home/runner/_work/iree/iree -I/home/runner/_work/iree/iree/compiler/build/b -I/home/runner/_work/iree/iree/third_party/llvm-project/llvm/include -I/home/runner/_work/iree/iree/compiler/build/b/llvm-project/include -I/home/runner/_work/iree/iree/third_party/llvm-project/mlir/include -I/home/runner/_work/iree/iree/compiler/build/b/llvm-project/tools/mlir/include -I/home/runner/_work/iree/iree/third_party/llvm-project/lld/include -I/home/runner/_work/iree/iree/compiler/build/b/llvm-project/tools/lld/include -O3 -DNDEBUG -std=gnu++17 -fPIC -fvisibility=hidden -fno-rtti -fno-exceptions -Werror -Wall -Wno-error=deprecated-declarations -Wno-ambiguous-member-template -Wno-char-subscripts -Wno-extern-c-compat -Wno-gnu-alignof-expression -Wno-gnu-variable-sized-type-not-at-end -Wno-ignored-optimization-argument -Wno-invalid-offsetof -Wno-invalid-source-encoding -Wno-mismatched-tags -Wno-pointer-sign -Wno-reserved-user-defined-literal -Wno-return-type-c-linkage -Wno-self-assign-overloaded -Wno-sign-compare -Wno-signed-unsigned-wchar -Wno-strict-overflow -Wno-trigraphs -Wno-unknown-pragmas -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-unused-const-variable -Wno-unused-function -Wno-unused-local-typedef -Wno-unused-private-field -Wno-user-defined-warnings -Wno-missing-braces -Wctad-maybe-unsupported -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Wimplicit-fallthrough -Winfinite-recursion -Wliteral-conversion -Wnon-virtual-dtor -Woverloaded-virtual -Wpointer-arith -Wself-assign -Wstring-conversion -Wtautological-overlap-compare -Wthread-safety -Wthread-safety-beta -Wunused-comparison -Wvla -fno-lax-vector-conversions -fmacro-prefix-map=/home/runner/_work/iree/iree=iree -pthread -MD -MT compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o -MF compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o.d -o compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o -c /home/runner/_work/iree/iree/compiler/plugins/target/ROCM/test/opt_pass_plugin/GPUHello.cpp
In file included from /home/runner/_work/iree/iree/compiler/plugins/target/ROCM/test/opt_pass_plugin/GPUHello.cpp:7:
In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/IRBuilder.h:24:
In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/ConstantFolder.h:21:
In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/ConstantFold.h:24:
In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/InstrTypes.h:24:
/home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/Attributes.h:90:14: fatal error: 'llvm/IR/Attributes.inc' file not found
90 | #include "llvm/IR/Attributes.inc"
| ^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
This is within the llvm project, I don't see how what I am doing can break this.
@josemonsalve2 did you get this figured out?
@josemonsalve2 did you get this figured out?
Corbin, I am still stuck with the same two errors.
The asan error is actually on symbol duplications. I tried adding the -fvisibility=default flag to the plugin target, but it still fails the same way. I also used vanilla CMake, as Scott suggested, but I still have the same issue.
I did not have time to play with this today, but I plan to continue on Wed. However, any other suggestions are welcome.
I had to disable the test from the ASAN run. There is a odr-violation when the plugin is loaded. I tried multiple things with visibility, but it did not work.
=================================================================
==311843==ERROR: AddressSanitizer: odr-violation (0x7f372aaa31c0):
[1] size=4 'llvm::sys::fs::kInvalidFile' /home/jmonsalv/develop/dataflow_dependencies/iree/third_party/llvm-project/llvm/lib/Support/Unix/Path.inc:127 in ./lib/libGPUHello.so
[2] size=4 'llvm::sys::fs::kInvalidFile' /home/jmonsalv/develop/dataflow_dependencies/iree/third_party/llvm-project/llvm/lib/Support/Unix/Path.inc:127 in /home/jmonsalv/develop/dataflow_dependencies/iree/build-asan/lib/libIREECompiler.so
These globals were registered at these points:
[1]:
#0 0x556d8cdbec07 in __asan_register_globals.part.0 /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:369:3
#1 0x556d8cdbf2db in __asan_register_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:55
#2 0x556d8cdbf2db in __asan_register_elf_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:26
#3 0x7f3754d1f47d in call_init elf/dl-init.c:70:3
[2]:
#0 0x556d8cdbec07 in __asan_register_globals.part.0 /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:369:3
#1 0x556d8cdbf2db in __asan_register_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:55
#2 0x556d8cdbf2db in __asan_register_elf_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:26
#3 0x7f3754d1f47d in call_init elf/dl-init.c:70:3
==311843==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'llvm::sys::fs::kInvalidFile' at /home/jmonsalv/develop/dataflow_dependencies/iree/third_party/llvm-project/llvm/lib/Support/Unix/Path.inc:127 in ./lib/libGPUHello.so
==311843==ABORTING
Would linking against the iree compiler lib instead of llvmsupport fix it?
Would linking against the iree compiler lib instead of llvmsupport fix it?
Are you referring to the problem in CI right now, or the problem with the ASAN test?
I wonder if we are even fixing the right problem and if linking against the iree compiler is the right thing to do instead of fighting hidden/duplicate symbols etc. I would expect all the llvm libraries to be available as a transitive import.
I wonder if we are even fixing the right problem and if linking against the iree compiler is the right thing to do instead of fighting hidden/duplicate symbols etc. I would expect all the llvm libraries to be available as a transitive import.
Let me try that out
All CIs pass like this, but I want to try @kuhar suggestion. So, I will push the latest that seems to be working on my end. If it works, it is a much cleaner solution.
Ok. That solved all the issues. Thanks @kuhar! It should be ready to merge on my end.
@benvanik do you see any outstanding issues?
Thanks, @benvanik. I have addressed those comments.
If we keep the dynamic linking, we'll want to test that it works on Windows (the Windows CI only runs nightly / on-demand right now)
This broke the Windows build: https://github.com/iree-org/iree/actions/runs/11382217806/job/31665262820#step:7:9414
[8310/8625] Linking CXX shared library tools\libGPUHello.dll
FAILED: tools/libGPUHello.dll lib/GPUHello.lib
C:\Windows\system32\cmd.exe /C "C:\Windows\system32\cmd.exe /C ""C:\Program Files\CMake\bin\cmake.exe" -E __create_def D:\a\iree\iree\build-windows\compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def D:\a\iree\iree\build-windows\compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def.objs && cd D:\a\iree\iree\build-windows" && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --intdir=compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests -- C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1441~1.341\bin\Hostx64\x64\link.exe /nologo compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir\GPUHello.cpp.obj /out:tools\libGPUHello.dll /implib:lib\GPUHello.lib /pdb:tools\libGPUHello.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL -natvis:D:/a/iree/iree/runtime/iree.natvis /DEF:compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def lib\IREECompiler.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
LINK Pass 1: command "C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1441~1.341\bin\Hostx64\x64\link.exe /nologo compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir\GPUHello.cpp.obj /out:tools\libGPUHello.dll /implib:lib\GPUHello.lib /pdb:tools\libGPUHello.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL -natvis:D:/a/iree/iree/runtime/iree.natvis /DEF:compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def lib\IREECompiler.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir/intermediate.manifest compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir/manifest.res" failed (exit code 1120) with the following output:
LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored
Creating library lib\GPUHello.lib and object lib\GPUHello.exp
GPUHello.cpp.obj : error LNK2019: unresolved external symbol "protected: void * __cdecl llvm::SmallVectorBase<unsigned int>::mallocForGrow(void *,unsigned __int64,unsigned __int64,unsigned __int64 &)" (?mallocForGrow@?$SmallVectorBase@I@llvm@@IEAAPEAXPEAX_K1AEA_K@Z) referenced in function "protected: class std::function<void __cdecl(class llvm::PassManager<class llvm::Module,class llvm::AnalysisManager<class llvm::Module> > &,class llvm::OptimizationLevel)> * __cdecl llvm::SmallVectorTemplateBase<class std::function<void __cdecl(class llvm::PassManager<class llvm::Module,class llvm::AnalysisManager<class llvm::Module> > &,class llvm::OptimizationLevel)>,0>::mallocForGrow(unsigned __int64,unsigned __int64 &)" (?mallocForGrow@?$SmallVectorTemplateBase@V?$function@$$A6AXAEAV?$PassManager@VModule@llvm@@V?$AnalysisManager@VModule@llvm@@$$V@2@$$V@llvm@@VOptimizationLevel@2@@Z@std@@$0A@@llvm@@IEAAPEAV?$function@$$A6AXAEAV?$PassManager@VModule@llvm@@V?$AnalysisManager@VModule@llvm@@$$V@2@$$V@llvm@@VOptimizationLevel@2@@Z@std@@_KAEA_K@Z)
GPUHello.cpp.obj : error LNK2019: unresolved external symbol "bool __cdecl llvm::getAsUnsignedInteger(class llvm::StringRef,unsigned int,unsigned __int64 &)" (?getAsUnsignedInteger@llvm@@YA_NVStringRef@1@IAEA_K@Z) referenced in function "public: bool __cdecl llvm::StringRef::getAsInteger<unsigned __int64>(unsigned int,unsigned __int64 &)const " (??$getAsInteger@_K@StringRef@llvm@@QEBA_NIAEA_K@Z)
Trying to iterating locally on a fix for the Windows build, but I can't reproduce the linker error.
What I do see is that GPUHello.dll is in iree-build/tools/ since the build target is using iree_cc_library. The test is trying to load from --iree-hip-pass-plugin-path=$IREE_BINARY_DIR/lib/libGPUHello$IREE_DYLIB_EXT, which does not exist (both the name and file path are wrong).
Could disable the test on Windows or revert this... need the Windows build green again (it's been broken for almost 3 weeks - I had a few fixes land but this compounded on top of them).
yeah, that path is wrong - no clue why /lib/ is in there, but the lib prefix is something we need to drop - here's what we do elsewhere:
set_target_properties(iree_samples_custom_dispatch_cpu_mlp_plugin
PROPERTIES
WINDOWS_EXPORT_ALL_SYMBOLS ON
PREFIX ""
OUTPUT_NAME "GPUHello"
)
(I think "gpuhello" could use some work too - this should be ROCMPluginTest or something)