iree icon indicating copy to clipboard operation
iree copied to clipboard

[WIP] Adding support for opt pass plugins.

Open josemonsalve2 opened this issue 1 year ago • 1 comments

This commit adds the --iree-hal-target-pass-plugins flag That allows to add plugins during executable code generation and serialization.

This is interesting for adding instrumentation via external passes (e.g., https://github.com/CRobeck/instrument-amdgpu-kernels)

I am creating this for two reasons. First, to see if there is interest for this. Second, to get help on an error I have. Currently, I am having some issues with my tests where there is a segfault during dlopen. If anyone has some clue what may be happening, that would be awesome.

josemonsalve2 avatar Aug 24 '24 16:08 josemonsalve2

The PR for getting this into Triton is here:

https://github.com/triton-lang/triton/pull/3953

I would try adding a similar “Hello world” style test like in that commit to make sure everything is hooked up right.

CRobeck avatar Aug 24 '24 20:08 CRobeck

@CRobeck I have added the hello world example. However, I have the following changes.

  1. If I link the plugin against LLVMCore, I get the segfault I was getting. It also throws an error saying I have registered a CLI option twice. IDK what the deal with that is.
  2. I had to add an option to print the info without debugInfo. It does not seem to be included in the IREE compilation pipeline, so it segfaults with the original code in Triton

josemonsalve2 avatar Sep 18 '24 03:09 josemonsalve2

Thanks Ben. I’ll address those comments.

josemonsalve2 avatar Sep 18 '24 06:09 josemonsalve2

@benvanik, I made the changes you requested.

There is one thing I am not too happy about. The output of the lit.site.cfg.py is going to the source folder rather than the build folder. Any suggestions?

josemonsalve2 avatar Sep 25 '24 03:09 josemonsalve2

@kuhar, @CRobeck mentioned you were interested in this.

josemonsalve2 avatar Sep 25 '24 03:09 josemonsalve2

yeah we can't be polluting the source folder - that'll have to change. I'll take a look at the rest in the morning - a quick glance there's some style issues that need to be fixed (lack of end of file lines, lack of proper punctuation in comments, mismatch of cases in variable names, etc) that I can leave comments on if you don't get to them first.

benvanik avatar Sep 25 '24 03:09 benvanik

I’ll get the format things. I just saw the lint and will be following that. No worries about commenting these for now unless there’s something not covered by lint.

josemonsalve2 avatar Sep 25 '24 03:09 josemonsalve2

This looks very useful, @josemonsalve2 and @CRobeck! Left some comments, mostly coding style.

kuhar avatar Sep 25 '24 13:09 kuhar

That's really useful comments, thanks @kuhar! I will address those.

josemonsalve2 avatar Sep 25 '24 16:09 josemonsalve2

I'm trying to reproduce the CI - Linux x64 bazel / linux_x64_bazel (pull_request), but I cannot find the scripts.

./build_tools/bazel/install_bazelisk.sh 1.21.0
cp ./build_tools/scripts/fetch_cuda_deps.sh /usr/local/bin
/usr/local/bin/fetch_cuda_deps.sh ${IREE_CUDA_DEPS_DIR}
./build_tools/bazel/build_test_all.sh

Any hints on how to go about this

josemonsalve2 avatar Sep 25 '24 22:09 josemonsalve2

I'm trying to reproduce the CI - Linux x64 bazel / linux_x64_bazel (pull_request), but I cannot find the scripts.

./build_tools/bazel/install_bazelisk.sh 1.21.0
cp ./build_tools/scripts/fetch_cuda_deps.sh /usr/local/bin
/usr/local/bin/fetch_cuda_deps.sh ${IREE_CUDA_DEPS_DIR}
./build_tools/bazel/build_test_all.sh

Any hints on how to go about this

You shouldn't have to do any of this (except for the last command), I think. Can you reproduce this with the usual bazel setup? https://iree.dev/developers/building/bazel/

kuhar avatar Sep 26 '24 02:09 kuhar

Thank you very much @kuhar for all those great comments

josemonsalve2 avatar Sep 26 '24 16:09 josemonsalve2

@CRobeck @ScottTodd I have changed the name of the library.

The issue with the Asan CI is that the library's output name is not libGPUHello.so but libiree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.so. Thus, the file is not found during the lit run. I do not know how to either force the name not to use the whole path or get the prefix inside of the lit test.

My second issue is that the output of lit.site.cfg.py.in goes to the source tree. I tried changing the output to go to the build tree, but it does not get loaded by lit (since the lit.cfg.py is in a different location). @benvanik does not want this behavior. But I need to learn how to point it to the right place.

josemonsalve2 avatar Sep 26 '24 16:09 josemonsalve2

@ScottTodd, thank you so much for the suggestion on the current environment variables. I did not realize these existed. This also solves the problem of creating files in the source tree.

I am still having problems with the name of the library file changes when building the CI asan. Any suggestions on how to force the library name always to be libGPUHello$PREFIX?

josemonsalve2 avatar Sep 26 '24 21:09 josemonsalve2

I am still having problems with the name of the library file changes when building the CI asan. Any suggestions on how to force the library name always to be libGPUHello$PREFIX?

Check the CMake for the sample I shared: https://github.com/iree-org/iree/blob/d5839584547ec2a258b12399f56ea3d006668dbb/samples/custom_module/dynamic/CMakeLists.txt#L25-L32

ScottTodd avatar Sep 26 '24 21:09 ScottTodd

The PkgCI / Build Packages / Linux Release (x86_64) error is odd to me:

compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o
  FAILED: compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o
  ccache /usr/bin/clang++  -I/home/runner/_work/iree/iree -I/home/runner/_work/iree/iree/compiler/build/b -I/home/runner/_work/iree/iree/third_party/llvm-project/llvm/include -I/home/runner/_work/iree/iree/compiler/build/b/llvm-project/include -I/home/runner/_work/iree/iree/third_party/llvm-project/mlir/include -I/home/runner/_work/iree/iree/compiler/build/b/llvm-project/tools/mlir/include -I/home/runner/_work/iree/iree/third_party/llvm-project/lld/include -I/home/runner/_work/iree/iree/compiler/build/b/llvm-project/tools/lld/include -O3 -DNDEBUG -std=gnu++17 -fPIC -fvisibility=hidden -fno-rtti -fno-exceptions -Werror -Wall -Wno-error=deprecated-declarations -Wno-ambiguous-member-template -Wno-char-subscripts -Wno-extern-c-compat -Wno-gnu-alignof-expression -Wno-gnu-variable-sized-type-not-at-end -Wno-ignored-optimization-argument -Wno-invalid-offsetof -Wno-invalid-source-encoding -Wno-mismatched-tags -Wno-pointer-sign -Wno-reserved-user-defined-literal -Wno-return-type-c-linkage -Wno-self-assign-overloaded -Wno-sign-compare -Wno-signed-unsigned-wchar -Wno-strict-overflow -Wno-trigraphs -Wno-unknown-pragmas -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-unused-const-variable -Wno-unused-function -Wno-unused-local-typedef -Wno-unused-private-field -Wno-user-defined-warnings -Wno-missing-braces -Wctad-maybe-unsupported -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Wimplicit-fallthrough -Winfinite-recursion -Wliteral-conversion -Wnon-virtual-dtor -Woverloaded-virtual -Wpointer-arith -Wself-assign -Wstring-conversion -Wtautological-overlap-compare -Wthread-safety -Wthread-safety-beta -Wunused-comparison -Wvla -fno-lax-vector-conversions -fmacro-prefix-map=/home/runner/_work/iree/iree=iree -pthread -MD -MT compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o -MF compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o.d -o compiler/plugins/target/ROCM/test/opt_pass_plugin/CMakeFiles/iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir/GPUHello.cpp.o -c /home/runner/_work/iree/iree/compiler/plugins/target/ROCM/test/opt_pass_plugin/GPUHello.cpp
  In file included from /home/runner/_work/iree/iree/compiler/plugins/target/ROCM/test/opt_pass_plugin/GPUHello.cpp:7:
  In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/IRBuilder.h:24:
  In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/ConstantFolder.h:21:
  In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/ConstantFold.h:24:
  In file included from /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/InstrTypes.h:24:
  /home/runner/_work/iree/iree/third_party/llvm-project/llvm/include/llvm/IR/Attributes.h:90:14: fatal error: 'llvm/IR/Attributes.inc' file not found
     90 |     #include "llvm/IR/Attributes.inc"
        |              ^~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

This is within the llvm project, I don't see how what I am doing can break this.

josemonsalve2 avatar Sep 26 '24 21:09 josemonsalve2

@josemonsalve2 did you get this figured out?

CRobeck avatar Oct 01 '24 01:10 CRobeck

@josemonsalve2 did you get this figured out?

Corbin, I am still stuck with the same two errors. The asan error is actually on symbol duplications. I tried adding the -fvisibility=default flag to the plugin target, but it still fails the same way. I also used vanilla CMake, as Scott suggested, but I still have the same issue.

I did not have time to play with this today, but I plan to continue on Wed. However, any other suggestions are welcome.

josemonsalve2 avatar Oct 01 '24 04:10 josemonsalve2

I had to disable the test from the ASAN run. There is a odr-violation when the plugin is loaded. I tried multiple things with visibility, but it did not work.

=================================================================
==311843==ERROR: AddressSanitizer: odr-violation (0x7f372aaa31c0):
  [1] size=4 'llvm::sys::fs::kInvalidFile' /home/jmonsalv/develop/dataflow_dependencies/iree/third_party/llvm-project/llvm/lib/Support/Unix/Path.inc:127 in ./lib/libGPUHello.so
  [2] size=4 'llvm::sys::fs::kInvalidFile' /home/jmonsalv/develop/dataflow_dependencies/iree/third_party/llvm-project/llvm/lib/Support/Unix/Path.inc:127 in /home/jmonsalv/develop/dataflow_dependencies/iree/build-asan/lib/libIREECompiler.so
These globals were registered at these points:
  [1]:
    #0 0x556d8cdbec07 in __asan_register_globals.part.0 /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:369:3
    #1 0x556d8cdbf2db in __asan_register_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:55
    #2 0x556d8cdbf2db in __asan_register_elf_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:26
    #3 0x7f3754d1f47d in call_init elf/dl-init.c:70:3

  [2]:
    #0 0x556d8cdbec07 in __asan_register_globals.part.0 /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:369:3
    #1 0x556d8cdbf2db in __asan_register_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:55
    #2 0x556d8cdbf2db in __asan_register_elf_globals /home/jmonsalv/installers/llvm-project/compiler-rt/lib/asan/asan_globals.cpp:352:26
    #3 0x7f3754d1f47d in call_init elf/dl-init.c:70:3

==311843==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'llvm::sys::fs::kInvalidFile' at /home/jmonsalv/develop/dataflow_dependencies/iree/third_party/llvm-project/llvm/lib/Support/Unix/Path.inc:127 in ./lib/libGPUHello.so
==311843==ABORTING

josemonsalve2 avatar Oct 15 '24 17:10 josemonsalve2

Would linking against the iree compiler lib instead of llvmsupport fix it?

kuhar avatar Oct 15 '24 18:10 kuhar

Would linking against the iree compiler lib instead of llvmsupport fix it?

Are you referring to the problem in CI right now, or the problem with the ASAN test?

josemonsalve2 avatar Oct 15 '24 19:10 josemonsalve2

I wonder if we are even fixing the right problem and if linking against the iree compiler is the right thing to do instead of fighting hidden/duplicate symbols etc. I would expect all the llvm libraries to be available as a transitive import.

kuhar avatar Oct 15 '24 20:10 kuhar

I wonder if we are even fixing the right problem and if linking against the iree compiler is the right thing to do instead of fighting hidden/duplicate symbols etc. I would expect all the llvm libraries to be available as a transitive import.

Let me try that out

josemonsalve2 avatar Oct 15 '24 23:10 josemonsalve2

All CIs pass like this, but I want to try @kuhar suggestion. So, I will push the latest that seems to be working on my end. If it works, it is a much cleaner solution.

josemonsalve2 avatar Oct 16 '24 00:10 josemonsalve2

Ok. That solved all the issues. Thanks @kuhar! It should be ready to merge on my end.

josemonsalve2 avatar Oct 16 '24 04:10 josemonsalve2

@benvanik do you see any outstanding issues?

kuhar avatar Oct 16 '24 12:10 kuhar

Thanks, @benvanik. I have addressed those comments.

josemonsalve2 avatar Oct 16 '24 17:10 josemonsalve2

If we keep the dynamic linking, we'll want to test that it works on Windows (the Windows CI only runs nightly / on-demand right now)

This broke the Windows build: https://github.com/iree-org/iree/actions/runs/11382217806/job/31665262820#step:7:9414

[8310/8625] Linking CXX shared library tools\libGPUHello.dll
FAILED: tools/libGPUHello.dll lib/GPUHello.lib 
C:\Windows\system32\cmd.exe /C "C:\Windows\system32\cmd.exe /C ""C:\Program Files\CMake\bin\cmake.exe" -E __create_def D:\a\iree\iree\build-windows\compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def D:\a\iree\iree\build-windows\compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def.objs && cd D:\a\iree\iree\build-windows" && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --intdir=compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1441~1.341\bin\Hostx64\x64\link.exe /nologo compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir\GPUHello.cpp.obj  /out:tools\libGPUHello.dll /implib:lib\GPUHello.lib /pdb:tools\libGPUHello.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL  -natvis:D:/a/iree/iree/runtime/iree.natvis /DEF:compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def  lib\IREECompiler.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
LINK Pass 1: command "C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1441~1.341\bin\Hostx64\x64\link.exe /nologo compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.objects.dir\GPUHello.cpp.obj /out:tools\libGPUHello.dll /implib:lib\GPUHello.lib /pdb:tools\libGPUHello.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL -natvis:D:/a/iree/iree/runtime/iree.natvis /DEF:compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir\.\exports.def lib\IREECompiler.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir/intermediate.manifest compiler\plugins\target\ROCM\test\opt_pass_plugin\CMakeFiles\iree_compiler_plugins_target_ROCM_test_opt_pass_plugin_GPUHello.dir/manifest.res" failed (exit code 1120) with the following output:
LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored
   Creating library lib\GPUHello.lib and object lib\GPUHello.exp
GPUHello.cpp.obj : error LNK2019: unresolved external symbol "protected: void * __cdecl llvm::SmallVectorBase<unsigned int>::mallocForGrow(void *,unsigned __int64,unsigned __int64,unsigned __int64 &)" (?mallocForGrow@?$SmallVectorBase@I@llvm@@IEAAPEAXPEAX_K1AEA_K@Z) referenced in function "protected: class std::function<void __cdecl(class llvm::PassManager<class llvm::Module,class llvm::AnalysisManager<class llvm::Module> > &,class llvm::OptimizationLevel)> * __cdecl llvm::SmallVectorTemplateBase<class std::function<void __cdecl(class llvm::PassManager<class llvm::Module,class llvm::AnalysisManager<class llvm::Module> > &,class llvm::OptimizationLevel)>,0>::mallocForGrow(unsigned __int64,unsigned __int64 &)" (?mallocForGrow@?$SmallVectorTemplateBase@V?$function@$$A6AXAEAV?$PassManager@VModule@llvm@@V?$AnalysisManager@VModule@llvm@@$$V@2@$$V@llvm@@VOptimizationLevel@2@@Z@std@@$0A@@llvm@@IEAAPEAV?$function@$$A6AXAEAV?$PassManager@VModule@llvm@@V?$AnalysisManager@VModule@llvm@@$$V@2@$$V@llvm@@VOptimizationLevel@2@@Z@std@@_KAEA_K@Z)
GPUHello.cpp.obj : error LNK2019: unresolved external symbol "bool __cdecl llvm::getAsUnsignedInteger(class llvm::StringRef,unsigned int,unsigned __int64 &)" (?getAsUnsignedInteger@llvm@@YA_NVStringRef@1@IAEA_K@Z) referenced in function "public: bool __cdecl llvm::StringRef::getAsInteger<unsigned __int64>(unsigned int,unsigned __int64 &)const " (??$getAsInteger@_K@StringRef@llvm@@QEBA_NIAEA_K@Z)

ScottTodd avatar Oct 17 '24 14:10 ScottTodd

Trying to iterating locally on a fix for the Windows build, but I can't reproduce the linker error.

What I do see is that GPUHello.dll is in iree-build/tools/ since the build target is using iree_cc_library. The test is trying to load from --iree-hip-pass-plugin-path=$IREE_BINARY_DIR/lib/libGPUHello$IREE_DYLIB_EXT, which does not exist (both the name and file path are wrong).

Could disable the test on Windows or revert this... need the Windows build green again (it's been broken for almost 3 weeks - I had a few fixes land but this compounded on top of them).

ScottTodd avatar Oct 17 '24 16:10 ScottTodd

yeah, that path is wrong - no clue why /lib/ is in there, but the lib prefix is something we need to drop - here's what we do elsewhere:

set_target_properties(iree_samples_custom_dispatch_cpu_mlp_plugin
  PROPERTIES
    WINDOWS_EXPORT_ALL_SYMBOLS ON
    PREFIX ""
    OUTPUT_NAME "GPUHello"
)

(I think "gpuhello" could use some work too - this should be ROCMPluginTest or something)

benvanik avatar Oct 17 '24 16:10 benvanik