tvm [Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features

Currently, target features are determined by a set of fixed checks on the target string. This works well for checking support of a small number of simple features, but it doesn't scale. Some problems include:

There are many non-trivial conditions for which a feature may(not) be available. It is easy to miss these with the current implementation.
The inclusion of some features in a target string can imply other features. For example, "+sve2" implies "+sve". This currently isn't taken into account.
The tests in tests/cpp/target/parsers/aprofile_test.c suggest that targets such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon" are supported target strings. The features will be correctly parsed in TVM, however, they are not valid in LLVM. Therefore, it's possible that TVM and LLVM have different understanding of the features available.

This commit uses the more robust LLVM target parser to determine support for the features in TVM. It leverages previous infrastructure added to TVM for obtaining a list of all supported features given an input target, and uses this to check the existance of certain features we're interested in. It should be trivial to grow this list over time. As a result of this change, the problems mentioned above are solved.

In the current form, this commit drops support for target strings such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon". A scan of the codebase suggests this functionality is not in use (only in test cases). Should we feel the need to support them, or have a smoother migration for downstream users of TVM we can add a translator to the parser to convert these into LLVM compatible targets.

cc @Mousius @cbalint13 @ekalda @neildhickey

Jan 18 '24 09:01 lhutton1

Hi @lhutton1 !

Thanks a lot for picking up the recently introduce llvm reflection (well, kind of) for ARM targets too !

Currently, target features are determined by a set of fixed checks on the target string. This works well for checking support of a small number of simple features, but it doesn't scale. Some problems include:

There are many non-trivial conditions for which a feature may(not) be available. It is easy to miss these with the current implementation.

The inclusion of some features in a target string can imply other features. For example, "+sve2" implies "+sve". This currently isn't taken into account.

The tests in tests/cpp/target/parsers/aprofile_test.c suggest that targets such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon" are supported target strings. The features will be correctly parsed in TVM, however, they are not valid in LLVM. Therefore, it's possible that TVM and LLVM have different understanding of the features available. This commit uses the more robust LLVM target parser to determine support for the features in TVM. It leverages previous infrastructure added to TVM for obtaining a list of all supported features given an input target, and uses this to check the existance of certain features we're interested in. It should be trivial to grow this list over time. As a result of this change, the problems mentioned above are solved.

Yes I agree, I was also concerned about, all the mentioned target strings are "non-legit" from llvm point of view.

In the current form, this commit drops support for target strings such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon". A scan of the codebase suggests this functionality is not in use (only in test cases). Should we feel the need to support them, or have a smoother migration for downstream users of TVM we can add a translator to the parser to convert these into LLVM compatible targets.

Addition or subtraction can be done with +feat or -feat, a.f.a.i.k. there is no such thing as +no<whatever> in llvm.

I salute this PR, nice work @lhutton1 !

Jan 18 '24 11:01 cbalint13

also cc @kparzysz-quic

Feb 02 '24 14:02 lhutton1

friendly ping on this

Feb 19 '24 09:02 lhutton1

@tvm-bot rerun

Mar 21 '24 09:03 lhutton1

the duplication of the ci-arm CI jobs still appears to be occurring - trying a force push to see if that helps

Mar 21 '24 13:03 lhutton1

It seems CI was failing due to a memory leak observed when calling GetAllLLVMCpuFeatures() and GetAllLLVMTargetArches(). The following is a valgrind report for an executable that creates a new target tvm::Target("llvm -mtriple=aarch64-linux-gnu") with this PR's changes:

...
==356347== 8,040 (1,560 direct, 6,480 indirect) bytes in 1 blocks are definitely lost in loss record 42,621 of 42,667
==356347==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==356347==    by 0x1136B1B9: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==356347==    by 0xBC44347: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==356347==    by 0xBC3EF2D: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:398)
==356347==    by 0xBC3F0B2: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:413)
==356347==    by 0xBC41E09: tvm::codegen::LLVMTargetInfo::GetAllLLVMTargetArches() (llvm_instance.cc:843)
==356347==    by 0xBC3D5CA: tvm::codegen::LLVMTargetInfo::LLVMTargetInfo(tvm::codegen::LLVMInstance&, tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> const&) (llvm_instance.cc:220)
==356347==    by 0xAD6FCE8: tvm::target::parsers::aprofile::GetFeatures(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:101)
==356347==    by 0xAD70A5E: tvm::target::parsers::aprofile::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:137)
==356347==    by 0xAD71C25: tvm::target::parsers::cpu::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (cpu.cc:55)
==356347==    by 0xAEAB127: tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const (packed_func.h:1826)
==356347==    by 0xAEB3382: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) (packed_func.h:1252)
==356347== 
==356347== 8,040 (1,560 direct, 6,480 indirect) bytes in 1 blocks are definitely lost in loss record 42,622 of 42,667
==356347==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==356347==    by 0x1136B1B9: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==356347==    by 0xBC44347: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==356347==    by 0xBC3EF2D: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:398)
==356347==    by 0xBC3F0B2: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:413)
==356347==    by 0xBC4217B: tvm::codegen::LLVMTargetInfo::GetAllLLVMCpuFeatures() (llvm_instance.cc:868)
==356347==    by 0xAD6FD01: tvm::target::parsers::aprofile::GetFeatures(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:102)
==356347==    by 0xAD70A5E: tvm::target::parsers::aprofile::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:137)
==356347==    by 0xAD71C25: tvm::target::parsers::cpu::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (cpu.cc:55)
==356347==    by 0xAEAB127: tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const (packed_func.h:1826)
==356347==    by 0xAEB3382: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) (packed_func.h:1252)
==356347==    by 0xAE8D453: CallPacked (packed_func.h:1256)
==356347==    by 0xAE8D453: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> >(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>&&) const (packed_func.h:1784)
...

The problem seems to come from GetLLVMSubtargetInfo(...) which creates a target machine instance to get a pointer to an MCSubtargetInfo object. The reference to the created target machine is lost and the memory is not freed. I've tried to naively rework the code to remove the GetLLVMSubtargetInfo(...) function and therefore avoid the leak, but happy to hear any better ideas (cc @cbalint13, @kparzysz)

The reason for this surfacing only now (after previous successful CI runs), is that #16513 was merged, which means the changes in this PR are now run much more frequently in CI.

Mar 25 '24 22:03 lhutton1

It seems CI was failing due to a memory leak observed when calling GetAllLLVMCpuFeatures() and GetAllLLVMTargetArches(). The following is a valgrind report for an executable that creates a new target tvm::Target("llvm -mtriple=aarch64-linux-gnu") with this PR's changes:

...
==356347== 8,040 (1,560 direct, 6,480 indirect) bytes in 1 blocks are definitely lost in loss record 42,621 of 42,667
==356347==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==356347==    by 0x1136B1B9: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==356347==    by 0xBC44347: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==356347==    by 0xBC3EF2D: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:398)
==356347==    by 0xBC3F0B2: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:413)
==356347==    by 0xBC41E09: tvm::codegen::LLVMTargetInfo::GetAllLLVMTargetArches() (llvm_instance.cc:843)
==356347==    by 0xBC3D5CA: tvm::codegen::LLVMTargetInfo::LLVMTargetInfo(tvm::codegen::LLVMInstance&, tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> const&) (llvm_instance.cc:220)
==356347==    by 0xAD6FCE8: tvm::target::parsers::aprofile::GetFeatures(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:101)
==356347==    by 0xAD70A5E: tvm::target::parsers::aprofile::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:137)
==356347==    by 0xAD71C25: tvm::target::parsers::cpu::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (cpu.cc:55)
==356347==    by 0xAEAB127: tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const (packed_func.h:1826)
==356347==    by 0xAEB3382: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) (packed_func.h:1252)
==356347== 
==356347== 8,040 (1,560 direct, 6,480 indirect) bytes in 1 blocks are definitely lost in loss record 42,622 of 42,667
==356347==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==356347==    by 0x1136B1B9: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==356347==    by 0xBC44347: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==356347==    by 0xBC3EF2D: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:398)
==356347==    by 0xBC3F0B2: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:413)
==356347==    by 0xBC4217B: tvm::codegen::LLVMTargetInfo::GetAllLLVMCpuFeatures() (llvm_instance.cc:868)
==356347==    by 0xAD6FD01: tvm::target::parsers::aprofile::GetFeatures(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:102)
==356347==    by 0xAD70A5E: tvm::target::parsers::aprofile::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:137)
==356347==    by 0xAD71C25: tvm::target::parsers::cpu::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (cpu.cc:55)
==356347==    by 0xAEAB127: tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const (packed_func.h:1826)
==356347==    by 0xAEB3382: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) (packed_func.h:1252)
==356347==    by 0xAE8D453: CallPacked (packed_func.h:1256)
==356347==    by 0xAE8D453: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> >(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>&&) const (packed_func.h:1784)
...

The problem seems to come from GetLLVMSubtargetInfo(...) which creates a target machine instance to get a pointer to an MCSubtargetInfo object. The reference to the created target machine is lost and the memory is not freed. I've tried to naively rework the code to remove the GetLLVMSubtargetInfo(...) function and therefore avoid the leak, but happy to hear any better ideas (cc @cbalint13, @kparzysz)

Is this happening with llvm18 too ?
I look into this (across multiple llvm versions), could help me with a simple testcase ?

Mar 25 '24 22:03 cbalint13

Thanks for the quick response @cbalint13! I didn't try with llvm18 yet, only llvm17. Calling GetAllLLVMCpuFeatures() and GetAllLLVMTargetArches() should reproduce it, but I'll be able to come up with a more concrete example tomorrow

Mar 25 '24 22:03 lhutton1

Thanks for the quick response @cbalint13! I didn't try with llvm18 yet, only llvm17. Calling GetAllLLVMCpuFeatures() and GetAllLLVMTargetArches() should reproduce it, but I'll be able to come up with a more concrete example tomorrow

I'l test it for llvm18 too, just ping me if you have a concrete sample, until than I try invoking as you said (hope to catch it).

Mar 25 '24 22:03 cbalint13

Here is a reproducer: mem_leak.cpp

#include "tvm/runtime/registry.h"
#include "tvm/target/target.h"

int main() {
  auto pf = tvm::runtime::Registry::Get("target.llvm_get_cpu_archlist");
  (*pf)(tvm::Target("llvm"));
}

Compile:

g++ -std=c++17 -O2 -fPIC -I{TVM_DIR}/include -I{TVM_DIR}/3rdparty/dmlc-core/include -I{TVM_DIR}/tvm/3rdparty/dlpack/include -DDMLC_USE_LOGGING_LIBRARY=\<tvm/runtime/logging.h\> -o mem_leak_exec mem_leak.cpp -L{TVM_BUILD_DIR} -ldl -ltvm -pthread

Run with valgrind:

LD_PRELOAD="{TVM_BUILD_DIR}/libtvm.so" valgrind --leak-check=full -v --track-origins=yes ./mem_leak_exec

Output:

...
==475237== 12,369 (1,560 direct, 10,809 indirect) bytes in 1 blocks are definitely lost in loss record 42,596 of 42,630
==475237==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==475237==    by 0x12244479: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==475237==    by 0xBC0131B: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==475237==    by 0xBBFBC05: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:393)
==475237==    by 0xBBFBD8A: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:408)
==475237==    by 0xBBFEAE1: tvm::codegen::LLVMTargetInfo::GetAllLLVMTargetArches() const (llvm_instance.cc:835)
==475237==    by 0xBBFA2BB: tvm::codegen::LLVMTargetInfo::LLVMTargetInfo(tvm::codegen::LLVMInstance&, tvm::Target const&) (llvm_instance.cc:218)
==475237==    by 0xBC0EE16: tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}::operator()(tvm::Target const) const (llvm_module.cc:695)
==475237==    by 0xBC188B0: tvm::runtime::TypedPackedFunc<tvm::runtime::Array<tvm::runtime::String, void> (tvm::Target const&)>::AssignTypedLambda<tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}>(tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const (packed_func.h:1826)
==475237==    by 0xBC233EE: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Array<tvm::runtime::String, void> (tvm::Target const&)>::AssignTypedLambda<tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}>(tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue) (packed_func.h:1252)
...

Mar 26 '24 09:03 lhutton1

Here is a reproducer: mem_leak.cpp

Thanks a lot for this, I start to look at it now.

Mar 26 '24 09:03 cbalint13

Here is a reproducer: mem_leak.cpp

Thanks a lot for this, I start to look at it now.

@lhutton1 ,

Here is a patch: tvm-llvm-memleak.diff.gz Can confirm that on your side is fine ?

Later EDIT: the patch is against vanilla tvm latest.

Mar 26 '24 17:03 cbalint13

Here is a reproducer: mem_leak.cpp

Thanks a lot for this, I start to look at it now.

@lhutton1 ,

Here is a patch: tvm-llvm-memleak.diff.gz Can confirm that on your side is fine ?

Later EDIT: the patch is against vanilla tvm latest.

@lhutton1 ,

Few comments on the fix here:

First these LLVM module interrogation functions creates it's own sessions independent from the main instance.
It seems that within same context (the LLVMTargetInfo class) there will be multiple LLVM instances going memleak

I wanted these functions as separate class apart from TVMTargetInfo but was a requirement at review time of the PR

Your fix is not naive at all: https://github.com/lhutton1/tvm/commit/31927904b378c142be3044419e38501425805cd4
- You force creation within smart pointers until the needed SubtargetInfo interface, this is a effective way IMO
- Protecting the TargetMachine(s) is also in line with the original design of @kparzysz
My fix only make sure to re-use the llvm_instance + llvm_target_machine using existing GetOrCreate* interface.
- But we loose now the constness of the querying functions here
- All query parameters (tripe_, cpu_, attrs_) now must be passed once at LLVMTargetInfo creation time

I honestly would go with your #1 , because it keeps the constness and the parameters independence of querying.

Mar 27 '24 08:03 cbalint13

Thanks @cbalint13 I can confirm your fix also worked at my end. Happy to go with my changes, your reasoning makes sense to me :)

Mar 27 '24 09:03 lhutton1

This PR causes spurious error messages to be printed when loading libtvm.so, when the local version of LLVM available does not support one of the CPU architectures defined in tag.cc. For example, the following error messages are printed when LLVM 10.0 is used.

Mar 27 '24 16:03 Lunderberg

This PR causes spurious error messages to be printed when loading libtvm.so, when the local version of LLVM available does not support one of the CPU architectures defined in tag.cc. For example, the following error messages are printed when LLVM 10.0 is used.

Hi @Lunderberg ,

The messages looks legit, LLVM10 not support those CPU invocations.

Possible enhancements here:

The message maybe could be more explicit by adding "using LLVM version X.Y" like e.g.: LLVM cpu architecture -mcpu=cortex-a78 is not valid in -mtriple=aarch64{...} using LLVM version X.Y
Maybe also we could issue "WARN" instead of "ERROR" level here, but the user should be still notified somehow.

This behaviour was intentionate and introduced here @ PR#15761 With this #1571 TVM has the ability to lookup into LLVM's internal catalog of arches, cpus and their exact features.

Later EDIT:

Prior to this PR (ARM related) and [PR#15761 & PR15685 (x86 related)] all things was "hand mapped/coded"
Let me give an example on how decision was made prior to these enlisted here: x86 cpu features map

We have direct LLVM awareness, not needing any hardcoded mappings into static lists (unmaintainable IMHO)

Mar 27 '24 17:03 cbalint13

The messages look reasonable, based on the available support, but I think they shouldn't be emitted unless the user is attempting to use the invalid target, or making an explicit query of the target parameters. We should not producing an error message when importing TVM.

Regarding changing from a warning to an error, even commenting-out the LOG(ERROR) entirely does not fully remove the error messages, as some are produced from LLVM internally.

Mar 27 '24 17:03 Lunderberg

I agree on the use of internal architectures being preferable to hard-coded lists. The tags were primarily introduced (IIRC) to handle cases where there is no internal architecture that could be queried, such as GPUs with no readily available table to look up, and with limited availability where the compilation shouldn't require queries to a local GPU.

Mar 27 '24 17:03 Lunderberg

Thanks for the discussion @Lunderberg @cbalint13. I agree that we shouldn't remove the error message completely. Just thinking out loud - the problem here seems to be that the targets registered in tag.cc are parsed when loading tvm, is it possible to defer parsing of these registered targets to when they are actually used by the user?

Mar 27 '24 17:03 lhutton1

@Lunderberg ,

We should not producing an error message when importing TVM.

Can give me a script line how to reproduce (beside LLVM10 presence) ? Now that's sounds odd, if true I address a fix to this, will test all iterations llvm=range(10,19)

@lhutton1 ,

Thanks for the discussion @Lunderberg @cbalint13. I agree that we shouldn't remove the error message completely. Just thinking out loud - the problem here seems to be that the targets registered in tag.cc are parsed when loading tvm, is it possible to defer parsing of these registered targets to when they are actually used by the user?

If turns true, we should take this one check out from LLVTargetInfo constructor and see a better place for it.

I take care of this, if you don't mind this it will be a new PR.

Mar 27 '24 17:03 cbalint13

Can give me a script line how to reproduce (beside LLVM10 presence) ? Now that's sounds odd, if true I address a fix to this, will test all iterations llvm=range(10,19)

@cbalint13 If you add a new tag to tag.cc with the "mcpu" host attribute set to a non-existent type (or edit an existing tag), you can reproduce the error with a newer LLVM version.

I take care of this, if you don't mind this it will be a new PR.

Thank you, and making a new PR would be perfect!

Mar 27 '24 17:03 Lunderberg

I am also getting additional errors like

python test.py 
[15:23:28] /home/tqchen/github/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[15:23:28] /home/tqchen/github/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.

this is on a LLVM that was built for rocm (AMD platform). We should not send out an error message during static loading time if ARM target is not used, and only have such error message when we attempt to use tags in aprofile.

One possible approach is to conditionally register the related tags based on availability of related function

Apr 16 '24 19:04 tqchen

tvm tvm copied to clipboard

[Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features

tvm
tvm copied to clipboard