Halide
[AMX] syscall to enable AMX instructions on real hardware
The current work on AMX in #6582 has made it possible to generate AMX instructions for various tile allocations on the emulator. On real hardware, however, a syscall is required to enable the feature in the kernel. I used the following (5.16 is the lowest supported kernel version) as inspiration and tested the result on Intel's DevCloud.
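A minimal sketch of the kind of request involved, assuming the interface described in the Linux kernel's x86 xstate documentation (`arch_prctl` with `ARCH_REQ_XCOMP_PERM` and `XFEATURE_XTILEDATA`, kernel 5.16 or later). The helper name `request_amx_permission` is hypothetical and this is not the code from this PR. The syscall is only issued on Linux x86-64; elsewhere the function returns -1 with `errno` set to `ENOSYS`.

```c
#define _GNU_SOURCE /* for the syscall() declaration in <unistd.h> */
#include <errno.h>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the kernel's x86 prctl/xstate headers; defined here in case the
   installed userspace headers predate kernel 5.16. */
#ifndef ARCH_GET_XCOMP_PERM
#define ARCH_GET_XCOMP_PERM 0x1022
#endif
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM 0x1023
#endif
#define XFEATURE_XTILEDATA 18

/* Hypothetical helper: asks the kernel to let this process use AMX tile data.
   Returns 0 on success, -1 on failure with errno set. */
static int request_amx_permission(void) {
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0) {
        return -1; /* e.g. no AMX on this CPU, or a kernel older than 5.16 */
    }
    /* Read back the permitted xstate bitmask to confirm the grant. */
    unsigned long bitmask = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask) != 0) {
        return -1;
    }
    if (!(bitmask & (1UL << XFEATURE_XTILEDATA))) {
        errno = EPERM;
        return -1;
    }
    return 0;
}
#else
/* Other platforms need their own mechanism (if any); report "not supported". */
static int request_amx_permission(void) {
    errno = ENOSYS;
    return -1;
}
#endif
```

Because the permission is granted for the whole process, calling such a helper once at startup, before running any pipeline that uses AMX, should be sufficient. On CPUs or kernels without AMX support the request fails and returns -1 instead of enabling anything.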
A few concerns I have:
- Currently, injecting the `syscall` enables the instructions on real hardware but triggers a fault in the emulator.
  - Would it be advisable to add another target feature for the emulator (SDE), so that it's possible to disable this `syscall`?
- I'm not sure if I handled the AMX feature detection correctly.
  - Could it be that this should be part of `src/runtime/x86_cpu_features.cpp`?
- Theoretically, a user could disable AMX support at runtime, making the `CpuFeatures` no longer a static truth as assumed in `halide_default_can_use_target_features`. Possible approaches:
  - We could assume nobody will realistically enable/disable the feature during the runtime of a program.
  - Disable caching of the `CpuFeatures` if the CPU could possibly support AMX.
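For the feature-detection question, one possible check (a sketch, not Halide's existing `x86_cpu_features.cpp` logic) reads CPUID leaf 7, sub-leaf 0 for the AMX-TILE, AMX-INT8 and AMX-BF16 bits, then reads XCR0 via `xgetbv` to confirm the OS manages the tile state. It relies on GCC/Clang's `<cpuid.h>`; the function name `cpu_supports_amx` is hypothetical. On Linux the XCR0 bits only show that the kernel supports AMX; the per-process `arch_prctl` permission is still a separate step.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical check: true if the CPU reports AMX-TILE, AMX-INT8 and AMX-BF16
   and the OS has enabled the tile state components in XCR0. */
static bool cpu_supports_amx(void) {
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1, ECX bit 27 (OSXSAVE): XGETBV may be executed. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27))) {
        return false;
    }

    /* Leaf 7, sub-leaf 0, EDX: bit 22 AMX-BF16, bit 24 AMX-TILE, bit 25 AMX-INT8. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    const unsigned int amx_bits = (1u << 22) | (1u << 24) | (1u << 25);
    if ((edx & amx_bits) != amx_bits) {
        return false;
    }

    /* XCR0 bit 17 (XTILECFG) and bit 18 (XTILEDATA) must be enabled by the OS. */
    unsigned int xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    const unsigned int xtile_bits = (1u << 17) | (1u << 18);
    return (xcr0_lo & xtile_bits) == xtile_bits;
}
#else
/* Non-x86 targets never support AMX. */
static bool cpu_supports_amx(void) {
    return false;
}
#endif
```

Requiring all three sub-features is an assumption made here; a pipeline that only uses int8 tiles might check AMX-TILE and AMX-INT8 alone.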
Looking forward to any feedback.
This seems to be a syscall that you make once, and it then enables AMX usage for the entire process? If so, I'm not sure Halide should be responsible for that. If it were scoped, and you were supposed to enable and disable AMX access around every use of it, then it would make sense to inject that into our generated code. Without scoping, though, shouldn't the user be responsible for making sure their process has access to AMX? We could perhaps provide a user-callable function for it in the runtime module.
The reason I'm wary is that it looks like this will only work on Linux x86-64 (e.g. it hardcodes a syscall number), and it will actively fail on Windows, macOS, etc. Aren't people going to use AMX on Windows boxes at some point?
I totally agree with Andrew's request for more info on how this works. To clarify the Linux-specific nature of this: the syscall being Linux-only is not a showstopper, but the code does need to indicate that this is the case, and if other systems require similar system calls, we would likely want to mirror the support for those platforms. (Or at least set up the file naming and such to allow for it.) E.g., see how `halide_host_cpu_count` works.
It is correct that you only need to perform the syscall once to enable AMX for the entire process. I agree that it should then be called by the user. I'll work on scoping the code correctly to x86-64 Linux.
You might need to add a reference to it in the list in `src/runtime/runtime_api.cpp` to stop it from getting dead-stripped when not linking a standalone runtime.
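The mechanism behind this suggestion can be illustrated in plain C: storing a function's address in a table marked `__attribute__((used))` forces the table to be emitted, and the table in turn holds a live reference to the function, so the function is not discarded as unreferenced code. Both `example_enable_amx` and `example_runtime_api_functions` are hypothetical names; the actual list in `runtime_api.cpp` may differ in its details.

```c
/* Hypothetical stand-in for a new user-callable runtime function. */
int example_enable_amx(void) {
    return 0;
}

/* `used` makes the compiler emit this table even though nothing reads it;
   because the table takes the function's address, the function is kept too. */
__attribute__((used)) void *example_runtime_api_functions[] = {
    (void *)&example_enable_amx,
};
```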