xe: sdpa: Select SDPA config using scoring criteria
Description
This PR adds a scoring function to the SDPA primitive to identify the correct config for a given input query. It refactors the configuration selection logic for the SDPA microkernels, making kernel selection more configurable. The changes introduce a scoring-based search over configuration records, update the configuration data, and simplify the interface for querying sequence intervals. These updates help ensure that the most appropriate kernel configuration is selected for a given problem and make the codebase easier to maintain.
Configuration Selection Logic Improvements
- Replaced the direct search for a matching configuration with a new `search_query` function that uses a scoring system to select the best configuration record based on query properties and sequence length. This allows for more flexible and accurate matching. (`src/gpu/intel/sdpa/configs.cpp`, `src/gpu/intel/sdpa/configs.hpp`) [1] [2] [3]
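As a rough illustration of what a scoring-based search can look like, here is a minimal sketch. It is not the code from this PR: the types, the `score` helper, the name `search_query_sketch`, and the weights are all hypothetical, chosen only to show the idea of filtering out unusable records and picking the highest-scoring remaining one.

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for the real types in configs.hpp.
enum class arch_t { xe_hpc, xe2 };

struct criteria_t {
    arch_t arch;
    int head_size; // record: largest head size covered; query: actual size
    int seq; // record: largest sequence length covered (-1 = any)
    bool quantized;
    bool integrated;
    bool fma;
};

struct config_record_t {
    criteria_t criteria;
    int config_id; // placeholder for the real kernel parameters
};

// Score how well a record fits a query. A negative score means the record
// cannot be used at all. The weights are assumptions for illustration.
int score(const criteria_t &rec, const criteria_t &query) {
    if (rec.arch != query.arch) return -1;
    if (rec.head_size < query.head_size) return -1;
    if (rec.seq != -1 && rec.seq < query.seq) return -1;
    int s = 0;
    if (rec.quantized == query.quantized) s += 8;
    if (rec.integrated == query.integrated) s += 4;
    if (rec.fma == query.fma) s += 2;
    if (rec.seq != -1) s += 1; // slight preference for sequence-specific records
    return s;
}

// Return the best-scoring usable record, or nullptr if none is usable.
const config_record_t *search_query_sketch(
        const std::vector<config_record_t> &table, const criteria_t &query) {
    const config_record_t *best = nullptr;
    int best_score = -1;
    for (const auto &rec : table) {
        const int s = score(rec.criteria, query);
        if (s > best_score) {
            best_score = s;
            best = &rec;
        }
    }
    return best;
}
```

With this shape, a record that differs from the query in one property can still be selected when no exact match exists, which is the flexibility a scoring approach gives over an exact-match search.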
Configuration Data Updates
- Added and updated several configuration records for `xe_hpc` and `xe2` architectures, including new combinations for `fma`, `integrated`, and `quantized` properties, and adjusted some kernel parameters for better coverage and tuning. (`src/gpu/intel/sdpa/configs.cpp`) [1] [2] [3] [4] [5] [6]
Code Interface Changes
- Changed the return type of `choose_config` and `nearest_conf_seq_interval` from returning just the configuration to returning the full configuration record, allowing access to both criteria and config details. Updated all call sites to use the new interface. (`src/gpu/intel/sdpa/configs.cpp`, `src/gpu/intel/sdpa/configs.hpp`, `src/gpu/intel/sdpa/micro.cpp`) [1] [2] [3]
Output and Debugging Enhancements
- Improved the formatting of configuration output strings for easier reading and debugging, including aligned fields and more consistent property indicators. (`src/gpu/intel/sdpa/configs.cpp`) [1] [2]
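To make "aligned fields and consistent property indicators" concrete, the sketch below formats a criteria entry with fixed-width columns and one character per property. The layout, field widths, indicator characters, and the name `to_string_sketch` are assumptions for illustration and may differ from the actual output in this PR.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical, simplified criteria type (not the real one from configs.hpp).
struct criteria_t {
    std::string arch;
    int head_size;
    int seq; // -1 means "any sequence length"
    bool quantized;
    bool integrated;
    bool fma;
};

// Format one entry with fixed-width columns so that rows line up when
// several entries are printed one after another.
std::string to_string_sketch(const criteria_t &c) {
    std::ostringstream oss;
    oss << std::left << std::setw(7) << c.arch;
    oss << " head=" << std::right << std::setw(4) << c.head_size;
    oss << " seq=" << std::setw(5);
    if (c.seq < 0)
        oss << "any";
    else
        oss << c.seq;
    // One character per property; '-' marks a property that is not set.
    oss << " [" << (c.quantized ? 'q' : '-') << (c.integrated ? 'i' : '-')
        << (c.fma ? 'f' : '-') << "]";
    return oss.str();
}
```

Because every field has a fixed width, entries for different architectures and sizes produce strings of the same length, which makes verbose logs easier to scan.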
Matching and Sorting Logic Simplification
- Simplified the criteria matching and sorting logic to make configuration selection more predictable and maintainable, removing less useful heuristics and favoring property-based ordering. (`src/gpu/intel/sdpa/configs.cpp`)
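One common way to get a predictable, property-based ordering is a lexicographic comparison over the criteria fields. The sketch below assumes a hypothetical `criteria_t` and an arbitrary key order; the actual fields and their priority in this PR may differ.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical, simplified criteria type (not the real one from configs.hpp).
struct criteria_t {
    int arch; // smaller value = older architecture (assumption)
    int head_size;
    int seq;
    bool quantized, integrated, fma;
};

// Lexicographic ordering over the fields. The key order is an assumption.
bool operator<(const criteria_t &a, const criteria_t &b) {
    return std::tie(a.arch, a.head_size, a.seq, a.quantized, a.integrated,
                   a.fma)
            < std::tie(b.arch, b.head_size, b.seq, b.quantized, b.integrated,
                    b.fma);
}

// Sort a table of criteria in place using the ordering above.
void sort_criteria_sketch(std::vector<criteria_t> &table) {
    std::sort(table.begin(), table.end());
}
```

Comparing with `std::tie` keeps the ordering rule in one place, so adding or reordering a property is a one-line change rather than a new branch in a hand-written comparator.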
make test disable benchdnn_all set test_scope=NIGHTLY enable benchdnn_graph disable test_device_cpu enable test_device_gpu enable arch_gpu_xe-hpc enable arch_gpu_xe-hpg-atsm enable arch_gpu_xe-hpg-dg2 enable arch_gpu_xe-lp enable arch_gpu_xe-lpg enable arch_gpu_xe-lpg+ enable arch_gpu_xe2-hpg-bmg enable arch_gpu_xe2-lpg
make test perf-gpu set primitive=sdpa