Results 113 comments of aneshlya

LLVM patch: https://github.com/llvm/llvm-project/pull/166227

Thanks for the idea! It would require introducing a new type in the ISPC type system and implementing all applicable stdlib functions for it. The work isn’t particularly complicated, but...

Before implementing any changes requested in the review, could you post an updated performance table without warm_up results and with sse4/avx1 targets?

Please don't introduce any new capabilities. The x86 targets are organized in a hierarchy: https://github.com/ispc/ispc/blob/main/src/builtins.cpp#L472. If you want a target-specific implementation, put it in the corresponding builtin file.

Could you also add results for AVX512ICL-x16, AVX512ICL-x32, AVX512ICL-x64?

Great! So it looks like we can safely update the implementation for avx512 targets only. We should keep the old implementation for sse-avx1 targets, and avx2 may require specific optimization as was...

I'm currently testing the changes locally. If the PR is ready for review, please mark it so.

I didn't see it at first, but VPOPCNTDQ is only available starting with ICL. Please move the implementation of popcnt_*_varying from builtins/target-avx512-utils.ll to each `avx512icl*` file.
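
To illustrate why the instruction has to be gated per target, here is a minimal C sketch (not ISPC's actual builtin code, which lives in LLVM IR files). The helper name `popcnt16_u32` is hypothetical. It assumes a GCC- or Clang-style compiler, which defines `__AVX512VPOPCNTDQ__` only when the target supports the extension; otherwise a portable per-lane loop is used.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512VPOPCNTDQ__)
#include <immintrin.h>
#endif

/* Hypothetical helper: population count of each of 16 32-bit lanes. */
void popcnt16_u32(const uint32_t in[16], uint32_t out[16]) {
#if defined(__AVX512VPOPCNTDQ__)
    /* Target has VPOPCNTDQ (e.g. Ice Lake): one vector instruction. */
    __m512i v = _mm512_loadu_si512((const void *)in);
    _mm512_storeu_si512((void *)out, _mm512_popcnt_epi32(v));
#else
    /* Fallback for targets without VPOPCNTDQ: clear the lowest set bit
       until the lane is zero, counting the iterations. */
    for (size_t i = 0; i < 16; ++i) {
        uint32_t x = in[i];
        uint32_t n = 0;
        while (x != 0) {
            x &= x - 1;
            ++n;
        }
        out[i] = n;
    }
#endif
}
```

Both paths produce the same per-lane counts; only the instruction selection differs, which is why a shared avx512 utility file cannot assume the vector form.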

Looks like the OpenMoonRay failure is a real failure. It passes in all other PRs.