Develop Mac versions of solutions for a few labs

Open dendibakh opened this issue 2 years ago • 9 comments

Currently, the following labs don't have solutions for the Mac M1 platform:

  ["memory_bound"]["huge_pages_1"]          - need to check huge pages on Mac
  ["misc"]["io_opt1"]                       - mmap on Mac
  ["core_bound"]["compiler_intrinsics_1"]   - NEON version
  ["core_bound"]["compiler_intrinsics_2"]   - NEON version

This prevents automated benchmarking of their speedups in CI.

dendibakh, Sep 30 '22 00:09
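For the "mmap on Mac" item, a minimal sketch of a read-only file mapping that uses only POSIX calls (`open`, `fstat`, `mmap`, `munmap`), which are available on both macOS and Linux. The lab's actual code is not shown in this thread, so `sum_file_bytes` is a hypothetical helper chosen only to give the mapping something to do.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not from the lab): sums every byte of a file by
// mapping it read-only. Returns false if the file cannot be opened or mapped.
bool sum_file_bytes(const char* path, std::uint64_t& sum) {
  sum = 0;
  int fd = open(path, O_RDONLY);
  if (fd < 0) return false;

  struct stat st {};
  if (fstat(fd, &st) != 0) {
    close(fd);
    return false;
  }
  if (st.st_size == 0) {  // mmap rejects a zero-length mapping
    close(fd);
    return true;
  }

  const std::size_t len = static_cast<std::size_t>(st.st_size);
  void* mapping = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (mapping == MAP_FAILED) return false;

  const unsigned char* bytes = static_cast<const unsigned char*>(mapping);
  for (std::size_t i = 0; i < len; ++i) sum += bytes[i];

  munmap(mapping, len);
  return true;
}
```

Nothing here is platform-specific, so the same source should build on an M1 Mac; whether it matches the lab's intended solution is an open question.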

You can try to use "sse2neon"

andrewevstyukhin, Sep 30 '22 06:09

> You can try to use "sse2neon"

Thanks! Good idea.

dendibakh, Sep 30 '22 13:09

Hi @dendibakh, I can confirm that using sse2neon to solve the ["core_bound"]["compiler_intrinsics_1"] does work, albeit it is a bit slower than writing pure ARM Neon code, due to the differences between the architectures and the extra instructions needed to translate SSE to NEON while keeping the same x86 algorithms.

You can check the CI job as well as my branch with the commits.

Once again, thank you for the excellent work!

Cosmin-B, Dec 11 '22 21:12
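To illustrate the kind of translation overhead described above, here is a sketch of one kernel written twice: once in native NEON and once in SSE2. The kernel (`count_byte`) is hypothetical and is not the lab's code. The SSE2 path relies on `_mm_movemask_epi8`, which as far as I know has no single-instruction NEON counterpart, so a translation layer has to emulate it with several instructions, while the native NEON path can use a horizontal add instead. `__builtin_popcount` assumes GCC or Clang.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__aarch64__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical kernel: counts how many bytes in buf[0..n) equal `needle`.
std::size_t count_byte(const std::uint8_t* buf, std::size_t n,
                       std::uint8_t needle) {
  std::size_t count = 0;
  std::size_t i = 0;
#if defined(__aarch64__)
  // Native NEON: compare, turn 0xFF lanes into 1, then add across the vector.
  const uint8x16_t pattern = vdupq_n_u8(needle);
  const uint8x16_t one = vdupq_n_u8(1);
  for (; i + 16 <= n; i += 16) {
    const uint8x16_t eq = vceqq_u8(vld1q_u8(buf + i), pattern);
    count += vaddvq_u8(vandq_u8(eq, one));  // at most 16, fits in uint8_t
  }
#elif defined(__SSE2__)
  // SSE2: compare, collapse the lanes into a 16-bit mask, count the set bits.
  const __m128i pattern = _mm_set1_epi8(static_cast<char>(needle));
  for (; i + 16 <= n; i += 16) {
    const __m128i chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
    const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, pattern));
    count += static_cast<std::size_t>(__builtin_popcount(mask));
  }
#endif
  // Scalar tail; also the whole loop when neither instruction set is present.
  for (; i < n; ++i) count += (buf[i] == needle);
  return count;
}
```

Running the SSE2 branch through sse2neon on an M1 would keep the mask-and-popcount structure, which is one plausible source of the slowdown reported here; the actual gap would need to be measured.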

Oops, I did not mean to close this. Denis, can you please re-open this?

Cosmin-B, Dec 11 '22 21:12

> Hi @dendibakh, I can confirm that using sse2neon to solve the ["core_bound"]["compiler_intrinsics_1"] does work,

This is nice to know! I haven't used sse2neon before.

> albeit it is a bit slower than writing pure ARM Neon code, due to the differences between the architectures and the extra instructions needed to translate SSE to NEON while keeping the same x86 algorithms.

I can't find your NEON implementation. Can you please share it?

> Once again, thank you for the excellent work!

You're welcome! :)

dendibakh, Dec 13 '22 15:12

Hey @dendibakh

This is the implementation. You can look at the history and see that I added sse2neon.h to the compiler_intrinsics_1 folder. You could add it as a dependency that is pulled automatically on ARM devices that support NEON.

Additionally, here is the link to the CI job for M1 Mac. I had to enable the CI to run on M1 for this lab, which I did here.

P.S.: I am sorry for temporarily closing the issue again. The GitHub interface is not friendly enough for me.

Kind regards, Cosmin

Cosmin-B, Dec 13 '22 15:12
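The header arrangement described above could be sketched roughly as follows: on x86 the SSE2 header is used directly, on NEON-capable ARM the `sse2neon.h` header is used if it is present next to the source, and otherwise the code falls back to scalar. The macro `SKETCH_HAS_SSE_API` and the function `add_i32` are hypothetical names for illustration and do not come from the branch in question.

```cpp
#include <cstddef>
#include <cstdint>

// Pick a provider for the SSE2 API. SKETCH_HAS_SSE_API is a made-up macro.
#if defined(__SSE2__)
#include <emmintrin.h>
#define SKETCH_HAS_SSE_API 1
#elif defined(__ARM_NEON) && defined(__has_include)
#if __has_include("sse2neon.h")
#include "sse2neon.h"  // assumed to sit in the same folder as this file
#define SKETCH_HAS_SSE_API 1
#endif
#endif

// Hypothetical kernel: out[i] = a[i] + b[i] for i in [0, n).
void add_i32(const std::int32_t* a, const std::int32_t* b, std::int32_t* out,
             std::size_t n) {
  std::size_t i = 0;
#ifdef SKETCH_HAS_SSE_API
  // Same SSE2 source on both platforms; on ARM the calls resolve to NEON.
  for (; i + 4 <= n; i += 4) {
    const __m128i va =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
    const __m128i vb =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                     _mm_add_epi32(va, vb));
  }
#endif
  // Scalar tail, and the full loop when no SSE2 API is available.
  for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

The `__has_include` guard keeps the file compiling even when the header has not been fetched, at the cost of silently falling back to the scalar loop.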

I thought you said you wrote NEON intrinsics yourself without using the sse2neon library, no?

dendibakh, Dec 13 '22 15:12

> I can confirm that using sse2neon to solve the ["core_bound"]["compiler_intrinsics_1"] does work, albeit it is a bit slower than writing pure ARM Neon code

Hey, @dendibakh, I am sorry for the misunderstanding. However, I did not say that. I only said that I confirmed that it could be done using the sse2neon library.

Kind regards, Cosmin

Cosmin-B, Dec 14 '22 18:12

> > I can confirm that using sse2neon to solve the ["core_bound"]["compiler_intrinsics_1"] does work, albeit it is a bit slower than writing pure ARM Neon code
>
> Hey, @dendibakh, I am sorry for the misunderstanding. However, I did not say that. I only said that I confirmed that it could be done using the sse2neon library.
>
> Kind regards, Cosmin

Ok, got it, no worries. Thanks for sharing your experiments.

dendibakh, Dec 15 '22 14:12