Jan Wassenberg

Results: 405 comments by Jan Wassenberg

Thanks for sharing the result. I was unable to reproduce it with GCC 10.3 (godbolt lacks 10.4), either with `-O2 -march=armv7-a -mfpu=vfpv3-d16` or with your flags, `-O2 -mfloat-abi=hard -mfpu=vfpv3-d16 -marm -mlibarch=armv7-a+fp -march=armv7-a+fp`. https://gcc.godbolt.org/z/KrYz818xY

You can see them in the dropdown menu in the link above, where it currently says "ARM GCC 10.3.1" :) The next higher one is 11.1.

:) The question is not whether we can get it to fail with other compilers. Instead the problem appears to be the configuration of the compiler, because it works (see...

Hi @michaeljclark, great to hear you're interested in looking into this. It would indeed be nice to extend those to 16-bit lanes and I'm happy to help. Have you seen...

Good point about U8FromU32 being a piecemeal approach. I agree this is not the best path and think that TruncateTo is a great idea. We can then deprecate U8FromU32, and...
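To illustrate the distinction discussed above, here is a minimal scalar sketch of what truncating 32-bit lanes to 8 bits means: each output lane keeps only the low 8 bits of the corresponding input lane. This is not Highway's implementation or API; the function name `TruncateLanesToU8` and the use of `std::array` in place of SIMD vectors are assumptions made purely for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not the library's code): keep only the low
// 8 bits of each 32-bit lane, discarding the upper 24 bits.
template <size_t N>
std::array<uint8_t, N> TruncateLanesToU8(const std::array<uint32_t, N>& in) {
  std::array<uint8_t, N> out{};
  for (size_t i = 0; i < N; ++i) {
    out[i] = static_cast<uint8_t>(in[i] & 0xFFu);
  }
  return out;
}
```

Under this model, inputs that already fit in [0, 256) pass through unchanged, while larger values lose their upper bits. A general truncation op therefore covers the special case of pre-clamped inputs.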

@michaeljclark sorry to reply super late, I missed this in my inbox. Your patch looks like a great start! Two minor comments: there is already a CombineShiftRightLanes, so we'll want...

Thanks @funrollloops for sharing the result, that's a nice efficiency boost. We could certainly expose the partitioner. It should be in Sorter so it also has access to the buffer....

Indeed, it sounds like this system is memory-bandwidth bound and that the additional threads are just stalled. Is that really the system your sort has to run on? 30 GB/s is...