Evan Nemerson
I was thinking about putting out a 0.8.0 after GSoC is done; development will probably slow down significantly then. Eventually I'm thinking it might be a good idea to move...
rdtsc may be a better fit for [the builtin module in portable-snippets](https://github.com/nemequ/portable-snippets/tree/master/builtin). It's mostly focused on generic built-ins (e.g., `__builtin_bswap32`, not `_byteswap_ulong`), but there are a few x86-style intrinsics as...
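To illustrate the difference between a generic built-in and a compiler-specific intrinsic, here is a minimal sketch of a 32-bit byte-swap wrapper. It prefers `__builtin_bswap32` on GCC-compatible compilers, falls back to MSVC's `_byteswap_ulong`, and otherwise uses plain shifts and masks. The name `example_bswap32` is hypothetical and is not the API exposed by portable-snippets.

```c
#include <stdint.h>
#if defined(_MSC_VER) && !defined(__clang__)
#  include <stdlib.h> /* declares _byteswap_ulong on MSVC */
#endif

/* Hypothetical wrapper: reverse the byte order of a 32-bit value. */
static inline uint32_t example_bswap32(uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
  /* Generic built-in, available on GCC and Clang. */
  return __builtin_bswap32(v);
#elif defined(_MSC_VER)
  /* MSVC-specific intrinsic; unsigned long is 32 bits on Windows. */
  return (uint32_t) _byteswap_ulong((unsigned long) v);
#else
  /* Portable fallback: move each byte to its mirrored position. */
  return ((v & UINT32_C(0x000000FF)) << 24) |
         ((v & UINT32_C(0x0000FF00)) <<  8) |
         ((v & UINT32_C(0x00FF0000)) >>  8) |
         ((v & UINT32_C(0xFF000000)) >> 24);
#endif
}
```

Callers only ever see the generic wrapper, so the compiler-specific branch stays an implementation detail.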
Thanks for the reminder! I added some more earlier today, and we'll try to get that last one done soon; I think @Glitch18 is planning to take care of it.
BTW, once this is done I'd be very interested in any performance data which could point us to something we might be able to optimize in SIMDe. See https://github.com/simd-everywhere/simde/wiki/Performance-Tuning#finding-performance-problems
> This failures also do not happen with the gcc 11 + only -O2 flags (CFLAGS="-O2" and no CXXFLAGS="-O2").

They don't? i686 is a mess on my system, even without...
This is *awesome*, thank you! I'm not familiar with the architecture, but I'd like to support it as best we can. I'm willing to merge more or less as-is, but...
> Thanks! It's nice to see that project maintainer is interested in such a PR.
>
> Unfortunately, there are some problems that may cause trouble for CI. First, there...
I'm merging some of this as 349da2b621f275e5ebc83fa6590235240821779a, 093b2c578cba4a8591de6a611818b3fa48d07430, 24ddeba55cf3bbfb014e79ea961ec201be1223ff. I'll publish a [`wip/e2k`](https://github.com/simd-everywhere/simde/tree/wip/e2k) branch in the SIMDe repository with your changes rebased.
I've been playing around a bit with this, and I have a pretty small test case for the reduced-alignment issue:

```c
#include
#include
typedef union {
  int8_t i8 __attribute__((__vector_size__(32)));
}...
```
0366dab69680125218a5e604e8e8d74ed346b0ff, e38fe50f5b1ede9f4a247196d414b899d4ba3a9f, and ad8c7e0723fb92d73324e5dd799ccdd41051251a move this along pretty well. With those patches in place I'm able to get to the point where the compilation fails due to the inefficient implementations....