simde
simde copied to clipboard
Fixed _mm_insert_ps in correct translation of the function.
The issue was a wrong copy and incorrect masking. https://github.com/simd-everywhere/simde/issues/931
I haven't fixed the unit test as I don't have the dev setup of the library (I am working on windows system). Haven't used a Debian set up before.
If that is a requirement than please post a link how to set that up and run.
I think the unit doesn't test the scenarios doesn't test the 3rd parameter (flag) with more values. https://github.com/simd-everywhere/simde/blob/master/test/x86/sse4.1.c#L1977
Which is why it passes. And probably this will pass as well. But a better unit test is needed.
Thanks @MirJawadMairaj
For compiling using Debian on MS Windows, try the first 4 instructions at https://github.com/common-workflow-language/cwltool/#ms-windows-users and then follow https://github.com/simd-everywhere/simde/wiki/Development-Environment#x86--x86_64 for running the tests
See https://github.com/simd-everywhere/simde/wiki/Implementing-a-New-Function for hints on writing new test cases
Hi, I removed the trailing spaces. But it still fails on the same lines.
But it still fails on the same lines.
? The formatting check passes now https://github.com/simd-everywhere/simde/runs/4105015189?check_suite_focus=true
Cool! Then I interpreted the failure email wrong. It was older and maybe delayed a bit. I should have seen the hash it complained on.
@MirJawadMairaj Can you suggest some values that now produce the correct result buy previously did not?
Sure!
So I had a bug around not rendering a lot of meshes because of a wrong transformation. So the confirmation for me was to see them being rendered after the fix. But I gathered two such runs through visual studio (and these are running the actual intel simd instruction as opposed to ARM translations which I am going to assume are fairly close). Of course I did some tests on a smaller scale with toy values.
code snippet:
v0 = _mm_insert_ps(v0, vec, 0b00110000); // insert trans.x into bits 0-31 of row0
v1 = _mm_insert_ps(v1, vec, 0b01110000); // insert trans.y into bits 32-63 of row0
v2 = _mm_insert_ps(v2, vec, 0b10110000); // insert trans.z into bits 64-95 of row0
Run 1)
vec = {3360.90796, 1192.48792, 1759.03296, -4.37113883e-08} v0_before = {-4.37113883e-08, 0.00000000, 1.00000000, 0.00000000} float[4] v0_after = {-4.37113883e-08, 0.00000000, 1.00000000, 3360.90796} float[4]
vec = {3360.90796, 1192.48792, 1759.03296, -4.37113883e-08} v1_before = {0.00000000, 1.00000000, 0.00000000, -1.00000000} float[4] v1_after = {0.00000000, 1.00000000, 0.00000000, 1192.48792} float[4]
vec = {3360.90796, 1192.48792, 1759.03296, -4.37113883e-08} v2_before = {-1.00000000, 0.00000000, -4.37113883e-08, 3360.90796} float[4] v2_after = {-1.00000000, 0.00000000, -4.37113883e-08, 1759.03296}
Run 2)
vec = {3328.00000, 1259.00000, 1781.00000, 1.00000000} float[4] v0_before = {1.00000000, 0.00000000, 0.00000000, 0.00000000} float[4] v0_after = {1.00000000, 0.00000000, 0.00000000, 3328.00000} float[4]
vec = {3328.00000, 1259.00000, 1781.00000, 1.00000000} float[4] v1_before = {0.00000000, 1.00000000, 0.00000000, 0.00000000} float[4] v1_after = {0.00000000, 1.00000000, 0.00000000, 1259.00000} float[4]
vec = {3328.00000, 1259.00000, 1781.00000, 1.00000000} float[4] v2_before = {0.00000000, 0.00000000, 1.00000000, 0.00000000} float[4] v2_after = {0.00000000, 0.00000000, 1.00000000, 1781.00000} float[4]
This was merged as https://github.com/simd-everywhere/simde/commit/94e75690e62335a822962520cbbd0208991e63a5 ; thanks!