simde Fixed _mm_insert_ps in correct translation of the function.

Fixed _mm_insert_ps in correct translation of the function.

Open MirJawadMairaj opened this issue 2 years ago • 6 comments

The issue was a wrong copy and incorrect masking. https://github.com/simd-everywhere/simde/issues/931

I haven't fixed the unit test as I don't have the dev setup of the library (I am working on windows system). Haven't used a Debian set up before.

If that is a requirement than please post a link how to set that up and run.

I think the unit doesn't test the scenarios doesn't test the 3rd parameter (flag) with more values. https://github.com/simd-everywhere/simde/blob/master/test/x86/sse4.1.c#L1977

Which is why it passes. And probably this will pass as well. But a better unit test is needed.

Nov 04 '21 10:11 MirJawadMairaj

Thanks @MirJawadMairaj

For compiling using Debian on MS Windows, try the first 4 instructions at https://github.com/common-workflow-language/cwltool/#ms-windows-users and then follow https://github.com/simd-everywhere/simde/wiki/Development-Environment#x86--x86_64 for running the tests

See https://github.com/simd-everywhere/simde/wiki/Implementing-a-New-Function for hints on writing new test cases

Nov 04 '21 11:11 mr-c

Hi, I removed the trailing spaces. But it still fails on the same lines.

Nov 05 '21 09:11 MirJawadMairaj

But it still fails on the same lines.

? The formatting check passes now https://github.com/simd-everywhere/simde/runs/4105015189?check_suite_focus=true

Nov 05 '21 09:11 mr-c

Cool! Then I interpreted the failure email wrong. It was older and maybe delayed a bit. I should have seen the hash it complained on.

Nov 05 '21 09:11 MirJawadMairaj

@MirJawadMairaj Can you suggest some values that now produce the correct result buy previously did not?

Nov 05 '21 11:11 mr-c

Sure!

So I had a bug around not rendering a lot of meshes because of a wrong transformation. So the confirmation for me was to see them being rendered after the fix. But I gathered two such runs through visual studio (and these are running the actual intel simd instruction as opposed to ARM translations which I am going to assume are fairly close). Of course I did some tests on a smaller scale with toy values.

code snippet:

v0 = _mm_insert_ps(v0, vec, 0b00110000);	// insert trans.x into bits 0-31 of row0
v1 = _mm_insert_ps(v1, vec, 0b01110000);	// insert trans.y into bits 32-63 of row0
v2 = _mm_insert_ps(v2, vec, 0b10110000);	// insert trans.z into bits 64-95 of row0

Run 1)

vec = {3360.90796, 1192.48792, 1759.03296, -4.37113883e-08} v0_before = {-4.37113883e-08, 0.00000000, 1.00000000, 0.00000000} float[4] v0_after = {-4.37113883e-08, 0.00000000, 1.00000000, 3360.90796} float[4]

vec = {3360.90796, 1192.48792, 1759.03296, -4.37113883e-08} v1_before = {0.00000000, 1.00000000, 0.00000000, -1.00000000} float[4] v1_after = {0.00000000, 1.00000000, 0.00000000, 1192.48792} float[4]

vec = {3360.90796, 1192.48792, 1759.03296, -4.37113883e-08} v2_before = {-1.00000000, 0.00000000, -4.37113883e-08, 3360.90796} float[4] v2_after = {-1.00000000, 0.00000000, -4.37113883e-08, 1759.03296}

Run 2)

vec = {3328.00000, 1259.00000, 1781.00000, 1.00000000} float[4] v0_before = {1.00000000, 0.00000000, 0.00000000, 0.00000000} float[4] v0_after = {1.00000000, 0.00000000, 0.00000000, 3328.00000} float[4]

vec = {3328.00000, 1259.00000, 1781.00000, 1.00000000} float[4] v1_before = {0.00000000, 1.00000000, 0.00000000, 0.00000000} float[4] v1_after = {0.00000000, 1.00000000, 0.00000000, 1259.00000} float[4]

vec = {3328.00000, 1259.00000, 1781.00000, 1.00000000} float[4] v2_before = {0.00000000, 0.00000000, 1.00000000, 0.00000000} float[4] v2_after = {0.00000000, 0.00000000, 1.00000000, 1781.00000} float[4]

Nov 05 '21 12:11 MirJawadMairaj

This was merged as https://github.com/simd-everywhere/simde/commit/94e75690e62335a822962520cbbd0208991e63a5 ; thanks!

Jan 18 '23 15:01 mr-c

simde simde copied to clipboard

Fixed _mm_insert_ps in correct translation of the function.

simde
simde copied to clipboard