Alex Wells comments




Make vectors trivially relocatable.

Historically with vectorizing compilers, a data type can have its members turned into scalars with a "scalar replacement of aggregates" optimization pass. Once a data type has become scalars, its...
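The comment above is truncated, so its full argument is not shown here. As a rough illustration of the idea behind the title, here is a minimal sketch of how a small vector type can stay a plain bundle of scalars. Standard C++ through C++23 has no portable trait for trivial relocatability, so the sketch uses `std::is_trivially_copyable` as a conservative stand-in: any trivially copyable type can be relocated with a plain `memcpy`. The types `Vec3` and `Vec3UserCopy` and the function `length_squared` are hypothetical and are not OSL's or Imath's vector types.

```cpp
#include <type_traits>

// Hypothetical 3-component vector: all special members are compiler-generated,
// so the type is trivially copyable (and therefore trivially relocatable).
struct Vec3 {
    float x, y, z;
};

// Same data layout, but with a user-provided copy constructor.
// That alone makes the type non-trivially copyable.
struct Vec3UserCopy {
    float x, y, z;
    constexpr Vec3UserCopy(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    constexpr Vec3UserCopy(const Vec3UserCopy& o) : x(o.x), y(o.y), z(o.z) {}
};

static_assert(std::is_trivially_copyable_v<Vec3>,
              "Vec3 should be a plain bundle of floats");
static_assert(!std::is_trivially_copyable_v<Vec3UserCopy>,
              "a user-provided copy constructor removes triviality");

// Takes and uses the vector by value; nothing here forces the compiler to
// keep the aggregate in memory instead of treating x, y, z as separate scalars.
constexpr float length_squared(Vec3 v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}
```

The design point is that leaving the special members implicit (or explicitly `= default`) keeps the type trivial, while even a hand-written copy constructor that does exactly the same thing does not.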

Draft: Remove more unnecessary conditional run layer calls

@chellmuth, as an experiment, could you add a no-inline attribute to the layer functions to prevent them from being inlined? Not sure you really want to do that, but...

Share Shading Context when optimizing/jitting a shader…

Hold up on this. I'm going to update it so that the work to "borrow" a shading context for opt/jit is internal to ShadingContext and not a burden on its user...

Share Shading Context when optimizing/jitting a shader…

Ok Larry, this is ready for review

Add support for b4_SSE2 batched mode.

@johnfea, thanks. I think the question (possibly unanswered) was: with many of the noise and other functions doing internal SIMD on x,y,z or r,g,b,a to take advantage of SSE, would...
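The question contrasts two ways of using a 4-wide SSE register: SIMD across one point's components (array-of-structures, e.g. x,y,z or r,g,b,a) and SIMD across several shading points at once (structure-of-arrays, which is how batched mode lays out its data, one lane per point). The sketch below shows the two layouts for a dot product. It is a hypothetical illustration, not OSL's noise or batched code; `Point4`, `PointBatch`, `dot_aos`, and `dot_soa` are made-up names, and the SoA loop is written as plain C++ that a compiler may or may not auto-vectorize.

```cpp
#include <array>
#include <cstddef>

// AoS: one point's components packed together, matching an SSE register's
// 4 lanes (w is padding for a 3-component point).
struct Point4 {
    float x, y, z, w;
};

// SoA: one array per component, one lane per shading point.
template <std::size_t Width>
struct PointBatch {
    std::array<float, Width> x, y, z;
};

// AoS dot product for a single point: the work runs across components
// and ends with a horizontal sum.
constexpr float dot_aos(const Point4& a, const Point4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// SoA dot products for Width points: every iteration is independent, so
// each lane of a Width-wide SIMD register can handle a different point.
template <std::size_t Width>
constexpr std::array<float, Width> dot_soa(const PointBatch<Width>& a,
                                           const PointBatch<Width>& b) {
    std::array<float, Width> result{};
    for (std::size_t i = 0; i < Width; ++i) {
        result[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
    }
    return result;
}
```

In the AoS form, a 3-component dot product fills at most three of four SSE lanes and still needs a horizontal sum, while the SoA loop keeps every lane busy with a different point; that is the trade-off behind asking whether the internal-SIMD paths still pay off once batching is in play.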

Add support for b4_SSE2 batched mode.

@johnfea great to hear it is speeding things up. Curious how Batched does with AVX on the same workloads vs Batched with SSE. As far as backfacing() not working, it...

Add support for b4_SSE2 batched mode.

The Files Changed all look good; the PR just adds a 4-wide path using all the same approaches. I want to see CI run it, though, and make sure...

Add support for b4_SSE2 batched mode.

@johnfea, can you elaborate on or provide an example of "old and new non-typical width batched code in llvm_util.c isn't covered by testsuite though."

Add support for b4_SSE2 batched mode.

Looking at the CI action, I see [VFX2021 gcc9/C++17 llvm11 py3.7 exr2.5 oiio2.3 sse2 batch-b4sse2](https://github.com/AcademySoftwareFoundation/OpenShadingLanguage/actions/runs/10418827805/job/28866184078?pr=1825#logs), which successfully ran 60 different *.regress.batched.opt tests (SSE2, batch width 4), comparing results against scalar...

Add support for b4_SSE2 batched mode.

@johnfea, I get it: CI doesn't execute every combination of batch widths 4, 8, 16 and the SSE, AVX, AVX2, AVX512 ISAs, so portions of llvm_util.cpp may be untested. 1. I do think llvm_util.cpp...