Added NativePerJobThreadIntPtrs
I generalized NativePerJobThreadIntPtr to take advantage of the wasted cache line space and allow for multiple counters to run at the same time. The buffer will round up to the next cache line size multiple.
Uses array access operators instead of the single Value and Parallel requires an index, but other than that the code is mostly identical.
One more thing: there should also be a NativePerJobThreadLongPtrs to keep the parity established by NativePerJobThreadLongPtr and NativeLongPtr.
I've implemented all the requested changes except for the removal of the arbitrary length. On line 236 you can see that NumCacheLines = (int)Math.Ceiling((double)length / IntsPerCacheLine); makes it so there is potentially more than 1 cache line. I don't see any issue as long as the blocks are in multiples of a cache line. Do you see any performance or race condition issues with this?
Once we are good with this file I'll add in the equivalent long version, tests and readme update.
Hey @hexonaut , thanks for making all these changes! Allow me to elaborate on my feedback that Length should always be constant. The original NativePerJobIntPtr that this is based on is a pointer to exactly one int. In the description for this version, the idea is to avoid wasting cache line space by supporting more than one int. This makes sense to me since a cache line can indeed fit more than one int and it'd be nice to allow users to access the rest of the int values that are stored in that cache line. This would imply that Length is a constant (IntsPerCacheLine) since the number of int values that fit into a cache line never changes.
That said, the presence of a length parameter to the constructor and respecting that value by potentially allocating an arbitrarily large amount of space for any number of int values changes the nature of the type. No longer is it just exposing the rest of the cache line as a constant-length array, but instead it has become a type like NativeArray<int> that supports arbitrary length arrays. While this is a valid design, it's for a very different purpose than stated in the description. It might be better called NativePerJobThreadIntArrays.
So I'd prefer to keep this type to just the constant-length array of int values that fit in a cache line. Making this change will really simplify this type and focus it on just reducing cache line waste. If there's a desire for per-thread int arrays, we should put that in another type.
I see what you mean now. I can split the code into NativePerJobThreadIntArrays and remove length from the NativePerJobThreadIntPtrs constructor. Would you prefer a separate PR for NativePerJobThreadIntArrays?
Yes, a separate PR is probably a good idea to keep the branch size manageable.
One more thing: I was thinking about the name NativePerJobThreadIntPtrs and realized that it implies "many pointers to one int" or possibly "many pointers to many ints", which is not accurate because there aren't many pointers and there aren't necessarily many ints. What do you think about renaming it to NativePerJobThreadIntsPtr, which indicates "one pointer to many ints"?
Sounds good to me. I'll make the name switch.