runtime
                                
                                 runtime copied to clipboard
                                
                                    runtime copied to clipboard
                            
                            
                            
                        [API Proposal]: Interlocked.CompareExchange for byte/short/sbyte/ushort
Background and motivation
We have a few flags fields in the compiler that can be set by multiple threads, used as part of our symbol models. As we avoid locks whenever possible, we generally use a loop on Interlocked.CompareExchange, under the assumption it will succeed eventually or the value will have been computed and set by some other thread. However, because CompareExchange only supports int, long, uint, and ulong  integer values, these fields must be at least 1 int wide. This doesn't sound like much, but as the compiler needs to load many, many symbols, those few extra bits can add up. I'd like to be able to use bytes where possible for these values. I only really want byte/ushort overloads, but added the signed variants for completeness.
API Proposal
namespace System.Threading
{
    public static class Interlocked
    {
        public static byte CompareExchange(ref byte location1, byte value, byte comparand);
        public static sbyte CompareExchange(ref sbyte location1, sbyte value, sbyte comparand);
        public static short CompareExchange(ref short location1, short value, short comparand);
        public static ushort CompareExchange(ref ushort location1, ushort value, ushort comparand);
    }
}
API Usage
        public static bool Set(ref byte flags, byte toSet)
        {
            byte oldState, newState;
            do
            {
                oldState = flags;
                newState = oldState | toSet;
                if (newState == oldState)
                {
                    return false;
                }
            }
            while (Interlocked.CompareExchange(byte flags, newState, oldState) != oldState);
            return true;
        }
Alternative Designs
No response
Risks
No response
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.
Issue Details
Background and motivation
We have a few flags fields in the compiler that can be set by multiple threads, used as part of our symbol models. As we avoid locks whenever possible, we generally use a loop on Interlocked.CompareExchange, under the assumption it will succeed eventually or the value will have been computed and set by some other thread. However, because CompareExchange only supports int, long, uint, and ulong  integer values, these fields must be at least 1 int wide. This doesn't sound like much, but as the compiler needs to load many, many symbols, those few extra bits can add up. I'd like to be able to use bytes where possible for these values. I only really want byte/ushort overloads, but added the signed variants for completeness.
API Proposal
namespace System.Threading
{
    public class Interlocked
    {
        public byte CompareExchange(ref byte location, byte value, byte comparand);
        public sbyte CompareExchange(ref sbyte location, sbyte value, sbyte comparand);
        public short CompareExchange(ref short location, short value, short comparand);
        public ushort CompareExchange(ref ushort location, ushort value, ushort comparand);
    }
}
API Usage
        public static bool Set(ref byte flags, byte toSet)
        {
            byte oldState, newState;
            do
            {
                oldState = flags;
                newState = oldState | toSet;
                if (newState == oldState)
                {
                    return false;
                }
            }
            while (Interlocked.CompareExchange(byte flags, newState, oldState) != oldState);
            return true;
        }
Alternative Designs
No response
Risks
No response
| Author: | 333fred | 
|---|---|
| Assignees: | - | 
| Labels: | 
 | 
| Milestone: | - | 
Also CC. @stephentoub
This is something that both x86/x64 (cmpxchg) and Arm64 (cas, casb, and cash) support so I don't see anything that would block this.
- I'm still of the general opinion that we should go ahead and just expose the full set of missing Interlockedfunctionality here to help ensure we have a good and usable baseline for users to write their multi-threaded algorithms. x86/x64 and Arm64 support these forAdd,And,BitTestAndComplement,BitTestAndReset,BitTestAndSet,CompareExchange,Decrement,Increment,Negate,Not,Or,Subtract, andXOr. There are also some architecture specific cases that are possible good forHardware Intrinsicoptions. Other major languages/runtimes (including Java, not just C/C++, Rust, etc) have a much more complete set of baseline APIs for atomic and interlocked operations than we do and I believe we are "behind" comparatively speaking.
I have no problem with us adding CompareExchange overloads for byte/short/etc. if every architecture we target and envision targeting supports them.
Same for overloads of other existing APIs we have, e.g. Increment, Decrement, etc.
I'm still of the general opinion that we should go ahead and just expose the full set of missing Interlocked functionality here to help ensure we have a good and usable baseline for users to write their multi-threaded algorithms. x86/x64 and Arm64 support these for Add, And, BitTestAndComplement, BitTestAndReset, BitTestAndSet, CompareExchange, Decrement, Increment, Negate, Not, Or, Subtract, and XOr. There are also some architecture specific cases that are possible good for Hardware Intrinsic options. Other major languages/runtimes (including Java, not just C/C++, Rust, etc) have a much more complete set of baseline APIs for atomic and interlocked operations than we do and I believe we are "behind" comparatively speaking.
We discussed this in: https://github.com/dotnet/runtime/issues/24694#issuecomment-586389981 I've not seen or heard any further discussion that would change my opinion (namely that we shouldn't be adding APIs for which there isn't a demonstrated need). If you'd like to make a case for it, please open a separate issue.
A slight digression from the topic of the issue, but related to it.
I have not found related issues about the below problem.
Apart from the overloads described in this issue, it would be a good idea to add overloads for the Enum type. As far as I know, it is based on integers and it's possible.
@333fred Can you add a nuint overload to your proposal?
A nint overload is already covered by the existing IntPtr overload). But it might be useful to add CompareExchangeNInt (@tannergooding what do you think?)
Can you add a nuint overload to your proposal?
That's https://github.com/dotnet/runtime/issues/45103.
But it might be useful to add CompareExchangeNInt
I don't think we should do that.
@stephentoub is there anything blocking this suggestion? Can we tag it for review?
That's up to @kouvel.
I don't see any issues with this either, marked as ready-for-review
- Looks good, but we should probably also do Exchange
namespace System.Threading
{
    public static class Interlocked
    {
        public static byte CompareExchange(ref byte location1, byte value, byte comparand);
        public static sbyte CompareExchange(ref sbyte location1, sbyte value, sbyte comparand);
        public static short CompareExchange(ref short location1, short value, short comparand);
        public static ushort CompareExchange(ref ushort location1, ushort value, ushort comparand);
        public static byte Exchange(ref byte location1, byte value);
        public static sbyte Exchange(ref sbyte location1, sbyte value);
        public static short Exchange(ref short location1, short value);
        public static ushort Exchange(ref ushort location1, ushort value);
    }
}
Does this API require JIT-changes?
There are multiple possible implementation strategies. It can either be implemented in the runtimes in C/C++ and/or it can be must-expand intrinsic in the JIT.
Even if an architecture doesn't support these operations, couldn't they be implemented as CompareExchange loops?
public static short CompareExchange(ref short location, short value, short comparand)
{
    ref int locationAsInt = ref Unsafe.As<short, int>(ref location);
    int newValueAsInt = 0;
    ref short newValueLocation = ref Unsafe.As<int, short>(ref newValueAsInt);
    int initialValueAsInt;
    short initialValue;
    do
    {
        initialValueAsInt = Volatile.Read(ref locationAsInt);
        initialValue = Unsafe.As<int, short>(ref initialValueAsInt);
        if (initialValue != comparand)
        {
            return initialValue;
        }
        newValueAsInt = initialValueAsInt;
        Volatile.Write(ref newValueLocation, value);
    } while (Interlocked.CompareExchange(ref locationAsInt, newValueAsInt, initialValueAsInt) != initialValueAsInt);
    return initialValue;
}
Of course I realize this could be quite slow if the short's location isn't on a 4-byte address. The int location could be shifted to the floor of the nearby 4-byte address (using pointers instead of Unsafe.As), unless the short itself crosses a 4-byte boundary.
[Edit] Although, I would find it strange for a new architecture to support working with small values and not also support working with them in an Interlocked fashion.
@timcassell
newValueAsInt = initialValue; should be perhaps newValueAsInt = initialValueAsInt;? And maybe if (initialValue == comparand) instead of if (initialValue != comparand).
@timcassell
newValueAsInt = initialValue;should be perhapsnewValueAsInt = initialValueAsInt;?
Ah yes, good catch.
And maybe
if (initialValue == comparand)instead ofif (initialValue != comparand).
No, that would change the behavior to compare exchange if not equal instead of compare exchange if equal. Also, that check would be removed to implement Exchange.
@timcassell
No, that would change the behavior to compare exchange if not equal instead of compare exchange if equal.
Indeed, you are right. Sorry for the false alarm!
In your code, if the byte/short/etc. is just before memory that's not accessible, such that extending by a few more bytes steps over that boundary, this could AV / seg fault.
I realize this could be quite slow
Which is why we don't want to implement these without the right hardware support. Otherwise someone "optimizing" their code by updating it from, say, using an int field to instead using a byte field is actually deoptimizing it.
In your code, if the byte/short/etc. is just before memory that's not accessible, such that extending by a few more bytes steps over that boundary, this could AV / seg fault.
Yes, I realized that, and shifting the pointer like I mentioned would solve that as well. But that's not the point. That was just a proof-of-concept to show that it could be implemented without hardware support, if necessary.
Which is why we don't want to implement these without the right hardware support. Otherwise someone "optimizing" their code by updating it from, say, using an int field to instead using a byte field is actually deoptimizing it.
It seems that all architectures that .Net currently supports have hardware support, so this is a non-issue. And I doubt future architecture wouldn't support it. But, hypothetically, if .Net does support a future architecture that doesn't have hardware support, would you prefer to throw a NotSupportedException, or have a slow implementation?
It seems that all architectures that .Net currently supports have hardware support, so this is a non-issue.
My point was we're not going to add a temporary implementation based on such a loop.
Could bool and char overloads be also considered? They'd be trivial to implement with this.
Could
byteandcharoverloads be also considered? They'd be trivial to implement with this.
Perhaps you meant bool? byte is already part of the proposal.
Could
byteandcharoverloads be also considered? They'd be trivial to implement with this.Perhaps you meant
bool?byteis already part of the proposal.
Right, my bad.
Could bool and char overloads be also considered? They'd be trivial to implement with this. In my opinion, a good idea.
What do you think? @terrajobst @stephentoub
What do you think?
It's trivial for someone to implement those themselves in terms of the proposed ones. That said, bool could be a reasonable addition, just because it's so common. The problem is it's a slippery slope with all the different methods and all the different possible types; line needs to be drawn somewhere.
Why are you asking @deeprobin?
@stephentoub Oh I thought something like that always has to be strictly re-reviewed. I ask because I plan to implement this API if no one objects.
Oh I thought something like that always has to be strictly re-reviewed.
It does if it were to be included.
I ask because I plan to implement this API if no one objects.
I'm concerned this may be too complex. What is your plan?
I'm concerned this may be too complex. What is your plan?
You are right. I took a look at the whole thing. I would probably have to include instructions like CMPXCHG8B & CMPXCHG16B, which I would still consider feasible, but the ARM implementation would worry me because I haven't dealt with it yet.
Then the issue is better for someone else :D
I could really use an Interlocked.CompareExchange bool variant.  Can that get added too?
I could really use an Interlocked.CompareExchange
boolvariant. Can that get added too?
The only issue with that is that a bool with internal representation of 2 would be considered different from one represented as 1.