MIOpen icon indicating copy to clipboard operation
MIOpen copied to clipboard

Thread unsafety (broken uses of std::unique_lock)

Open higher-performance opened this issue 3 years ago • 10 comments

Hi, there appear to be several mistaken uses of std::unique_lock in the code, such as:

https://github.com/ROCmSoftwarePlatform/MIOpen/blob/0b291216adc647d8b71e6156c60abc78790d876e/test/perfdb.cpp#L894

These appear to mistakenly release the lock immediately after acquisition; they don't hold the lock for the duration of the block. If they could be fixed, that would be wonderful. Thanks!

higher-performance avatar Dec 09 '22 05:12 higher-performance

@JehandadKhan Can you please look into this?

atamazov avatar Dec 09 '22 19:12 atamazov

@higher-performance Apologies for the lack of response. Can you please test with latest ROCm 6.1.0? If resolved, please close ticket. Thanks!

ppanchad-amd avatar Apr 17 '24 15:04 ppanchad-amd

@ppanchad-amd: I'm confused, I still see these in the sources. What do you believe has changed?

https://github.com/ROCm/MIOpen/blob/261a49738aba61ce50a77e717895de9619de5f75/test/perfdb.cpp#L942

higher-performance avatar Apr 17 '24 16:04 higher-performance

@higher-performance That was just a routine question, no beliefs))

@JehandadKhan I suspect this is of ~https://github.com/ROCm/MIOpen/labels/urgency_high https://github.com/ROCm/MIOpen/labels/testing~ https://github.com/ROCm/MIOpen/labels/quality but hope this can be easily fixed. Please let me know if you need more help from me.

/cc @cderb @junliume

atamazov avatar Apr 17 '24 21:04 atamazov

@JehandadKhan @cderb No need to look here, I will do it myself or ask @DrizztDoUrden

atamazov avatar Apr 17 '24 21:04 atamazov

I think this was meant to synchronize the threads to start at the same time. The test would be completely meaningless if locks would be held for entire block there.

Is there a specific failure you have encountered?

DrizztDoUrden avatar Apr 18 '24 15:04 DrizztDoUrden

Ah I see. It's been a while, so at this point I don't recall how I came across it; it's possible I just saw the code and felt it looked buggy. If this is just a timing thing, then I guess it's fine, feel free to close this. Though note that this actually forces the threads to serialize, since they unlock the mutexes one-by-one, and thus they will never start at the same time.

higher-performance avatar Apr 18 '24 15:04 higher-performance

@higher-performance

I just saw the code and felt it looked buggy.

Sometimes it's a good reason for opening a ticket;) From the code quality POV at least.

@DrizztDoUrden Maybe it is worth creating a NFC PR that adds relevant comment to the source code, what do you think?

atamazov avatar Apr 18 '24 22:04 atamazov

In that case mutex can also be changed to shared so that threads would actually start closer to each other even on tiny workloads

DrizztDoUrden avatar Apr 19 '24 17:04 DrizztDoUrden

@DrizztDoUrden Good idea. Please proceed or delegate to someone else.

atamazov avatar Apr 19 '24 17:04 atamazov

@JehandadKhan @higher-performance @DrizztDoUrden This may be closed now.

atamazov avatar May 15 '24 14:05 atamazov