mame icon indicating copy to clipboard operation
mame copied to clipboard

src/devices/sound/discrete.cpp bad address and threading issues

Open bswieser opened this issue 2 years ago • 5 comments

MAME version

0.255 mame0255-62-g9ff8931fa-50-dirty

System information

odroid n2l 6 core arm ubuntu 22.04 kern 5.15 en with osd sdl alsa via hdmi audio out

INI configuration details

all defaults except
audiodriver alsa
samplerate 48000
samples 1
waitvsync 1
syncrefresh 1

Emulated system/software

galaxian,mooncrst

Incorrect behaviour

Few moments into emulation, negative samples exception is thrown. Turning on debug, isnan exceptions also thrown. I suspect this is a threading/lock issue and use of invalid pointers (I am backed up by two fixme warnings re ptr math done before input list is actually initialized). I am trying to fix it, but I am not an expert (yet). (For example, I've forced number of samples to always have correct pointers, but it has side effects.)

Expected behaviour

No pointer math should result in invalid (out of bounds) access resulting in a segfault or underflow. The glitch is "audible" on x86 linux as a chirp on emu launch. Unless someone beats me to it, I will fix this.

Steps to reproduce

  • launch mame (debug is good) with mooncrst on arm arch64 kern 5.15 smp preemp
  • hear the first chirp after rom check (chirp is due to uninit data - infrequently isnan exception thrown)
  • wait (usually within 5 minutes) and samples negative is thrown

Additional details

No response

bswieser avatar Jun 24 '23 22:06 bswieser

So far, I am having fun replicating the issue. I've rebuilt a tiny version without optimization and full symbolics. This build does not crash - at least it ran 3 days without crashing. Its hard to debug when something doesn't crash. I'm going to test by rebuilding with optimization disabled (via pragma) to see if it makes a difference. I've had the crash with O2 or size. Haven't tried O1...

bswieser avatar Jul 07 '23 18:07 bswieser

I found the bug. The pointers to the buffers were going invalid thanks to optimization. I fixed the bug. Anything const dynamic should be validated before use. This only appears to be an issue on arm. I am trying my fix on x86 now (to verify nothing breaks).

bswieser avatar Dec 10 '23 21:12 bswieser

Issues needs to be closed after they are fixed in upstream, there's also an option that does it automatically once a PR gets merged under "Development" right pane.

angelosa avatar Dec 10 '23 21:12 angelosa

This is not an osd sdl issue. The code is not thread safe "enough". https://github.com/mamedev/mame/blame/0636e54bb60192652efeeeb7732103b9134a31eb/src/devices/sound/discrete.cpp#L246

Sending a patch.

bswieser avatar Feb 12 '24 16:02 bswieser

Patch sent, explanation of why I treat process as "critical section" https://www.octavianit.com/2024/02/fixing-galaxian-sound-on-arm-8-core.html

bswieser avatar Feb 12 '24 18:02 bswieser

Patch sent, explanation of why I treat process as "critical section" https://www.octavianit.com/2024/02/fixing-galaxian-sound-on-arm-8-core.html

Consider the code where discrete_task::process is called from discrete_task::callback:

    /* try to lock */
    if (task->lock_threadid(threadid))
    {
        if (!task->process())
            return nullptr;
        task->unlock();
    }

It locks the task to the current thread before it calls process (i.e. discrete_task::process is already treated as a critical section). If that’s working, process can only run on a single thread at a time for any given task. If that isn’t working, there’s an issue with discrete_task::lock_threadid that needs to be fixed properly.

Since it’s already guarding against running discrete_task::process on multiple threads concurrently for the same task, the mutex is redundant and just adds overhead.

cuavas avatar Feb 14 '24 02:02 cuavas

Not sure why you reopened it; acquire instead of release appears to have fixed the issue. But thanks for your insight.

bswieser avatar Feb 16 '24 03:02 bswieser

@seleuco tested it after #12034 was merged and can no longer reproduce the issue.

cuavas avatar Feb 16 '24 03:02 cuavas

Not sure why you reopened it; acquire instead of release appears to have fixed the issue. But thanks for your insight.

It was automatically closed by GitHub because the issue was mentioned in the pull request description. There's no way to stop it from doing this.

I reopened it until and more testing was completed (by @balr0g and @seleuco).

cuavas avatar Feb 16 '24 03:02 cuavas