GPU process crash on Linux with nvidia

Open dllu opened this issue 3 years ago • 3 comments

Summary

Environment

  • Operating System: Ubuntu 20.04.3 LTS
  • Mattermost Desktop App version: 5.0.2
  • Mattermost Server version: 6.1.0

Other relevant environment info:

  • GPU: NVIDIA Quadro M4000
  • NVIDIA driver version: 470.82.00

Steps to reproduce

  • Launch mattermost-desktop from command line
  • Use mattermost as usual
  • Rapidly toggle slow-loading menu items such as the emoji selector or the custom emoji page. I'm not sure exactly which action triggers it.

Expected behavior

Mattermost does not freeze or crash

Observed behavior

In rare situations, Mattermost freezes or crashes with:

[437644:1210/114605.189733:ERROR:validation_errors.cc(106)] Invalid message: VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)
[437644:1210/114605.189934:ERROR:gpu_child_thread.cc(58)] Mojo error in GPU process: Validation failed for gpu.mojom.GpuChannel [VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)]
[437644:1210/114605.190090:ERROR:interface_endpoint_client.cc(656)] Message 2068679221 rejected by interface gpu.mojom.GpuChannel
[437644:1210/114605.190361:ERROR:shared_image_manager.cc(190)] SharedImageManager::ProduceGLTexturePassthrough: Trying to produce a representation from a non-existent mailbox.
[437644:1210/114605.190567:ERROR:raster_decoder.cc(2132)] [.RenderWorker-0x36805d64000]GL ERROR :GL_INVALID_VALUE : glCopySubTexture: unknown mailbox
[437644:1210/114605.197356:ERROR:shared_image_factory.cc(338)] UpdateSharedImage: Could not find shared image mailbox
[437644:1210/114605.197556:ERROR:shared_image_stub.cc(188)] SharedImageStub: Unable to update shared image
[437644:1210/114605.224047:ERROR:shared_image_manager.cc(214)] SharedImageManager::ProduceSkia: Trying to Produce a Skia representation from a non-existent mailbox.
[437644:1210/114605.225096:ERROR:shared_image_manager.cc(214)] SharedImageManager::ProduceSkia: Trying to Produce a Skia representation from a non-existent mailbox.

Please note that there are no other crashes or instabilities detected with the GPU on this machine whatsoever. There are no NVIDIA-related errors in dmesg, journalctl, or /var/log/Xorg.0.log, and all other graphics-accelerated applications, including running mattermost in Chrome, work perfectly.

However, dmesg does show the Mattermost crashes:

[1808946.403561] traps: mattermost-desk[3343234] trap int3 ip:55bb4a1202e4 sp:7fffe1d29570 error:0 in mattermost-desktop[55bb47d7e000+65ed000]
[2330081.952971] traps: mattermost-desk[4186538] trap int3 ip:55e3c207f2e4 sp:7ffd698a3140 error:0 in mattermost-desktop[55e3bfcdd000+65ed000]
[2416312.275566] traps: mattermost-desk[295182] trap int3 ip:560b773992e4 sp:7ffe0604cd70 error:0 in mattermost-desktop[560b74ff7000+65ed000]
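
The three trap lines can be compared by subtracting the mapping base (the first number in the trailing brackets) from the instruction pointer. The following is a minimal TypeScript sketch of that arithmetic; `parseTrapLine` is a hypothetical helper name, and the regular expression only assumes the line layout shown above.

```typescript
// Parse a dmesg "traps:" line and compute the faulting offset inside the binary.
// The addresses are about 47 bits wide, so they fit safely in a JS number (< 2^53).
interface TrapInfo {
  pid: number;
  trap: string;
  ip: number;     // instruction pointer at the time of the trap
  base: number;   // load address of the mapping that contains ip
  offset: number; // ip - base: position inside the binary, independent of ASLR
}

function parseTrapLine(line: string): TrapInfo | null {
  const m = line.match(
    /\[(\d+)\] trap (\w+) ip:([0-9a-f]+) .* in \S+\[([0-9a-f]+)\+[0-9a-f]+\]/
  );
  if (!m) return null;
  const ip = parseInt(m[3], 16);
  const base = parseInt(m[4], 16);
  return { pid: parseInt(m[1], 10), trap: m[2], ip, base, offset: ip - base };
}

const trapLines = [
  "[1808946.403561] traps: mattermost-desk[3343234] trap int3 ip:55bb4a1202e4 sp:7fffe1d29570 error:0 in mattermost-desktop[55bb47d7e000+65ed000]",
  "[2330081.952971] traps: mattermost-desk[4186538] trap int3 ip:55e3c207f2e4 sp:7ffd698a3140 error:0 in mattermost-desktop[55e3bfcdd000+65ed000]",
  "[2416312.275566] traps: mattermost-desk[295182] trap int3 ip:560b773992e4 sp:7ffe0604cd70 error:0 in mattermost-desktop[560b74ff7000+65ed000]",
];

const offsets = trapLines.map((l) => {
  const info = parseTrapLine(l);
  return info ? info.offset.toString(16) : "unparsed";
});
console.log(offsets.join(" ")); // prints: 23a22e4 23a22e4 23a22e4
```

All three crashes land at the same offset, 0x23a22e4, which suggests a single crash site in the binary rather than random corruption. Since int3 is a breakpoint trap, this would also be consistent with a deliberate abort, though that is only an inference from these logs.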

Possible fixes

My best guess is that there is a race condition somewhere in the code. A texture for a slow-loading menu is being loaded to the GPU, but because I navigate away quickly, its destination gets destroyed before the load completes.
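
To make the suspected race concrete, here is a toy TypeScript model of it. It is not Chromium's or Mattermost's code: `SharedImageRegistry`, `simulateRapidToggle`, and the "emoji-picker" mailbox name are all hypothetical, and the word "mailbox" is borrowed from the log messages above only for illustration.

```typescript
// Toy model: a deferred copy runs after its target image was already destroyed.
type Task = () => void;

class SharedImageRegistry {
  private images = new Map<string, { pixels: string }>();
  readonly errors: string[] = [];

  create(mailbox: string, pixels: string): void {
    this.images.set(mailbox, { pixels });
  }

  destroy(mailbox: string): void {
    this.images.delete(mailbox);
  }

  // Returns the pixels, or null (and records an error) if the mailbox is gone.
  copy(mailbox: string): string | null {
    const image = this.images.get(mailbox);
    if (!image) {
      this.errors.push(`unknown mailbox: ${mailbox}`);
      return null;
    }
    return image.pixels;
  }
}

function simulateRapidToggle(): string[] {
  const registry = new SharedImageRegistry();
  const pending: Task[] = [];

  registry.create("emoji-picker", "texture-data"); // menu opens
  pending.push(() => {
    registry.copy("emoji-picker"); // slow load: the copy is deferred
  });
  registry.destroy("emoji-picker"); // menu is closed before the copy runs

  for (const task of pending) task(); // the deferred copy finally executes
  return registry.errors;
}

console.log(simulateRapidToggle());
```

In this model the deferred copy reports an unknown mailbox because nothing ties the lifetime of the image to the work still queued against it. Whether the real crash follows the same pattern is a guess.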

dllu avatar Dec 10 '21 23:12 dllu

@dllu Does this happen if you use Google Chrome or the Chromium browser as well? If you turn off GPU acceleration does it happen?
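
For anyone testing this in their own Electron build, a minimal sketch of turning GPU acceleration off at startup could look like the following. `app.disableHardwareAcceleration()` and `app.commandLine.appendSwitch()` are Electron APIs that must be called before the app is ready; the `AppLike` interface, the `maybeDisableGpu` helper, and the `DISABLE_GPU` environment variable are assumptions made for illustration and are not part of Mattermost.

```typescript
// Minimal subset of Electron's `app` object used here, so the sketch
// can be type-checked and exercised without Electron installed.
interface AppLike {
  disableHardwareAcceleration(): void;
  commandLine: { appendSwitch(name: string, value?: string): void };
}

// Disables GPU acceleration when the (hypothetical) DISABLE_GPU=1 variable is set.
// Must run before the app's "ready" event for the calls to take effect.
function maybeDisableGpu(
  app: AppLike,
  env: Record<string, string | undefined>
): boolean {
  if (env["DISABLE_GPU"] !== "1") return false;
  app.disableHardwareAcceleration();
  // Chromium's own switch, added as well for good measure.
  app.commandLine.appendSwitch("disable-gpu");
  return true;
}
```

In a real main process this would be called as `maybeDisableGpu(app, process.env)` near the top of the entry file, with `app` imported from `electron`.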

devinbinnie avatar Dec 13 '21 14:12 devinbinnie

Yes, I also use Google Chrome and Chromium. Both work perfectly fine with GPU acceleration.

I got some more crashes in Mattermost today.

16:10:02.403 › show back button
16:10:05.733 › show back button
16:10:07.529 › show back button
16:10:13.366 › show back button
16:10:13.863 › hide back button
[47494:0111/120732.264827:ERROR:validation_errors.cc(106)] Invalid message: VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)
[47494:0111/120732.265034:ERROR:gpu_child_thread.cc(58)] Mojo error in GPU process: Validation failed for gpu.mojom.GpuChannel [VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)]
[47494:0111/120732.265114:ERROR:interface_endpoint_client.cc(656)] Message 2068679221 rejected by interface gpu.mojom.GpuChannel
[47494:0111/120732.265229:ERROR:validation_errors.cc(106)] Invalid message: VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)
[47494:0111/120732.265287:ERROR:gpu_child_thread.cc(58)] Mojo error in GPU process: Validation failed for gpu.mojom.GpuChannel [VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)]
[47494:0111/120732.265340:ERROR:interface_endpoint_client.cc(656)] Message 2068679221 rejected by interface gpu.mojom.GpuChannel
[47494:0111/120732.265473:ERROR:shared_image_manager.cc(190)] SharedImageManager::ProduceGLTexturePassthrough: Trying to produce a representation from a non-existent mailbox.
[47494:0111/120732.265601:ERROR:shared_image_manager.cc(190)] SharedImageManager::ProduceGLTexturePassthrough: Trying to produce a representation from a non-existent mailbox.
[47494:0111/120732.265675:ERROR:raster_decoder.cc(2132)] [.RenderWorker-0x32e6002fb100]GL ERROR :GL_INVALID_VALUE : glCopySubTexture: unknown mailbox
[47494:0111/120732.265752:ERROR:shared_image_manager.cc(190)] SharedImageManager::ProduceGLTexturePassthrough: Trying to produce a representation from a non-existent mailbox.
[47494:0111/120732.265819:ERROR:shared_image_manager.cc(190)] SharedImageManager::ProduceGLTexturePassthrough: Trying to produce a representation from a non-existent mailbox.
[47494:0111/120732.265884:ERROR:raster_decoder.cc(2132)] [.RenderWorker-0x32e6002fb100]GL ERROR :GL_INVALID_VALUE : glCopySubTexture: unknown mailbox
12:07:32.723 › Renderer process for a webcontent is no longer available: crashed
12:07:52.482 › Error getting system idle time: Error: Render frame was disposed before WebFrameMain could be accessed
    at Object._send (<anonymous>)
    at Object.n.send (electron/js2c/browser_init.js:165:413)
    at Object.b.send (electron/js2c/browser_init.js:161:2492)
    at /opt/Mattermost/resources/app.asar/index.js:61077:56
    at Map.forEach (<anonymous>)
    at ViewManager.sendToAllViews (/opt/Mattermost/resources/app.asar/index.js:61077:18)
    at Object.sendToMattermostViews (/opt/Mattermost/resources/app.asar/index.js:59985:24)
    at UserActivityMonitor.<anonymous> (/opt/Mattermost/resources/app.asar/index.js:850:19)
    at UserActivityMonitor.emit (events.js:376:20)
    at UserActivityMonitor.sendStatusUpdate (/opt/Mattermost/resources/app.asar/index.js:66355:10)
12:07:53.484 › Error getting system idle time: Error: Render frame was disposed before WebFrameMain could be accessed
    at Object._send (<anonymous>)
    at Object.n.send (electron/js2c/browser_init.js:165:413)
    at Object.b.send (electron/js2c/browser_init.js:161:2492)
    at /opt/Mattermost/resources/app.asar/index.js:61077:56
    at Map.forEach (<anonymous>)
    at ViewManager.sendToAllViews (/opt/Mattermost/resources/app.asar/index.js:61077:18)
    at Object.sendToMattermostViews (/opt/Mattermost/resources/app.asar/index.js:59985:24)
    at UserActivityMonitor.<anonymous> (/opt/Mattermost/resources/app.asar/index.js:850:19)
    at UserActivityMonitor.emit (events.js:376:20)
    at UserActivityMonitor.sendStatusUpdate (/opt/Mattermost/resources/app.asar/index.js:66355:10)

I'll try without GPU acceleration and see if it keeps happening.

dllu avatar Jan 11 '22 20:01 dllu

The issue you're showing here has since been fixed; see https://github.com/mattermost/desktop/issues/1888

If you're getting crashes in the GPU process (as shown in the original issue) then that might still be an issue, but if you'd like to avoid this error:

12:07:52.482 › Error getting system idle time: Error: Render frame was disposed before WebFrameMain could be accessed
    at Object._send (<anonymous>)
    at Object.n.send (electron/js2c/browser_init.js:165:413)
    at Object.b.send (electron/js2c/browser_init.js:161:2492)
    at /opt/Mattermost/resources/app.asar/index.js:61077:56
    at Map.forEach (<anonymous>)
    at ViewManager.sendToAllViews (/opt/Mattermost/resources/app.asar/index.js:61077:18)
    at Object.sendToMattermostViews (/opt/Mattermost/resources/app.asar/index.js:59985:24)
    at UserActivityMonitor.<anonymous> (/opt/Mattermost/resources/app.asar/index.js:850:19)
    at UserActivityMonitor.emit (events.js:376:20)
    at UserActivityMonitor.sendStatusUpdate (/opt/Mattermost/resources/app.asar/index.js:66355:10)

Then you can download v5.0.3-rc1, which should have a fix: https://github.com/mattermost/desktop/releases/tag/v5.0.3-rc1
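
For readers hitting the same stack trace in their own Electron code, a defensive version of a broadcast helper might look like the sketch below. It is not the actual fix shipped in v5.0.3-rc1. `isDestroyed()`, `isCrashed()`, and `send()` are real methods on Electron's `webContents`; the `WebContentsLike` interface and this `sendToAllViews` signature are assumptions made for illustration.

```typescript
// Minimal subset of Electron's webContents used by this sketch.
interface WebContentsLike {
  isDestroyed(): boolean;
  isCrashed(): boolean;
  send(channel: string, ...args: unknown[]): void;
}

// Sends a message to every view that is still alive and returns
// how many views actually received it.
function sendToAllViews(
  views: Map<string, WebContentsLike>,
  channel: string,
  ...args: unknown[]
): number {
  let delivered = 0;
  views.forEach((contents) => {
    // Skip views whose renderer is already gone.
    if (contents.isDestroyed() || contents.isCrashed()) return;
    try {
      contents.send(channel, ...args);
      delivered++;
    } catch (err) {
      // The frame can still be disposed between the check and the send;
      // ignore that view instead of letting the timer callback throw.
    }
  });
  return delivered;
}
```

The try/catch matters because the liveness checks alone cannot close the window between checking and sending.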

devinbinnie avatar Jan 14 '22 15:01 devinbinnie

@dllu have you tried upgrading nvidia-driver to 515? I have similar GPU process crash issues on 510 with Quadro P4000 (Thinkpad P71), but not in Mattermost.

articice avatar Oct 05 '22 13:10 articice

Closing as inactive; likely resolved by v5.0.3.

devinbinnie avatar May 04 '23 17:05 devinbinnie