
General Protection Fault in GitHub Actions container in `balin`

Open cbrxyz opened this issue 5 months ago • 0 comments

What needs to change?

When running `catkin_make -j10` (where 10 is the number of processes the Docker container is limited to), the Python executable is killed roughly 20 seconds into the build, at around 15%-20% completion. This occurs within a Docker container that has been moved between computers.

Some details:

  • The dmesg output shows the following trap:

    [12151.484884] traps: python3[2025829] general protection fault ip:56b5d5 sp:7ffd85f2c0a0 error:0 in python3.8[423000+294000]
    
  • I suspect that the python3 process in the error log corresponds to catkin_make, as catkin_make runs using Python. Other Python processes are running as well, but they appear to exit cleanly.

  • The catkin_make output indicates that it is being terminated unexpectedly.
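
To make the trap line above easier to compare across runs, here is a minimal sketch that parses it and computes where the faulting instruction pointer sits relative to the start of the `python3.8` mapping. The function name `parse_trap` and the regular expression are illustrative assumptions based only on the single log line quoted in this issue. The computed offset is relative to the mapping base and is not necessarily a file offset.

```python
import re

# Pattern written against the one dmesg line quoted in this issue (assumption:
# other kernels or fault types may format the line differently).
TRAP_RE = re.compile(
    r"traps: (?P<comm>[^\[\s]+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap(line):
    """Return the fields of a 'traps:' line, or None if it does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "module": m["module"],
        # Distance of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        # True when the instruction pointer lies inside the reported mapping.
        "in_mapping": base <= ip < base + size,
    }


line = (
    "[12151.484884] traps: python3[2025829] general protection fault "
    "ip:56b5d5 sp:7ffd85f2c0a0 error:0 in python3.8[423000+294000]"
)
info = parse_trap(line)
print(info["module"], hex(info["offset"]), info["in_mapping"])
```

If repeated crashes report the same offset, that would point toward a specific code path. Offsets that vary from run to run would be more consistent with the hardware hypothesis below.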

Possible Causes:

  • This error is happening inside a Docker container, which should be reusable and composable. The error could be hardware-related, as suggested by a similar Proxmox forum post. The container has been moved between computers multiple times, and this is the first occurrence of such errors.

  • Additionally, there is an issue with pip on this computer: it often downloads files with bad CRC checks. While it usually succeeds after a couple of tries, the first attempt often fails. This could be another indicator of a deeper issue.
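
One rough way to probe the hardware hypothesis, sketched under the assumption that corruption would also show up when reading local files: hash the same file several times and check that every digest agrees. The helper names `sha256_of` and `distinct_digests` are hypothetical, and repeated reads may be served from the page cache, so a clean result does not rule out a disk or network problem.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, rounds=5):
    """Hash the same file several times and return the set of digests seen."""
    return {sha256_of(path) for _ in range(rounds)}
```

On healthy hardware, `distinct_digests(path)` should contain exactly one digest for an unchanged file. More than one would suggest data is being corrupted somewhere between storage, memory, and the CPU.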

How would this task be tested?

  1. Ensure that CI runs to completion without the python3 process being killed by a general protection fault.

cbrxyz • Sep 07 '24 21:09