comma three: crash due to NVMe disconnection

Open brittonx opened this issue 3 years ago • 9 comments

Describe the bug

Was driving home. The wheel jerked left, an error was displayed on the car dash, the C3 wouldn't disengage, and the C3 rebooted a few seconds later. It happened at the end of this drive segment: aeaff1b402063ccb|2022-04-22--19-11-50--0

What hardware does this issue affect?

comma three

Provide a route where the issue occurs

aeaff1b402063ccb|2022-04-22--19-11-50--0

openpilot version

0.8.14 master-ci

Additional info

No response

brittonx avatar Apr 23 '22 01:04 brittonx

The C3, running the latest master-ci, locked up again, and errors appeared on the car dash immediately afterward. It rebooted on its own after a few seconds. Here is the last segment, where it locked up: aeaff1b402063ccb|2022-04-23--14-33-44--5

brittonx avatar Apr 23 '22 21:04 brittonx

Additional observation: Both times this happened were on the first drive after a reboot for an update prompt at the end of the prior drive. In both cases, when I turned on the car, I saw the update screen flash up for a fraction of a second.

brittonx avatar Apr 24 '22 13:04 brittonx

Looks like during both drives the NVMe drive lost connection, causing a kernel panic. We're working on fixing this from the software side, but if you keep running into this, your NVMe drive might have a flaky connection. You can try reseating it and see if the problem goes away.
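As a rough way to confirm whether the drive is currently visible to the system, the mount table can be inspected directly. The following is a minimal sketch, not openpilot's own check: it assumes a Linux-style `/proc/mounts` file, and the default mount point `/data/media` is an assumption about where the NVMe drive is mounted on the device. The function names are hypothetical.

```python
def is_mounted(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point appears as a mount target in mounts_text.

    mounts_text is expected to follow the /proc/mounts layout, where the
    second whitespace-separated field of each line is the mount point.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def nvme_mounted(mount_point: str = "/data/media",
                 mounts_file: str = "/proc/mounts") -> bool:
    """Check the live mount table.

    The default mount_point is an assumption, not a confirmed path.
    """
    with open(mounts_file) as f:
        return is_mounted(mount_point, f.read())
```

Running `nvme_mounted()` over SSH right after a "not mounted" warning, and again after a reboot, would show whether the drive dropped out of the mount table or whether only the warning is stale.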

pd0wm avatar Apr 25 '22 09:04 pd0wm

> Looks like during both drives the NVMe drive lost connection, causing a kernel panic. We're working on fixing this from the software side, but if you keep running into this, your NVMe drive might have a flaky connection. You can try reseating it and see if the problem goes away.

Thanks for the feedback. Well, that would make sense. Most of the time I've had my C3, I've been running Shane's fork and would get a warning message about the NVMe drive not being connected. A manual reboot would clear the message. I posted in Discord a long while back about it, but since I was on a fork, I was told to run an official version before reporting it. That is part of why I was running master-ci. I will reseat the drive.

brittonx avatar Apr 25 '22 10:04 brittonx

More from the MTBF report:

88bc1a4af0ae6e4e|2022-05-09--15-54-13
69e087e9fcde3719|2022-06-04--08-38-54--boot
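The identifiers quoted in this thread appear to share one layout: a 16-character hex device ID, a `|`, a timestamp, and an optional `--N` segment number or `--boot` suffix. The sketch below splits such a string into its parts. The pattern is inferred only from the examples in this thread, so treat it as an assumption rather than an official format definition. `LogId` and `parse_log_id` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LogId(NamedTuple):
    dongle_id: str
    timestamp: str
    suffix: Optional[str]  # segment number, "boot", or None if absent


# Layout inferred from the identifiers in this thread (an assumption).
_LOG_ID = re.compile(
    r"^(?P<dongle>[0-9a-f]{16})\|"
    r"(?P<ts>\d{4}-\d{2}-\d{2}--\d{2}-\d{2}-\d{2})"
    r"(?:--(?P<suffix>\d+|boot))?$"
)


def parse_log_id(text: str) -> LogId:
    """Split an identifier into device ID, timestamp, and optional suffix."""
    m = _LOG_ID.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized identifier: {text!r}")
    return LogId(m["dongle"], m["ts"], m["suffix"])
```

For example, `parse_log_id("69e087e9fcde3719|2022-06-04--08-38-54--boot").suffix` yields `"boot"`, which makes it easy to separate boot logs from drive segments when collecting reports.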

adeebshihadeh avatar May 12 '22 21:05 adeebshihadeh

I reseated the NVMe and still see the occasional not mounted error. Yesterday, I drove in to the area where I work. The actual drive time was from 8:29am to 9:02am. Connect only shows 8:29-8:36, then another drive at 9:16am from when I drove from one location to another. This means it stopped recording at 8:36 and didn't record from 8:37 to 9:02. There were no NVMe errors shown, but I suspect it could be related. aeaff1b402063ccb|2022-05-12--08-29-55--0 Comma support has me sending my C3 in to be checked.

brittonx avatar May 13 '22 11:05 brittonx

I had another one today. This is the third or fourth time it has occurred in the exact same location!!!! Here is the segment: aeaff1b402063ccb|2022-05-13--17-38-14--0 Could there be some bit of bad GPS data that triggers this, or does it happen after some period of time?

brittonx avatar May 13 '22 23:05 brittonx

A quick follow-up. Comma replaced my C3 and I have not experienced the issue since. They have my original C3 now.

brittonx avatar Jun 20 '22 19:06 brittonx

I have an interesting issue that started around 11/10. After an update on the Sunnypilot test-C3, I get an error banner, "NVMe drive not mounted", on my comma3. I've tried downloading OP stock, OP master, and OP master-ci with no luck. I also tried reseating the drive, with no luck. Oddly enough, if I go back to the SP 0.8.12-prod-full-hkg fork, I stop getting the NVMe not mounted error and the logs show that it's mounted. As soon as I download anything newer, it stops seeing the NVMe, I get the same error, and the drive disappears from the log.

BMarks11 avatar Dec 01 '22 00:12 BMarks11