9.4.1 simplex errors with sup and hac models on M2
Using the 9.4.1 fast v3.4 model on an M2 Mac runs without errors at 1.12e+07 samples/sec.
Using 9.4.1 hac v3.3 produces multiple "Metal command buffer list failed: 5" warnings, at 3.479e+05 samples/sec.
Using 9.4.1 sup v3.6 produces pages of errors, with an ultimate rate of 8.575e+04 samples/sec.

sup errors, a few of these:
[2023-07-26 12:57:48.280] [warning] Metal command buffer lstm failed: 5, try #0
[2023-07-26 12:57:48.280] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-07-26 12:5

hundreds of these:
[2023-07-28 05:04:26.476] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-07-28 05:04:42.360] [warning] Metal command buffer linear/scan/softmax failed: 5, try #2
[2023-07-28 05:04:42.360] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-07-28 05:04:58.183] [warning] Metal command buffer linear/scan/softmax failed: 5, try #3
[2023-07-28 05:04:58.183] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-07-28 05:05:14.227] [warning] Metal command buffer linear/scan/softmax failed: 5, try #4
[2023-07-28 05:05:14.227] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))

Also, the eventual count of sequences is off:
[2023-07-28 05:05:14.390] [info] > Reads basecalled: 29600000s]
[2023-07-28 05:05:14.391] [info] > Basecalled @ Samples/s: 8.575702e+04
[2023-07-28 05:05:19.144] [info] > Finished
The fast and hac models both report 296000 for "Reads basecalled".
I have the same problem on an M2 Ultra Mac Studio.
[2023-12-06 16:41:26.246] [info] > Creating basecall pipeline
[2023-12-06 16:42:59.390] [info] - set batch size to 3504
[2023-12-06 16:44:14.950] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-06 16:44:14.950] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-06 16:44:53.973] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 1)
[2023-12-06 16:44:53.973] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-06 16:45:31.890] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 2)
[2023-12-06 16:45:31.890] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-06 16:46:10.444] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 3)
[2023-12-06 16:46:10.444] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-06 16:46:48.556] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 4)
[2023-12-06 16:46:48.556] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-06 16:46:48.577] [critical] Failed to successfully submit GPU command buffers.
I have the same problem with M2 Pro and dorado 0.5.0
dorado basecaller ~/dorado_model/[email protected] ./pod5/ -x metal > basecall.bam
[2023-12-07 13:38:41.346] [info] > Creating basecall pipeline
[2023-12-07 13:38:57.083] [info] - set batch size to 432
[2023-12-07 13:39:29.928] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:39:29.929] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:39:46.020] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 1)
[2023-12-07 13:39:46.020] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:40:03.320] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 2)
[2023-12-07 13:40:03.320] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:40:35.806] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:40:35.806] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:41:09.567] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:41:09.567] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:43:04.695] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:43:04.695] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:44:11.826] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:44:11.826] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:44:28.159] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 1)
[2023-12-07 13:44:28.159] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:45:01.003] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:45:01.003] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:48:01.715] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:48:01.715] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:48:50.473] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:48:50.473] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:50:11.917] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:50:11.917] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:53:12.924] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:53:12.924] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:56:32.505] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2023-12-07 13:56:32.505] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:56:48.497] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 1)
[2023-12-07 13:56:48.497] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:57:04.407] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 2)
[2023-12-07 13:57:04.407] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:57:20.447] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 3)
[2023-12-07 13:57:20.447] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:57:36.416] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 4)
[2023-12-07 13:57:36.416] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2023-12-07 13:57:36.446] [critical] Failed to successfully submit GPU command buffers.
libc++abi: terminating due to uncaught exception of type std::runtime_error: Failed to successfully submit GPU command buffers.
Abort trap: 6
Dorado kept basecalling reads despite the warnings, but then it stopped.
Hi @dbernick @j-jamshidi @chilampoon,
We've been improving the stability of basecalling on Mac in recent releases.
Are you still experiencing issues?
Kind regards, Rich
Hey @HalfPhoton, I got the same issue with a fresh install of dorado v0.5.3 on an M2 Pro MacBook.
Dorado command:
dorado basecaller sup --kit-name SQK-16S024 --min-qscore 7 21_04_20_zfish.pod5 > 21_04_20_zfish.bam
Error code:
[2024-02-09 15:29:43.071] [info] Assuming cert location is /etc/ssl/cert.pem
[2024-02-09 15:29:43.072] [info] - downloading [email protected] with httplib
[2024-02-09 15:29:47.393] [info] > Creating basecall pipeline
[2024-02-09 15:30:00.845] [info] - set batch size to 432
[2024-02-09 15:30:00.845] [info] Barcode for SQK-16S024
[2024-02-09 15:31:06.899] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2024-02-09 15:31:06.899] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2024-02-09 15:31:55.668] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2024-02-09 15:31:55.668] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2024-02-09 15:32:27.847] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2024-02-09 15:32:27.847] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2024-02-09 15:32:43.931] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 1)
[2024-02-09 15:32:43.931] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2024-02-09 15:32:59.966] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 2)
[2024-02-09 15:32:59.966] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2024-02-09 15:33:16.260] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 3)
[2024-02-09 15:33:16.260] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2024-02-09 15:33:32.559] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 4)
[2024-02-09 15:33:32.559] [warning] Command buffer error code: 1 (Internal Error (0000000e:Internal Error))
[2024-02-09 15:33:32.584] [critical] Failed to successfully submit GPU command buffers.
libc++abi: terminating due to uncaught exception of type std::runtime_error: Failed to successfully submit GPU command buffers.
Abort trap: 6
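The logs in this thread share one pattern: a kernel fails with status 5, is retried (try 0 through try 4), and a run that reaches try 4 without success ends in the critical "Failed to successfully submit GPU command buffers" message. When comparing runs, it may help to condense a log into a few numbers. The following is only a rough sketch in Python, not part of dorado: the function name `summarize` and the summary fields are made up for illustration, and the regular expression is inferred from the log lines pasted above (both the older "failed: 5, try #0" and the newer "failed: status 5 (try 0)" wording).

```python
import re
from collections import Counter

# Pattern inferred from the warnings pasted in this thread (an assumption,
# not an official log format). Matches both wordings seen above.
FAIL_RE = re.compile(
    r"Metal command buffer (?P<kernel>.+?) failed: (?:status )?(?P<status>\d+)"
    r"(?:, try #| \(try )(?P<attempt>\d+)"
)
FATAL_TEXT = "Failed to successfully submit GPU command buffers"


def summarize(log_text):
    """Hypothetical helper: condense a dorado log into a small summary dict."""
    failures = Counter()   # failed submissions per kernel name
    episodes = 0           # number of "try 0" lines, i.e. fresh failures
    max_attempt = -1       # highest retry index seen (-1 if no failures)
    for match in FAIL_RE.finditer(log_text):
        failures[match.group("kernel")] += 1
        attempt = int(match.group("attempt"))
        if attempt == 0:
            episodes += 1
        max_attempt = max(max_attempt, attempt)
    return {
        "failures": dict(failures),
        "episodes": episodes,
        "max_attempt": max_attempt,
        "fatal": FATAL_TEXT in log_text,
    }


# Warning lines copied from the log above (error-code lines omitted).
SAMPLE = """\
[2024-02-09 15:31:06.899] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2024-02-09 15:31:55.668] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2024-02-09 15:32:27.847] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 0)
[2024-02-09 15:32:43.931] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 1)
[2024-02-09 15:32:59.966] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 2)
[2024-02-09 15:33:16.260] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 3)
[2024-02-09 15:33:32.559] [warning] Metal command buffer linear/scan/softmax failed: status 5 (try 4)
[2024-02-09 15:33:32.584] [critical] Failed to successfully submit GPU command buffers.
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Feeding it a saved stderr log from each model or batch size gives a quick way to see whether a change reduces the number of failure episodes or only delays the fatal one.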
Hi. Can you tell me how much RAM the system has? The selected batch size does not seem unreasonable, but it would be good to rule out memory swapping as a factor in this case. Does this failure happen reliably?
Only 16 GB of RAM. I can set the batch/chunk size if you think that may be the issue.
The failure is very repeatable and does not happen with the version 10 model. In my case, the system has 32 GB of RAM.
David
Hi @microbemarsh,
Does reducing the --batchsize improve things?
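One way to try this systematically is to rerun the same command with progressively smaller batch sizes and note the first one that finishes. The sketch below is only an illustration under stated assumptions: the halving strategy, the minimum of 32, and the helper names (`build_command`, `candidate_batch_sizes`, `first_working_batch_size`) are all hypothetical, and the only dorado option it relies on is the `--batchsize` flag mentioned above. The starting value 432 is the batch size dorado auto-selected in the logs earlier in the thread.

```python
import subprocess


def build_command(pod5_path, batch_size, model="sup"):
    """Build a dorado command line with an explicit batch size (hypothetical helper)."""
    return ["dorado", "basecaller", model, "--batchsize", str(batch_size), pod5_path]


def candidate_batch_sizes(start, minimum=32):
    """Halve the batch size until it drops below `minimum` (an arbitrary floor)."""
    sizes = []
    size = start
    while size >= minimum:
        sizes.append(size)
        size //= 2
    return sizes


def first_working_batch_size(pod5_path, start, out_path, runner=subprocess.run):
    """Return the first batch size whose run exits with code 0, or None."""
    for size in candidate_batch_sizes(start):
        # Each attempt overwrites the output file, as the shell redirect would.
        with open(out_path, "wb") as out:
            result = runner(build_command(pod5_path, size), stdout=out)
        # The aborted runs in this thread end with "Abort trap: 6",
        # which shows up here as a non-zero return code.
        if result.returncode == 0:
            return size
    return None


if __name__ == "__main__":
    # File names taken from the command posted earlier in the thread.
    size = first_working_batch_size("21_04_20_zfish.pod5", 432, "21_04_20_zfish.bam")
    print("first batch size that completed:", size)
```

A run that completes with warnings still counts as working here, so it is worth checking the log of the successful run as well.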