CUDA_ERROR_UNKNOWN Transcoding errors for EU orchestrators
Describe the bug: A few other orchestrators (O's) and I are getting lots of CUDA_ERROR_UNKNOWN transcoding errors:
Apr 13 10:02:04 koios livepeer[184664]: I0413 10:02:04.836866 184664 ot_rpc.go:140] Transcoding taskId=43790 url=https://93.119.2.215:8935/stream/af77db10/494.tempfile
Apr 13 10:02:05 koios livepeer[184664]: [h264_cuvid @ 0x7ff66cd9e780] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
Apr 13 10:02:05 koios livepeer[184664]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
Apr 13 10:02:05 koios livepeer[184664]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
Apr 13 10:02:05 koios livepeer[184664]: E0413 10:02:05.024776 184664 ffmpeg.go:609] Transcoder Return : Generic error in an external library
Apr 13 10:02:05 koios livepeer[184664]: E0413 10:02:05.024865 184664 ot_rpc.go:193] manifestID=fb0acd4d-b9b6-4d44-ba21-0e67ac467440 seqNo=494 orchSessionID=af77db10 taskId=43790 Transcoding done for taskId=43790 url=https://93.119.2.215:8935/stream/af77db10/494.tempfile dur=136.829413ms err="Generic error in an external library"
Apr 13 10:02:05 koios livepeer[184664]: E0413 10:02:05.024882 184664 ot_rpc.go:248] manifestID=fb0acd4d-b9b6-4d44-ba21-0e67ac467440 seqNo=494 orchSessionID=af77db10 taskId=43790 Unable to transcode err="Generic error in an external library"
It doesn't seem to drop the stream itself, and only seems to happen for EU orchestrators (https://discord.com/channels/423160867534929930/932724294230900776/962928882665783306)
To Reproduce: Have an active orchestrator in the EU region
Expected behavior: Segments are transcoded without error
Desktop:
- OS: Arch Linux (Also on Ubuntu 20)
- Livepeer 0.5.29
- Driver Version: 510.60.02
- CUDA Version: 11.6
I am in the US and got this error recently. It occurred in the middle of a test stream, and the test stream did seem to complete.
I0508 07:54:33.494151 197980 ot_rpc.go:140] Transcoding taskId=476828 url=https://162.244.81.94:8935/stream/0237f32e/138.tempfile
I0508 07:54:33.909694 197980 ot_rpc.go:140] Transcoding taskId=476829 url=https://162.244.81.94:8935/stream/e088e496/1.tempfile
[h264 @ 0x7f8930287680] Increasing reorder buffer to 1
[h264 @ 0x7f8930287680] Increasing reorder buffer to 1
[h264_cuvid @ 0x7f8930090040] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
E0508 07:54:34.082202 197980 ffmpeg.go:609] Transcoder Return : Generic error in an external library
E0508 07:54:34.082272 197980 ot_rpc.go:193] manifestID=d148cdbd-ccca-4c0e-996a-0b480dc78738 seqNo=1 orchSessionID=e088e496 taskId=476829 Transcoding done for taskId=476829 url=https://162.244.81.94:8935/stream/e088e496/1.tempfile dur=130.94481ms err="Generic error in an external library"
E0508 07:54:34.082281 197980 ot_rpc.go:248] manifestID=d148cdbd-ccca-4c0e-996a-0b480dc78738 seqNo=1 orchSessionID=e088e496 taskId=476829 Unable to transcode err="Generic error in an external library"
I0508 07:54:35.322677 197980 ot_rpc.go:140] Transcoding taskId=476830 url=https://162.244.81.94:8935/stream/0237f32e/139.tempfile
I0508 07:54:36.890699 197980 ot_rpc.go:140] Transcoding taskId=476831 url=https://162.244.81.94:8935/stream/0237f32e/140.tempfile
Getting a bunch of them in the US now too:
May 08 14:49:25 lasvegas livepeer[200050]: [h264_cuvid @ 0x7f99e4250300] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 14:49:25 lasvegas livepeer[200050]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 14:49:25 lasvegas livepeer[200050]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 14:49:25 lasvegas livepeer[200050]: E0508 14:49:25.962918 200050 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 14:49:25 lasvegas livepeer[200050]: E0508 14:49:25.963002 200050 orchestrator.go:555] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=654 orchSessionID=70d73911 clientIP=89.187.185.153 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 14:49:25 lasvegas livepeer[200050]: E0508 14:49:25.963147 200050 segment_rpc.go:234] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=654 orchSessionID=70d73911 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.185.153 Could not transcode err="Generic error in an external library"
May 08 14:49:59 lasvegas livepeer[200050]: [h264_cuvid @ 0x7f9974093a80] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 14:49:59 lasvegas livepeer[200050]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 14:49:59 lasvegas livepeer[200050]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 14:49:59 lasvegas livepeer[200050]: E0508 14:49:59.331521 200050 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 14:49:59 lasvegas livepeer[200050]: E0508 14:49:59.331605 200050 orchestrator.go:555] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=691 orchSessionID=d8b00843 clientIP=89.187.185.153 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 14:49:59 lasvegas livepeer[200050]: E0508 14:49:59.331726 200050 segment_rpc.go:234] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=691 orchSessionID=d8b00843 clientIP=89.187.185.153 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 14:36:47 chicago livepeer[317706]: [h264 @ 0x7f6f88082440] Increasing reorder buffer to 1
May 08 14:36:47 chicago livepeer[317706]: [h264 @ 0x7f6f88082440] Increasing reorder buffer to 1
May 08 14:36:48 chicago livepeer[317706]: [h264_cuvid @ 0x7f6f880785c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 14:36:48 chicago livepeer[317706]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 14:36:48 chicago livepeer[317706]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.100087 317706 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.100328 317706 orchestrator.go:555] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5760 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.100680 317706 segment_rpc.go:234] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5760 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.154443 317706 orchestrator.go:555] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5759 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="TranscoderStopped"
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.155014 317706 segment_rpc.go:234] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5759 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="TranscoderStopped"
May 08 12:51:59 chicago livepeer[317706]: I0508 12:51:59.913622 317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/268.ts
May 08 12:52:01 chicago livepeer[317706]: I0508 12:52:01.804733 317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/269.ts
May 08 12:52:03 chicago livepeer[317706]: I0508 12:52:03.801431 317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/270.ts
May 08 12:52:05 chicago livepeer[317706]: I0508 12:52:05.805480 317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/271.ts
May 08 12:52:07 chicago livepeer[317706]: I0508 12:52:07.790955 317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/272.ts
May 08 12:52:41 chicago livepeer[317706]: [h264_cuvid @ 0x7f6fd53a0900] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 12:52:41 chicago livepeer[317706]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 12:52:41 chicago livepeer[317706]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 12:52:41 chicago livepeer[317706]: E0508 12:52:41.453791 317706 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 12:52:41 chicago livepeer[317706]: E0508 12:52:41.453867 317706 orchestrator.go:555] manifestID=237d0657-358c-4924-af7c-db5268cf9869 seqNo=1 orchSessionID=96556443 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 12:52:41 chicago livepeer[317706]: E0508 12:52:41.453984 317706 segment_rpc.go:234] manifestID=237d0657-358c-4924-af7c-db5268cf9869 seqNo=1 orchSessionID=96556443 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 12:56:31 chicago livepeer[317706]: [h264_cuvid @ 0x7f6fd0ccc380] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 12:56:31 chicago livepeer[317706]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 12:56:31 chicago livepeer[317706]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.786203 317706 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.786956 317706 orchestrator.go:555] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.787266 317706 segment_rpc.go:234] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.840600 317706 orchestrator.go:555] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="TranscoderStopped"
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.840831 317706 segment_rpc.go:234] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="TranscoderStopped"
The issue persists on 0.5.30:
May 12 09:13:48 chicago livepeer[435750]: [h264_cuvid @ 0x7f5cd0095180] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 12 09:13:48 chicago livepeer[435750]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 12 09:13:48 chicago livepeer[435750]: ERROR: transcoder.c:249] Could not decode; stopping : Generic error in an external library
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.446572 435750 ffmpeg.go:760] Transcoder Return : Generic error in an external library
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.446713 435750 orchestrator.go:558] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=1 orchSessionID=2a6ed2ab clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.446842 435750 segment_rpc.go:234] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=1 orchSessionID=2a6ed2ab sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=143.244.61.205 Could not transcode err="Generic error in an external library"
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.511532 435750 orchestrator.go:558] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=2 orchSessionID=2a6ed2ab clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="TranscoderStopped"
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.511935 435750 segment_rpc.go:234] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=2 orchSessionID=2a6ed2ab clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="TranscoderStopped"
May 12 09:14:52 chicago livepeer[435750]: [h264_cuvid @ 0x7f5c90124dc0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 12 09:14:52 chicago livepeer[435750]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 12 09:14:52 chicago livepeer[435750]: ERROR: transcoder.c:249] Could not decode; stopping : Generic error in an external library
May 12 09:14:52 chicago livepeer[435750]: E0512 09:14:52.526847 435750 ffmpeg.go:760] Transcoder Return : Generic error in an external library
May 12 09:14:52 chicago livepeer[435750]: E0512 09:14:52.527636 435750 orchestrator.go:558] manifestID=83cc7c6d-590a-4faa-bdea-424558914906 seqNo=1 orchSessionID=93e7ac74 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 12 09:14:52 chicago livepeer[435750]: E0512 09:14:52.527848 435750 segment_rpc.go:234] manifestID=83cc7c6d-590a-4faa-bdea-424558914906 seqNo=1 orchSessionID=93e7ac74 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
Just got another one:
I0713 14:00:08.706733 2219768 ot_rpc.go:142] Transcoding taskId=106151 url=https://162.244.81.94:8935/stream/67fbc9ff/1.tempfile
[h264_cuvid @ 0x7efce62591c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
ERROR: transcoder.c:238] Could not decode; stopping : Generic error in an external library
E0713 14:00:09.012549 2219768 ffmpeg.go:812] Transcoder Return : Generic error in an external library
E0713 14:00:09.012621 2219768 ot_rpc.go:195] manifestID=5ed4724c-d736-4298-8a32-356dd7ff77df seqNo=1 orchSessionID=67fbc9ff taskId=106151 Transcoding done for taskId=106151 url=https://162.244.81.94:8935/stream/67fbc9ff/1.tempfile dur=89.986892ms err="Generic error in an external library"
E0713 14:00:09.012636 2219768 ot_rpc.go:250] manifestID=5ed4724c-d736-4298-8a32-356dd7ff77df seqNo=1 orchSessionID=67fbc9ff taskId=106151 Unable to transcode err="Generic error in an external library"
E0713 14:00:09.042263 2219768 ot_rpc.go:289] manifestID=5ed4724c-d736-4298-8a32-356dd7ff77df seqNo=1 orchSessionID=67fbc9ff taskId=106151 Orchestrator returned HTTP statusCode=400 err="Invalid detection data\n"
I looked into the problem:
- This happens when we feed a video stream packet into the decoder.
- The failing call, cuvidParseVideoData, suggests that this is a parsing problem, meaning that something is wrong with either the bitstream or the parsing code.
- H.264 has been around for almost 20 years, and although there were later additions (FRExt, the Fidelity Range Extensions, in the mid-2000s, for example), the parsing code should be mature, so I doubt the syntax parsing itself is at fault.
- What is possible is that we are hitting some limit of the hardware decoder, for example something like the one described in this thread: https://forums.developer.nvidia.com/t/nvcuvid-problem-cuvidparsevideodata-cant-accept-the-payload-that-large-than-2m-bytes/35603/9
- We need a bitstream to investigate further.
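
Once a failing segment is available, one quick way to test the decoder-limit hypothesis is to look for unusually large packets. The following is a minimal sketch, not part of go-livepeer. It assumes the packet sizes are produced by ffprobe, one per line, and it uses a 2 MB threshold taken only from the title of the linked forum thread, which is not a confirmed NVDEC limit. The names parseSizes, oversized, and sizeLimit are made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// sizeLimit is an ASSUMED threshold (2 MB, from the forum thread title).
// It is not a documented hardware decoder limit.
const sizeLimit = 2 * 1024 * 1024

// packet records the position and size in bytes of one video packet.
type packet struct {
	Index int
	Size  int
}

// parseSizes reads whitespace-separated packet sizes. If an entry contains
// commas (CSV output with extra fields), only the first field is used.
func parseSizes(text string) ([]int, error) {
	var sizes []int
	for i, field := range strings.Fields(text) {
		first := strings.SplitN(field, ",", 2)[0]
		n, err := strconv.Atoi(first)
		if err != nil {
			return nil, fmt.Errorf("entry %d: %w", i, err)
		}
		sizes = append(sizes, n)
	}
	return sizes, nil
}

// oversized returns every packet whose size is strictly greater than limit.
func oversized(sizes []int, limit int) []packet {
	var over []packet
	for i, s := range sizes {
		if s > limit {
			over = append(over, packet{Index: i, Size: s})
		}
	}
	return over
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	sizes, err := parseSizes(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse error:", err)
		os.Exit(1)
	}
	over := oversized(sizes, sizeLimit)
	fmt.Printf("%d packets, %d above %d bytes\n", len(sizes), len(over), sizeLimit)
	for _, p := range over {
		fmt.Printf("packet %d: %d bytes\n", p.Index, p.Size)
	}
}
```

Assuming the failing segment was saved as segment.ts (a hypothetical file name), it could be run as:
ffprobe -v error -select_streams v:0 -show_entries packet=size -of csv=p=0 segment.ts | go run main.go
If no packet comes close to the threshold, the size limit is probably not the cause and the bitstream needs a closer look.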