
CUDA_ERROR_UNKNOWN Transcoding errors for EU orchestrators

Open stronk-dev opened this issue 3 years ago • 5 comments

Describe the bug: A few other O's and I are getting lots of the following errors:

Apr 13 10:02:04 koios livepeer[184664]: I0413 10:02:04.836866  184664 ot_rpc.go:140] Transcoding taskId=43790 url=https://93.119.2.215:8935/stream/af77db10/494.tempfile
Apr 13 10:02:05 koios livepeer[184664]: [h264_cuvid @ 0x7ff66cd9e780] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
Apr 13 10:02:05 koios livepeer[184664]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
Apr 13 10:02:05 koios livepeer[184664]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
Apr 13 10:02:05 koios livepeer[184664]: E0413 10:02:05.024776  184664 ffmpeg.go:609] Transcoder Return : Generic error in an external library
Apr 13 10:02:05 koios livepeer[184664]: E0413 10:02:05.024865  184664 ot_rpc.go:193] manifestID=fb0acd4d-b9b6-4d44-ba21-0e67ac467440 seqNo=494 orchSessionID=af77db10 taskId=43790 Transcoding done for taskId=43790 url=https://93.119.2.215:8935/stream/af77db10/494.tempfile dur=136.829413ms err="Generic error in an external library"
Apr 13 10:02:05 koios livepeer[184664]: E0413 10:02:05.024882  184664 ot_rpc.go:248] manifestID=fb0acd4d-b9b6-4d44-ba21-0e67ac467440 seqNo=494 orchSessionID=af77db10 taskId=43790 Unable to transcode err="Generic error in an external library"

It doesn't seem to drop the stream itself, and only seems to happen for EU orchestrators (https://discord.com/channels/423160867534929930/932724294230900776/962928882665783306)

To Reproduce: Run an active Orchestrator in the EU region.

Expected behavior: Segments are transcoded without error.

Desktop:

  • OS: Arch Linux (Also on Ubuntu 20)
  • Livepeer 0.5.29
  • Driver Version: 510.60.02
  • CUDA Version: 11.6

stronk-dev, Apr 13 '22

I am in the US and got this error recently. It appeared in the middle of a test stream, and the test stream did seem to complete.

I0508 07:54:33.494151  197980 ot_rpc.go:140] Transcoding taskId=476828 url=https://162.244.81.94:8935/stream/0237f32e/138.tempfile
I0508 07:54:33.909694  197980 ot_rpc.go:140] Transcoding taskId=476829 url=https://162.244.81.94:8935/stream/e088e496/1.tempfile
[h264 @ 0x7f8930287680] Increasing reorder buffer to 1
[h264 @ 0x7f8930287680] Increasing reorder buffer to 1
[h264_cuvid @ 0x7f8930090040] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
E0508 07:54:34.082202  197980 ffmpeg.go:609] Transcoder Return : Generic error in an external library
E0508 07:54:34.082272  197980 ot_rpc.go:193] manifestID=d148cdbd-ccca-4c0e-996a-0b480dc78738 seqNo=1 orchSessionID=e088e496 taskId=476829 Transcoding done for taskId=476829 url=https://162.244.81.94:8935/stream/e088e496/1.tempfile dur=130.94481ms err="Generic error in an external library"
E0508 07:54:34.082281  197980 ot_rpc.go:248] manifestID=d148cdbd-ccca-4c0e-996a-0b480dc78738 seqNo=1 orchSessionID=e088e496 taskId=476829 Unable to transcode err="Generic error in an external library"
I0508 07:54:35.322677  197980 ot_rpc.go:140] Transcoding taskId=476830 url=https://162.244.81.94:8935/stream/0237f32e/139.tempfile
I0508 07:54:36.890699  197980 ot_rpc.go:140] Transcoding taskId=476831 url=https://162.244.81.94:8935/stream/0237f32e/140.tempfile

ad-astra-video, May 08 '22

Getting a bunch of them in the US now too:

May 08 14:49:25 lasvegas livepeer[200050]: [h264_cuvid @ 0x7f99e4250300] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 14:49:25 lasvegas livepeer[200050]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 14:49:25 lasvegas livepeer[200050]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 14:49:25 lasvegas livepeer[200050]: E0508 14:49:25.962918  200050 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 14:49:25 lasvegas livepeer[200050]: E0508 14:49:25.963002  200050 orchestrator.go:555] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=654 orchSessionID=70d73911 clientIP=89.187.185.153 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 14:49:25 lasvegas livepeer[200050]: E0508 14:49:25.963147  200050 segment_rpc.go:234] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=654 orchSessionID=70d73911 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.185.153 Could not transcode err="Generic error in an external library"
May 08 14:49:59 lasvegas livepeer[200050]: [h264_cuvid @ 0x7f9974093a80] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 14:49:59 lasvegas livepeer[200050]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 14:49:59 lasvegas livepeer[200050]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 14:49:59 lasvegas livepeer[200050]: E0508 14:49:59.331521  200050 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 14:49:59 lasvegas livepeer[200050]: E0508 14:49:59.331605  200050 orchestrator.go:555] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=691 orchSessionID=d8b00843 clientIP=89.187.185.153 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 14:49:59 lasvegas livepeer[200050]: E0508 14:49:59.331726  200050 segment_rpc.go:234] manifestID=73dae36d-0c0a-477e-aa84-e046aff795f1 seqNo=691 orchSessionID=d8b00843 clientIP=89.187.185.153 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 14:36:47 chicago livepeer[317706]: [h264 @ 0x7f6f88082440] Increasing reorder buffer to 1
May 08 14:36:47 chicago livepeer[317706]: [h264 @ 0x7f6f88082440] Increasing reorder buffer to 1
May 08 14:36:48 chicago livepeer[317706]: [h264_cuvid @ 0x7f6f880785c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 14:36:48 chicago livepeer[317706]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 14:36:48 chicago livepeer[317706]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.100087  317706 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.100328  317706 orchestrator.go:555] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5760 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.100680  317706 segment_rpc.go:234] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5760 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.154443  317706 orchestrator.go:555] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5759 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="TranscoderStopped"
May 08 14:36:48 chicago livepeer[317706]: E0508 14:36:48.155014  317706 segment_rpc.go:234] manifestID=88f03af7-c563-46c1-bba0-381c133db2c5 seqNo=5759 orchSessionID=e7dfe2d3 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="TranscoderStopped"
May 08 12:51:59 chicago livepeer[317706]: I0508 12:51:59.913622  317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/268.ts
May 08 12:52:01 chicago livepeer[317706]: I0508 12:52:01.804733  317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/269.ts
May 08 12:52:03 chicago livepeer[317706]: I0508 12:52:03.801431  317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/270.ts
May 08 12:52:05 chicago livepeer[317706]: I0508 12:52:05.805480  317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/271.ts
May 08 12:52:07 chicago livepeer[317706]: I0508 12:52:07.790955  317706 player.go:105] LPMS got HTTP request @ /stream/d64b2bac/480p/272.ts
May 08 12:52:41 chicago livepeer[317706]: [h264_cuvid @ 0x7f6fd53a0900] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 12:52:41 chicago livepeer[317706]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 12:52:41 chicago livepeer[317706]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 12:52:41 chicago livepeer[317706]: E0508 12:52:41.453791  317706 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 12:52:41 chicago livepeer[317706]: E0508 12:52:41.453867  317706 orchestrator.go:555] manifestID=237d0657-358c-4924-af7c-db5268cf9869 seqNo=1 orchSessionID=96556443 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 12:52:41 chicago livepeer[317706]: E0508 12:52:41.453984  317706 segment_rpc.go:234] manifestID=237d0657-358c-4924-af7c-db5268cf9869 seqNo=1 orchSessionID=96556443 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 12:56:31 chicago livepeer[317706]: [h264_cuvid @ 0x7f6fd0ccc380] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 08 12:56:31 chicago livepeer[317706]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 08 12:56:31 chicago livepeer[317706]: ERROR: transcoder.c:236] Could not decode; stopping : Generic error in an external library
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.786203  317706 ffmpeg.go:609] Transcoder Return : Generic error in an external library
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.786956  317706 orchestrator.go:555] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.787266  317706 segment_rpc.go:234] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.840600  317706 orchestrator.go:555] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="TranscoderStopped"
May 08 12:56:31 chicago livepeer[317706]: E0508 12:56:31.840831  317706 segment_rpc.go:234] manifestID=0f5c5cb3-0ea5-4615-a891-8b7d58aa1fa5 seqNo=2 orchSessionID=fab75957 clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="TranscoderStopped"

stronk-dev, May 08 '22

The issue persists on 0.5.30:

May 12 09:13:48 chicago livepeer[435750]: [h264_cuvid @ 0x7f5cd0095180] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 12 09:13:48 chicago livepeer[435750]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 12 09:13:48 chicago livepeer[435750]: ERROR: transcoder.c:249] Could not decode; stopping : Generic error in an external library
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.446572  435750 ffmpeg.go:760] Transcoder Return : Generic error in an external library
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.446713  435750 orchestrator.go:558] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=1 orchSessionID=2a6ed2ab clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.446842  435750 segment_rpc.go:234] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=1 orchSessionID=2a6ed2ab sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=143.244.61.205 Could not transcode err="Generic error in an external library"
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.511532  435750 orchestrator.go:558] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=2 orchSessionID=2a6ed2ab clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="TranscoderStopped"
May 12 09:13:48 chicago livepeer[435750]: E0512 09:13:48.511935  435750 segment_rpc.go:234] manifestID=8619b3a7-f280-450e-83ee-8d41b5a1e946 seqNo=2 orchSessionID=2a6ed2ab clientIP=143.244.61.205 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="TranscoderStopped"
May 12 09:14:52 chicago livepeer[435750]: [h264_cuvid @ 0x7f5c90124dc0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
May 12 09:14:52 chicago livepeer[435750]: ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
May 12 09:14:52 chicago livepeer[435750]: ERROR: transcoder.c:249] Could not decode; stopping : Generic error in an external library
May 12 09:14:52 chicago livepeer[435750]: E0512 09:14:52.526847  435750 ffmpeg.go:760] Transcoder Return : Generic error in an external library
May 12 09:14:52 chicago livepeer[435750]: E0512 09:14:52.527636  435750 orchestrator.go:558] manifestID=83cc7c6d-590a-4faa-bdea-424558914906 seqNo=1 orchSessionID=93e7ac74 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
May 12 09:14:52 chicago livepeer[435750]: E0512 09:14:52.527848  435750 segment_rpc.go:234] manifestID=83cc7c6d-590a-4faa-bdea-424558914906 seqNo=1 orchSessionID=93e7ac74 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"

stronk-dev, May 12 '22

Just got another one:

I0713 14:00:08.706733 2219768 ot_rpc.go:142] Transcoding taskId=106151 url=https://162.244.81.94:8935/stream/67fbc9ff/1.tempfile
[h264_cuvid @ 0x7efce62591c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
ERROR: decoder.c:64] Error sending packet to decoder : Generic error in an external library
ERROR: transcoder.c:238] Could not decode; stopping : Generic error in an external library
E0713 14:00:09.012549 2219768 ffmpeg.go:812] Transcoder Return : Generic error in an external library
E0713 14:00:09.012621 2219768 ot_rpc.go:195] manifestID=5ed4724c-d736-4298-8a32-356dd7ff77df seqNo=1 orchSessionID=67fbc9ff taskId=106151 Transcoding done for taskId=106151 url=https://162.244.81.94:8935/stream/67fbc9ff/1.tempfile dur=89.986892ms err="Generic error in an external library"
E0713 14:00:09.012636 2219768 ot_rpc.go:250] manifestID=5ed4724c-d736-4298-8a32-356dd7ff77df seqNo=1 orchSessionID=67fbc9ff taskId=106151 Unable to transcode err="Generic error in an external library"
E0713 14:00:09.042263 2219768 ot_rpc.go:289] manifestID=5ed4724c-d736-4298-8a32-356dd7ff77df seqNo=1 orchSessionID=67fbc9ff taskId=106151 Orchestrator returned HTTP statusCode=400 err="Invalid detection data\n"

ad-astra-video, Jul 14 '22

Looked over the problem:

  • This happens when we feed a video stream packet into the decoder
  • The failing call, cuvidParseVideoData, suggests that this is a parsing problem, meaning that something is wrong with either the bitstream or the parsing code
  • However, H.264 has been around for almost 20 years, and while there have been later additions (for example FRExt, the Fidelity Range Extensions, in 2004), this code should be mature and I doubt there are any problems with parsing the syntax
  • What is possible is that we are hitting some limit of the HW decoder, for example something like the one discussed in this thread: https://forums.developer.nvidia.com/t/nvcuvid-problem-cuvidparsevideodata-cant-accept-the-payload-that-large-than-2m-bytes/35603/9
  • We need a bitstream to investigate further
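Once a failing segment is captured, a quick first check for the size-limit hypothesis is to list the NAL units in its H.264 elementary stream and flag very large ones. The segments are MPEG-TS, so the video would first be extracted to a raw Annex B stream, e.g. `ffmpeg -i 494.tempfile -c:v copy -an -f h264 out.h264`. This is a minimal Go sketch; the 2 MB threshold is taken from the title of the NVIDIA forum thread linked above and is an assumption, not a documented NVDEC limit. A packet handed to the decoder is at least as large as its largest NAL unit, so a flagged NAL implies an oversized packet, but the converse check would need full access-unit grouping, which this sketch does not do. The names `splitAnnexB` and `nalUnit` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// nalUnit describes one NAL unit found in an H.264 Annex B byte stream.
type nalUnit struct {
	Offset int  // byte offset of the NAL header within the stream
	Type   byte // nal_unit_type (low 5 bits of the header byte)
	Size   int  // size in bytes, header included, start code excluded
}

// splitAnnexB locates 00 00 01 start codes and returns the NAL units between them.
// Trailing zero bytes are trimmed (a NAL unit never ends in 0x00), so the extra
// leading zero of a 4-byte start code is not counted in the previous unit.
func splitAnnexB(data []byte) []nalUnit {
	startCode := []byte{0, 0, 1}
	var starts []int
	i := 0
	for {
		j := bytes.Index(data[i:], startCode)
		if j < 0 {
			break
		}
		i += j + len(startCode)
		starts = append(starts, i)
	}
	var nals []nalUnit
	for k, s := range starts {
		end := len(data)
		if k+1 < len(starts) {
			end = starts[k+1] - len(startCode)
		}
		for end > s && data[end-1] == 0 {
			end--
		}
		if end > s {
			nals = append(nals, nalUnit{Offset: s, Type: data[s] & 0x1F, Size: end - s})
		}
	}
	return nals
}

// thresholdBytes is an assumed limit based on the forum thread, meant to be tuned.
const thresholdBytes = 2 * 1024 * 1024

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: nalsizes <stream.h264>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	nals := splitAnnexB(data)
	largest := 0
	for _, n := range nals {
		if n.Size > largest {
			largest = n.Size
		}
		if n.Size > thresholdBytes {
			fmt.Printf("large NAL: offset=%d type=%d size=%d\n", n.Offset, n.Type, n.Size)
		}
	}
	fmt.Printf("nal units=%d largest=%d bytes\n", len(nals), largest)
}
```

If nothing is flagged, the size-limit theory becomes less likely and the bitstream itself (or the driver) is the next suspect.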

MikeIndiaAlpha, Jul 14 '22