go-livepeer
CUDA_ERROR_NOT_PERMITTED: operation not permitted - Error number -1448234581 occurred
Describe the bug
This is the second server on which I've received this error. At first I thought it was an issue on my side, but now I believe something else is going on:
sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=84.17.50.98 LB: Transcode submitted for key=de278b92_0
[AVHWDeviceContext @ 0x7fc1ec335d00] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
E0826 05:09:06.423332 1 ffmpeg.go:977] Transcoder Return : Unrecoverable state, restart process
I0826 05:09:06.423372 1 lb.go:192] manifestID=3463mrgx6zbfk6p3 seqNo=0 orchSessionID=de278b92 clientIP=84.17.50.98 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=de278b92_0
I0826 05:09:06.423381 1 lb.go:122] manifestID=3463mrgx6zbfk6p3 seqNo=0 orchSessionID=de278b92 clientIP=84.17.50.98 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Deleted transcode session for key=de278b92_0
panic: Unrecoverable state, restart process
goroutine 3715866 [running]:
github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc000452420, {0x87c600, 0xc000277830}, {{0x897030, 0xc000d08b80}, {0x897030, 0xc000d08b80}}, 0xc000d08b40, 0xc0008a18c0)
/src/core/orchestrator.go:557 +0xb3d
github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
/src/core/orchestrator.go:660 +0xfe
created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
/src/core/orchestrator.go:632 +0x44f
Livepeer is running in Docker. The error crashes the node, which then restarts immediately.
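A note on the recurring "Error number -1448234581": FFmpeg-style error codes are often a negated four-character tag (the FFERRTAG macro in libavutil packs the first character into the lowest byte). The sketch below unpacks such a number. The helper name decode_averror_tag is made up for illustration, and reading the resulting tag as the transcoding library's own "unrecoverable" code is an assumption, although it would be consistent with the "Unrecoverable state, restart process" message in the log.

```python
def decode_averror_tag(errnum: int) -> str:
    """Unpack the four-character tag from a negated FFERRTAG-style error number.

    Hypothetical helper for illustration; it is not part of FFmpeg or go-livepeer.
    """
    value = -errnum  # FFERRTAG(a, b, c, d) is the negated packed 32-bit tag
    if not 0 < value <= 0xFFFFFFFF:
        raise ValueError(f"{errnum} does not look like a tagged error code")
    # The first character sits in the lowest byte, so read the bytes little-endian.
    raw = value.to_bytes(4, "little")
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    # The number reported by decoder.c:348 in the logs above.
    print(decode_averror_tag(-1448234581))
```

If the tag decodes to readable ASCII, searching the FFmpeg and transcoder sources for that tag is usually quicker than searching for the raw number.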
To Reproduce
Steps to reproduce the behavior:
Reproducing the error is difficult because it does not occur at predictable times.
Desktop (please complete the following information):
- OS: Ubuntu
- Version 22.04.1 LTS
livepeer[7148]: E0211 10:20:44.236142 7148 segment_rpc.go:128] manifestID=b48a3a03-736a-456a-a65e-09db797ff44d seqNo=3931 orchSessionID=19b73b91 clientIP=185.59.221.179 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not read request body - err="stream error: stream ID 5; CANCEL"
livepeer[7148]: [h264_cuvid @ 0x7fe8ca813640] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
livepeer[7148]: ERROR: decoder.c:78] Error sending packet to decoder : Generic error in an external library
livepeer[7148]: ERROR: transcoder.c:720] Could not decode; stopping : Generic error in an external library
livepeer[7148]: E0211 10:21:20.122395 7148 ffmpeg.go:979] Transcoder Return : Generic error in an external library
livepeer[7148]: I0211 10:21:20.122451 7148 lb.go:215] manifestID=dedf4a81-99c4-4419-bc9b-2de96898e0e8 seqNo=1 orchSessionID=7c6998be clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=7c6998be_0
livepeer[7148]: E0211 10:21:20.122526 7148 orchestrator.go:565] manifestID=dedf4a81-99c4-4419-bc9b-2de96898e0e8 seqNo=1 orchSessionID=7c6998be clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Error transcoding segName= err="Generic error in an external library"
livepeer[7148]: E0211 10:21:20.122806 7148 segment_rpc.go:230] manifestID=dedf4a81-99c4-4419-bc9b-2de96898e0e8 seqNo=1 orchSessionID=7c6998be clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not transcode err="Generic error in an external library"
I have been plagued by this error. I tried changing the drivers and even the card itself, but nothing helps.
OS type: Ubuntu 22.04.1 LTS
Has anybody already found the root cause, or do you know the best way to troubleshoot this behaviour? I receive this error on my transcoder every two days without apparent reason. The GPU usage is below 25%, and I see no problems in the Linux system logs.
System information
- OS: Ubuntu 22.04
- NVIDIA GPU: 1x NVidia 1070ti
- NVIDIA Driver: 545.23.06
Usage method
I use the following docker compose file:
version: "3.8"
name: livepeer-orchestrator
services:
  livepeer-combined-orchestrator:
    image: livepeer/go-livepeer:0.7.1
    container_name: "livepeer-combined-orchestrator"
    restart: unless-stopped
    runtime: nvidia
    ports:
      # - 7936:7935 # Make CLI server available on host.
      - 8935:8935
    volumes:
      - ./config/lporch.cfg:/root/lporch.cfg
      - lpdata:/root/.lpData
    command: ["-config", "/root/lporch.cfg", "-ethPassword", "/run/secrets/eth_password"]
    secrets:
      - eth_password
volumes:
  lpdata:
networks:
  default:
    name: livepeer
    external: true
secrets:
  eth_password:
    file: ./.eth_password.txt
Logs
livepeer-combined-orchestrator | E1107 04:38:26.966520 1 block_watcher.go:373] failed to fetch logs for range error=request failed or timed out fromBlock=147862814 toBlock=147863813
livepeer-combined-orchestrator | 2023/11/07 05:18:31 http: TLS handshake error from 89.187.177.196:22753: EOF
livepeer-combined-orchestrator | E1107 06:23:06.961981 1 block_watcher.go:373] failed to fetch logs for range error=request failed or timed out fromBlock=147934784 toBlock=147934803
livepeer-combined-orchestrator | 2023/11/07 07:21:48 http: TLS handshake error from 143.244.33.56:40566: EOF
livepeer-combined-orchestrator | E1107 07:26:41.970444 1 block_watcher.go:373] failed to fetch logs for range error=request failed or timed out fromBlock=147949845 toBlock=147949863
livepeer-combined-orchestrator | [Parsed_scale_npp_0 @ 0x7f5640f61180] super-sampling not supported for output dimensions, using lanczos instead.
livepeer-combined-orchestrator | [Parsed_scale_npp_0 @ 0x7f5640f5f600] super-sampling not supported for output dimensions, using lanczos instead.
livepeer-combined-orchestrator | 2023/11/07 14:22:02 http: TLS handshake error from 143.244.33.56:12149: EOF
livepeer-combined-orchestrator | 2023/11/07 15:20:45 http: TLS handshake error from 138.199.4.163:48216: EOF
livepeer-combined-orchestrator | I1107 15:37:47.068476 1 transactionManager.go:119]
livepeer-combined-orchestrator | ******************************Eth Transaction******************************
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | Invoking transaction: "rewardWithHint". Inputs: "_newPosPrev: 0x0000000000000000000000000000000000000000 _newPosNext: 0x0000000000000000000000000000000000000000" Hash: "0x7dd86a1d810b276ec4cdb0ebd0c80e20ceff0b15c23ad62a5fadbceef4d09434".
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | ***************************************************************************
livepeer-combined-orchestrator | I1107 15:37:49.163822 1 rewardservice.go:105] Called reward for round 3168
livepeer-combined-orchestrator | E1107 15:46:01.958407 1 block_watcher.go:373] failed to fetch logs for range error=request failed or timed out fromBlock=148068192 toBlock=148068215
livepeer-combined-orchestrator | E1107 15:46:02.153312 1 block_watcher.go:373] failed to fetch logs for range error=request failed or timed out fromBlock=148068192 toBlock=148068259
livepeer-combined-orchestrator | E1107 15:46:06.952624 1 block_watcher.go:373] failed to fetch logs for range error=request failed or timed out fromBlock=148068192 toBlock=148068274
livepeer-combined-orchestrator | E1107 15:46:11.950377 1 block_watcher.go:373] failed to fetch logs for range error=request failed or timed out fromBlock=148068192 toBlock=148068297
livepeer-combined-orchestrator | 2023/11/07 16:01:04 http: TLS handshake error from 103.106.228.158:34870: EOF
livepeer-combined-orchestrator | [AVHWDeviceContext @ 0x7f5618008c40] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
livepeer-combined-orchestrator | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
livepeer-combined-orchestrator | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
livepeer-combined-orchestrator | E1107 16:19:45.481648 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process
livepeer-combined-orchestrator | panic: Unrecoverable state, restart process
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | goroutine 635711 [running]:
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc000602680, {0x89ffd0, 0xc000a3b8c0}, {{0x8a66d8?, 0xc00099edc0?}, {0x8a66d8?, 0xc00099edc0?}}, 0xc00099ed80, 0xc0006b86e0)
livepeer-combined-orchestrator | /src/core/orchestrator.go:588 +0xb50
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
livepeer-combined-orchestrator | /src/core/orchestrator.go:680 +0x9b
livepeer-combined-orchestrator | created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
livepeer-combined-orchestrator | /src/core/orchestrator.go:665 +0x47a
@rickstaa Do you by chance have the manifest ID or log lines before the error? I suspect the issue is related to a VOD profile setting. I received the same error a few ago when transcoding a VOD stream.
@papabear99, thanks for helping me debug this issue and trying to fix it. As far as I can tell, the manifest ID is not stored anywhere. Do I need to use the -currentManifest flag for it to become available, or do I need to increase the log verbosity? I do have the complete log from before the crash, and I appended more lines to the log above.
@papabear99 I'm running the titan-node pool at the same time, and it logged the following errors when the system crashed:
titan-node-pool | Results from background performance check: 0.07647 / max threshold 0.35 - Passed.
titan-node-pool | Uptime Rewards: Earning approximately 6300.0 nLPT
titan-node-pool | data [[Source File /tmp/_MEIkiI9RJ/bbb/source.m3u8] [Transcoding Options /tmp/_MEIkiI9RJ/transcodingOptions.json] [Concurrent Sessions 1] [Live Mode true] [MPEG-7 Sign Mode false] [Nvidia GPU IDs 0]]
titan-node-pool | timestamp,session,segment,seg_dur,transcode_time,frames
titan-node-pool | in.Device 0
titan-node-pool |
titan-node-pool | I1107 15:34:05.970995 24088 livepeer_bench.go:88] log level is: 24
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | | Source File | /tmp/_MEIkiI9RJ/bbb/source.m3u8 |
titan-node-pool | | Transcoding Options | /tmp/_MEIkiI9RJ/transcodingOptions.json |
titan-node-pool | | Concurrent Sessions | 1 |
titan-node-pool | | Live Mode | true |
titan-node-pool | | MPEG-7 Sign Mode | false |
titan-node-pool | | Nvidia GPU IDs | 0 |
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | [AVHWDeviceContext @ 0x7f35e4848440] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1107 15:34:05.995279 24088 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 16 [running]:
titan-node-pool | github.com/livepeer/lpms/ffmpeg.(*Transcoder).Transcode(0xc000ce1e58, 0xc000ce1e80, {0xc0000ec900?, 0x4, 0x4})
titan-node-pool | /github/home/go/pkg/mod/github.com/livepeer/[email protected]/ffmpeg/ffmpeg.go:993 +0xe5d
titan-node-pool | main.main.func1(0x0, 0x0?)
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:225 +0x6a7
titan-node-pool | created by main.main
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:165 +0x2274
titan-node-pool |
titan-node-pool | Error with Benchmarking - closing down
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | Your GPU has run out of memory and needs to be patched to continue - Would you like to try patching automatically? (y/n): Error: EOF when reading a line
titan-node-pool | An error has occurred, restarting in 10 minutes...
titan-node-pool | Using ETH address: 0x2390f4d31cB118fB23De052754549F1f325497C1
titan-node-pool | Nickname: pools.transcode.eth
titan-node-pool | Using the following Nvidia cards: all
titan-node-pool | Titan Node Pool Version: 1.34
titan-node-pool | Join the Titan Node Pool Discord for help and support: https://discord.gg/FbB89GDgkC
titan-node-pool | Locating all nodes, please wait...
titan-node-pool | FRA.titan-node-orch.com = 21.17 milliseconds
titan-node-pool | JOH.titan-node-orch.com = 176.78 milliseconds
titan-node-pool | LA.titan-node-orch.com = 153.68 milliseconds
titan-node-pool | LON.titan-node-orch.com = 17.45 milliseconds
titan-node-pool | MUM.titan-node-orch.com = 178.38 milliseconds
titan-node-pool | NY.titan-node-orch.com = 95.22 milliseconds
titan-node-pool | SAO.titan-node-orch.com = 193.39 milliseconds
titan-node-pool | SIN.titan-node-orch.com = 180.67 milliseconds
titan-node-pool | SYD.titan-node-orch.com = 261.58 milliseconds
titan-node-pool | TOK.titan-node-orch.com = 238.33 milliseconds
titan-node-pool | TOR.titan-node-orch.com = 93.86 milliseconds
titan-node-pool | Selected Nodes: FRA.titan-node-orch.com LON.titan-node-orch.com
titan-node-pool | Previous Internet Speed Check: Passed - Skipping
titan-node-pool | Max sessions set to 21
titan-node-pool | Checking driver patch status
titan-node-pool | data [[Source File /tmp/_MEIkiI9RJ/bbb/source.m3u8] [Transcoding Options /tmp/_MEIkiI9RJ/transcodingOptions.json] [Concurrent Sessions 21] [Live Mode true] [MPEG-7 Sign Mode false] [Nvidia GPU IDs 0]]
titan-node-pool | timestamp,session,segment,seg_dur,transcode_time,frames
titan-node-pool | in.Device 0
titan-node-pool |
titan-node-pool | I1107 15:44:17.140969 24106 livepeer_bench.go:88] log level is: 24
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | | Source File | /tmp/_MEIkiI9RJ/bbb/source.m3u8 |
titan-node-pool | | Transcoding Options | /tmp/_MEIkiI9RJ/transcodingOptions.json |
titan-node-pool | | Concurrent Sessions | 21 |
titan-node-pool | | Live Mode | true |
titan-node-pool | | MPEG-7 Sign Mode | false |
titan-node-pool | | Nvidia GPU IDs | 0 |
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | [AVHWDeviceContext @ 0x7fa1588b3d40] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1107 15:44:17.155818 24106 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 7 [running]:
titan-node-pool | github.com/livepeer/lpms/ffmpeg.(*Transcoder).Transcode(0xc000c4fe58, 0xc000c4fe80, {0xc0001f0900?, 0x4, 0x4})
titan-node-pool | /github/home/go/pkg/mod/github.com/livepeer/[email protected]/ffmpeg/ffmpeg.go:993 +0xe5d
titan-node-pool | main.main.func1(0x0, 0x0?)
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:225 +0x6a7
titan-node-pool | created by main.main
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:165 +0x2274
titan-node-pool |
titan-node-pool | Error with patch test - failed to launch
titan-node-pool | I1107 17:19:12.174471 2099 ot_rpc.go:147] Transcoding taskId=18576 url=https://fra.titan-node-orch.com:8935/stream/2e85e25a/138.tempfile
titan-node-pool | [AVHWDeviceContext @ 0x7f1ae981a7c0] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1107 17:19:12.347758 2099 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | E1107 17:19:12.347841 2099 ot_rpc.go:200] manifestID=907e1a0e-7c8e-4806-be90-db7b4f3b42c5 seqNo=138 orchSessionID=2e85e25a taskId=18576 Transcoding done for taskId=18576 url=https://fra.titan-node-orch.com:8935/stream/2e85e25a/138.tempfile dur=48.253279ms err="Unrecoverable state, restart process"
titan-node-pool | E1107 17:19:12.347852 2099 ot_rpc.go:255] manifestID=907e1a0e-7c8e-4806-be90-db7b4f3b42c5 seqNo=138 orchSessionID=2e85e25a taskId=18576 Unable to transcode err="Unrecoverable state, restart process"
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 869 [running]:
titan-node-pool | github.com/livepeer/go-livepeer/server.runTranscode.func2()
titan-node-pool | /__w/go-livepeer/go-livepeer/server/ot_rpc.go:203 +0x2a
titan-node-pool | github.com/livepeer/go-livepeer/server.runTranscode(0xc000002000, {0xc000810008, 0x1c}, 0xc0003bbd70?, 0xc0006094a0)
titan-node-pool | /__w/go-livepeer/go-livepeer/server/ot_rpc.go:206 +0xff5
titan-node-pool | github.com/livepeer/go-livepeer/server.runTranscoder.func2()
titan-node-pool | /__w/go-livepeer/go-livepeer/server/ot_rpc.go:138 +0x36
titan-node-pool | created by github.com/livepeer/go-livepeer/server.runTranscoder
titan-node-pool | /__w/go-livepeer/go-livepeer/server/ot_rpc.go:137 +0x7aa
titan-node-pool | Error with patch test - failed to launch
titan-node-pool | Error with patch test - failed to launch
titan-node-pool | Error with patch test - failed to launch
titan-node-pool | Error with patch test - failed to launch
titan-node-pool | Error with patch test - failed to launch
The strange thing is that the GPU memory error doesn't make sense since the GPU memory never crossed the 30% mark.
I'm trying to determine which binary is causing the problems (i.e. Titan's or the go-livepeer one).
@papabear99, I reached out to @Titan-Node to get some clarity. I'm trying to nail down if the problem lies within his binary or if the errors it's throwing are connected to the broader go-livepeer issue.
The go-livepeer binary encountered an error at precisely 16:19:45.481648, whereas the titan-node logs recorded errors minutes before this occurrence. It appears that the issues stem from the underlying go-livepeer binary, which is utilized by the titan-node binary as well.
E1107 16:19:45.481648 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process
livepeer-combined-orchestrator | panic: Unrecoverable state, restart process
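To put a number on "minutes before", the glog-style prefixes (level letter, mmdd, then time with microseconds) can be parsed and subtracted. This is only a sketch: the function names are invented for illustration, the year is assumed to be 2023 based on the dated TLS lines in the same log, and it assumes both containers log in the same time zone.

```python
import re
from datetime import datetime, timedelta

# glog prefix: level letter, month+day, then hh:mm:ss.microseconds
GLOG_PREFIX = re.compile(r"\b[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})")


def glog_time(line: str, year: int = 2023) -> datetime:
    """Extract the timestamp from a glog-style line (the year is not logged, so it is assumed)."""
    match = GLOG_PREFIX.search(line)
    if match is None:
        raise ValueError(f"no glog timestamp in: {line!r}")
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %m%d %H:%M:%S.%f")


def gap(earlier_line: str, later_line: str, year: int = 2023) -> timedelta:
    """Time elapsed between two log lines."""
    return glog_time(later_line, year) - glog_time(earlier_line, year)


# First fatal line in the titan-node log and the fatal line in the orchestrator log.
TITAN_LINE = "titan-node-pool | E1107 15:34:05.995279 24088 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process"
ORCH_LINE = "E1107 16:19:45.481648 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process"

if __name__ == "__main__":
    print(gap(TITAN_LINE, ORCH_LINE))
```

If the two containers turn out to use different time zones, the offset has to be applied before comparing, otherwise the ordering of the two failures cannot be trusted.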
add -v=6 to your config file for verbose logging to see the manifestID.
Your Nvidia driver appears to be patched, because Titan set your maxSessions to 21 (based on the benchmark it runs on first start). However, the log from Titan's software shows this error: Your GPU has run out of memory and needs to be patched
Have you updated your driver since first starting in the pool? If you update your driver, you need to repatch after every update. With an unpatched driver a 1070ti is limited to 5 sessions.
add -v=6 to your config file for verbose logging to see the manifestID.
Hey @papabear99, thanks for your quick response. I increased the log level to 6 so that we can see what goes wrong the next time something happens.
Your Nvidia driver appears to be patched, because Titan set your maxSessions to 21 (based on the benchmark it runs on first start). However, the log from Titan's software shows this error:
Your GPU has run out of memory and needs to be patched
Have you updated your driver since first starting in the pool? If you update your driver, you need to repatch after every update. With an unpatched driver a 1070ti is limited to 5 sessions.
I patched the Nvidia driver some days ago and pinned the driver version after that, so I think Titan's software is misinterpreting the error (see https://discord.com/channels/922744261651361803/922744262213402656/1171853829696913408). I now suspect your hypothesis in https://github.com/livepeer/go-livepeer/issues/2570#issuecomment-1800415035 is most likely.
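One way to rule out a silent driver update is to compare the driver version that nvidia-smi reports with the version that was installed when the patch was applied. The sketch below assumes nvidia-smi is on the PATH and that the patched version was noted down somewhere; the function names are invented for illustration and nothing here is part of the patch tooling or of Titan's software.

```python
import subprocess
from typing import List


def parse_versions(output: str) -> List[str]:
    """Split the CSV output of nvidia-smi into one version string per GPU."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def installed_driver_versions() -> List[str]:
    """Ask nvidia-smi for the driver version of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_versions(result.stdout)


def needs_repatch(patched_version: str, current_versions: List[str]) -> bool:
    """True if any GPU reports a driver version other than the one that was patched."""
    return any(version != patched_version for version in current_versions)
```

For example, needs_repatch("545.23.06", installed_driver_versions()) would return True after an update that changed the driver version. A matching version does not prove the patch is still in place, it only shows the driver was not replaced.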
The Titan binary is designed to execute specific benchmarks at different intervals:
- At the start: Initiates a benchmark for one segment with your max sessions limit (i.e., 21), which is retried on encountering a fatal error.
- Every 10 minutes: Conducts a single-session benchmark for 60 segments.
Based on the system profiles provided, these benchmarks have minimal impact on overall performance. The Titan-node binary should not be the cause of the orchestrator failure. It's more likely that the orchestrator failed independently, leading to similar errors in the Titan binary, which utilizes go-livepeer and livepeer_bench under the hood.
It is possible that the missing psmisc package in the Docker container caused the killall commands in the titan-node binary to fail. While it's a slim chance, lingering processes during benchmark retries could have contributed to the issues.
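A quick way to confirm whether killall is actually missing is to look it up on the PATH from inside the container. This is a minimal sketch; the helper name missing_tools is made up, and whether installing psmisc is the right fix depends on the base image of the container (on Debian and Ubuntu, psmisc is the package that ships killall).

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["killall"])
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("killall is available")
```

The which parameter exists only so the lookup can be replaced in a test; by default the function uses shutil.which.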
System profiles
Important: These are rough estimates since I'm running a lot of other things at the same time.
Startup benchmark
Performed around 10:41.
CPU
GPU
Recurring benchmark
Performed around 10:44.
CPU
GPU
Hey @papabear99,
The error resurfaced after a smooth 48 hours. The good news is that I now have extended logs and manifest file names. Where are these manifest files stored? Is it on my system or on the Livepeer server? I also added some system info graphs but couldn't find anything strange.
Quick heads up: The log time is an hour behind Grafana's.
Livepeer Logs
livepeer-combined-orchestrator | I1110 16:20:08.719560 1 block_watcher.go:454] Polling blocks from=149083488 to=149083507
livepeer-combined-orchestrator | I1110 16:20:11.764392 1 segment_rpc.go:94] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Received segment dur=1.999s
livepeer-combined-orchestrator | I1110 16:20:11.764454 1 census.go:1149] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentEmerged... duration=1.999
livepeer-combined-orchestrator | I1110 16:20:11.764774 1 orchestrator.go:387] Setting fixed price=303/1 for session=0be9db7d
livepeer-combined-orchestrator | I1110 16:20:11.764852 1 orchestrator.go:177] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Receiving ticket sessionID=0be9db7d faceValue=0.012 ETH winProb=0.0000833333 ev=1000000000000.00
livepeer-combined-orchestrator | I1110 16:20:12.505181 1 segment_rpc.go:134] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Downloaded segment dur=739.9242ms
livepeer-combined-orchestrator | I1110 16:20:12.505219 1 census.go:1211] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentDownloaded... dur=739.9242ms
livepeer-combined-orchestrator | I1110 16:20:12.507505 1 orchestrator.go:507] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting to transcode segment
livepeer-combined-orchestrator | I1110 16:20:12.507529 1 orchestrator.go:495] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.185.251 Creating new segment chan
livepeer-combined-orchestrator | I1110 16:20:12.507547 1 orchestrator.go:643] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting transcode segment loop for manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 sessionID=0be9db7d
livepeer-combined-orchestrator | I1110 16:20:12.507570 1 orchestrator.go:516] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Submitted segment to transcode loop
livepeer-combined-orchestrator | I1110 16:20:12.508639 1 lb.go:106] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Creating transcode session for job=0be9db7d
livepeer-combined-orchestrator | I1110 16:20:12.508726 1 lb.go:154] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Created transcode session for key=0be9db7d_0
livepeer-combined-orchestrator | I1110 16:20:12.508752 1 lb.go:240] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Transcode submitted for key=0be9db7d_0
livepeer-combined-orchestrator | [AVHWDeviceContext @ 0x7f93f0d4e500] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
livepeer-combined-orchestrator | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
livepeer-combined-orchestrator | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
livepeer-combined-orchestrator | E1110 16:20:12.567962 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process
livepeer-combined-orchestrator | I1110 16:20:12.568006 1 lb.go:223] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=0be9db7d_0
livepeer-combined-orchestrator | I1110 16:20:12.568021 1 lb.go:146] manifestID=bbbm3u8_e4305ccf7a7c5f132950_0_0 seqNo=0 orchSessionID=0be9db7d clientIP=89.187.185.251 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Deleted transcode session for key=0be9db7d_0
livepeer-combined-orchestrator | panic: Unrecoverable state, restart process
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | goroutine 699501 [running]:
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc000582680, {0x89ffd0, 0xc0005a9a70}, {{0x8a66d8?, 0xc000720c80?}, {0x8a66d8?, 0xc000720c80?}}, 0xc000720c40, 0xc000ac6d10)
livepeer-combined-orchestrator | /src/core/orchestrator.go:588 +0xb50
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
livepeer-combined-orchestrator | /src/core/orchestrator.go:680 +0x9b
livepeer-combined-orchestrator | created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
livepeer-combined-orchestrator | /src/core/orchestrator.go:665 +0x47a
livepeer-combined-orchestrator exited with code 0
Titan-Node Logs
titan-node-pool | DEBUG:root:All transcoders are active
titan-node-pool | Results from background performance check: 0.07849 / max threshold 0.35 - Passed.
titan-node-pool | Uptime Rewards: Earning approximately 6300.0 nLPT
titan-node-pool | DEBUG:root:Updated settings from database
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 725
titan-node-pool | DEBUG:root:Checking for updates...
titan-node-pool | DEBUG:root:Checking if too many strikes
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 151
titan-node-pool | DEBUG:root:Current strikes on Mac Address: {'NY': '0', 'FRA': '0.0', 'LA': '0', 'LON': '0', 'SIN': '0', 'TOR': '0', 'SAO': '0', 'TOK': '0', 'JOH': '0', 'MUM': '0', 'SYD': '0'}
titan-node-pool | DEBUG:root:Checking if machine is blacklisted
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 32
titan-node-pool | DEBUG:root:Running background benchmark
titan-node-pool | DEBUG:root:Killing livepeer process
titan-node-pool | data [[Source File /tmp/_MEIbkTndv/bbb/source.m3u8] [Transcoding Options /tmp/_MEIbkTndv/transcodingOptions.json] [Concurrent Sessions 1] [Live Mode true] [MPEG-7 Sign Mode false] [Nvidia GPU IDs 0]]
titan-node-pool | timestamp,session,segment,seg_dur,transcode_time,frames
titan-node-pool | in.Device 0
titan-node-pool |
titan-node-pool | I1110 15:48:24.149328 7848 livepeer_bench.go:88] log level is: 24
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | | Source File | /tmp/_MEIbkTndv/bbb/source.m3u8 |
titan-node-pool | | Transcoding Options | /tmp/_MEIbkTndv/transcodingOptions.json |
titan-node-pool | | Concurrent Sessions | 1 |
titan-node-pool | | Live Mode | true |
titan-node-pool | | MPEG-7 Sign Mode | false |
titan-node-pool | | Nvidia GPU IDs | 0 |
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | [AVHWDeviceContext @ 0x7fb0888b3bc0] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1110 15:48:24.173332 7848 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 8 [running]:
titan-node-pool | github.com/livepeer/lpms/ffmpeg.(*Transcoder).Transcode(0xc000c85e58, 0xc000c85e80, {0xc000410000?, 0x4, 0x4})
titan-node-pool | /github/home/go/pkg/mod/github.com/livepeer/[email protected]/ffmpeg/ffmpeg.go:993 +0xe5d
titan-node-pool | main.main.func1(0x0, 0x0?)
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:225 +0x6a7
titan-node-pool | created by main.main
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:165 +0x2274
titan-node-pool |
titan-node-pool | Error with Benchmarking - closing down
titan-node-pool | DEBUG:root:Killing livepeer process
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 15
titan-node-pool | DEBUG:root:Retrieving settings from database
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 725
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 320
titan-node-pool | DEBUG:root:List of pool node endpoints: ['FRA.titan-node-orch.com', 'JOH.titan-node-orch.com', 'LA.titan-node-orch.com', 'LON.titan-node-orch.com', 'MUM.titan-node-orch.com', 'NY.titan-node-orch.com', 'SAO.titan-node-orch.com', 'SIN.titan-node-orch.com', 'SYD.titan-node-orch.com', 'TOK.titan-node-orch.com', 'TOR.titan-node-orch.com']
titan-node-pool | DEBUG:root:Got local machine IP from http://v4.ident.me (option 1): 86.94.162.140
titan-node-pool | DEBUG:root:Checking if machine is blacklisted
titan-node-pool | Your GPU has run out of memory and needs to be patched to continue - Would you like to try patching automatically? (y/n): Error: EOF when reading a line
titan-node-pool | An error has occurred, restarting in 10 minutes...
titan-node-pool | Using ETH address: 0x2390f4d31cB118fB23De052754549F1f325497C1
titan-node-pool | Nickname: pools.transcode.eth
titan-node-pool | Using the following Nvidia cards: all
titan-node-pool | Titan Node Pool Version: 1.34
titan-node-pool | Join the Titan Node Pool Discord for help and support: https://discord.gg/FbB89GDgkC
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 32
titan-node-pool | DEBUG:root:Checking for updates...
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 151
titan-node-pool | Locating all nodes, please wait...
titan-node-pool | DEBUG:root:Current strikes on Mac Address: {'NY': '0', 'FRA': '0.0', 'LA': '0', 'LON': '0', 'SIN': '0', 'TOR': '0', 'SAO': '0', 'TOK': '0', 'JOH': '0', 'MUM': '0', 'SYD': '0'}
titan-node-pool | FRA.titan-node-orch.com = 24.66 milliseconds
titan-node-pool | JOH.titan-node-orch.com = 184.01 milliseconds
titan-node-pool | LA.titan-node-orch.com = 152.89 milliseconds
titan-node-pool | LON.titan-node-orch.com = 16.78 milliseconds
titan-node-pool | MUM.titan-node-orch.com = 187.46 milliseconds
titan-node-pool | NY.titan-node-orch.com = 95.55 milliseconds
titan-node-pool | SAO.titan-node-orch.com = 193.49 milliseconds
titan-node-pool | SIN.titan-node-orch.com = 177.74 milliseconds
titan-node-pool | SYD.titan-node-orch.com = 261.91 milliseconds
titan-node-pool | TOK.titan-node-orch.com = 419.99 milliseconds
titan-node-pool | TOR.titan-node-orch.com = 93.62 milliseconds
titan-node-pool | Selected Nodes: FRA.titan-node-orch.com LON.titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 22
titan-node-pool | Previous Internet Speed Check: Passed - Skipping
titan-node-pool | Max sessions set to 21
titan-node-pool | Checking driver patch status
titan-node-pool | data [[Source File /tmp/_MEIbkTndv/bbb/source.m3u8] [Transcoding Options /tmp/_MEIbkTndv/transcodingOptions.json] [Concurrent Sessions 21] [Live Mode true] [MPEG-7 Sign Mode false] [Nvidia GPU IDs 0]]
titan-node-pool | timestamp,session,segment,seg_dur,transcode_time,frames
titan-node-pool | in.Device 0
titan-node-pool |
titan-node-pool | I1110 15:58:36.324266 7866 livepeer_bench.go:88] log level is: 24
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | | Source File | /tmp/_MEIbkTndv/bbb/source.m3u8 |
titan-node-pool | | Transcoding Options | /tmp/_MEIbkTndv/transcodingOptions.json |
titan-node-pool | | Concurrent Sessions | 21 |
titan-node-pool | | Live Mode | true |
titan-node-pool | | MPEG-7 Sign Mode | false |
titan-node-pool | | Nvidia GPU IDs | 0 |
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | [AVHWDeviceContext @ 0x7fb2288b3bc0] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1110 15:58:36.339273 7866 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 11 [running]:
titan-node-pool | github.com/livepeer/lpms/ffmpeg.(*Transcoder).Transcode(0xc000d8be58, 0xc000d8be80, {0xc000410000?, 0x4, 0x4})
titan-node-pool | /github/home/go/pkg/mod/github.com/livepeer/[email protected]/ffmpeg/ffmpeg.go:993 +0xe5d
titan-node-pool | main.main.func1(0x0, 0x0?)
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:225 +0x6a7
titan-node-pool | created by main.main
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:165 +0x2274
titan-node-pool |
titan-node-pool | Error with patch test - failed to launch
titan-node-pool | I1110 16:18:45.646010 2099 ot_rpc.go:147] Transcoding taskId=20444 url=https://fra.titan-node-orch.com:8935/stream/52593bfa/0.tempfile
titan-node-pool | I1110 16:18:45.646075 2099 os.go:88] manifestID=bbbm3u8_be259237490142334bcf_0_0 seqNo=0 orchSessionID=52593bfa taskId=20444 Downloading uri=https://fra.titan-node-orch.com:8935/stream/52593bfa/0.tempfile
titan-node-pool | I1110 16:18:45.809160 2099 os.go:106] manifestID=bbbm3u8_be259237490142334bcf_0_0 seqNo=0 orchSessionID=52593bfa taskId=20444 Downloaded uri=https://fra.titan-node-orch.com:8935/stream/52593bfa/0.tempfile dur=163.049794ms bytes=315276
titan-node-pool | I1110 16:18:45.810336 2099 ot_rpc.go:196] manifestID=bbbm3u8_be259237490142334bcf_0_0 seqNo=0 orchSessionID=52593bfa taskId=20444 Segment from taskId=20444 url=https://fra.titan-node-orch.com:8935/stream/52593bfa/0.tempfile saved to file=/root/.lpData/offchain/e148a198ff66e91701b0.tempfile
titan-node-pool | I1110 16:18:45.810391 2099 lb.go:99] manifestID=bbbm3u8_be259237490142334bcf_0_0 seqNo=0 orchSessionID=52593bfa taskId=20444 LB: Creating transcode session for job=52593bfa
titan-node-pool | I1110 16:18:45.810450 2099 lb.go:147] manifestID=bbbm3u8_be259237490142334bcf_0_0 seqNo=0 orchSessionID=52593bfa taskId=20444 LB: Created transcode session for key=52593bfa_0
titan-node-pool | I1110 16:18:45.810465 2099 lb.go:233] manifestID=bbbm3u8_be259237490142334bcf_0_0 seqNo=0 orchSessionID=52593bfa taskId=20444 LB: Transcode submitted for key=52593bfa_0
titan-node-pool | [AVHWDeviceContext @ 0x7fc67c02ba40] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1110 16:18:45.872030 2099 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
System info
Another crash occurred:
LivePeer
livepeer-combined-orchestrator | I1111 14:19:32.301003 1 block_watcher.go:454] Polling blocks from=149370637 to=149370655
livepeer-combined-orchestrator | I1111 14:19:37.303156 1 block_watcher.go:454] Polling blocks from=149370656 to=149370674
livepeer-combined-orchestrator | I1111 14:19:41.449785 1 segment_rpc.go:94] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Received segment dur=1.999s
livepeer-combined-orchestrator | I1111 14:19:41.449834 1 census.go:1149] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentEmerged... duration=1.999
livepeer-combined-orchestrator | I1111 14:19:41.450198 1 orchestrator.go:387] Setting fixed price=303/1 for session=716616a3
livepeer-combined-orchestrator | I1111 14:19:41.450280 1 orchestrator.go:177] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Receiving ticket sessionID=716616a3 faceValue=0.012 ETH winProb=0.0000833333 ev=1000000000000.00
livepeer-combined-orchestrator | I1111 14:19:41.599207 1 segment_rpc.go:134] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Downloaded segment dur=148.401269ms
livepeer-combined-orchestrator | I1111 14:19:41.599232 1 census.go:1211] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentDownloaded... dur=148.401269ms
livepeer-combined-orchestrator | I1111 14:19:41.600883 1 orchestrator.go:507] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=212.102.38.92 Starting to transcode segment
livepeer-combined-orchestrator | I1111 14:19:41.600910 1 orchestrator.go:495] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Creating new segment chan
livepeer-combined-orchestrator | I1111 14:19:41.600927 1 orchestrator.go:643] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting transcode segment loop for manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 sessionID=716616a3
livepeer-combined-orchestrator | I1111 14:19:41.600964 1 orchestrator.go:516] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Submitted segment to transcode loop
livepeer-combined-orchestrator | I1111 14:19:41.602138 1 lb.go:106] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Creating transcode session for job=716616a3
livepeer-combined-orchestrator | I1111 14:19:41.602211 1 lb.go:154] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Created transcode session for key=716616a3_0
livepeer-combined-orchestrator | I1111 14:19:41.602232 1 lb.go:240] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Transcode submitted for key=716616a3_0
livepeer-combined-orchestrator | [AVHWDeviceContext @ 0x7fde85fb1940] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
livepeer-combined-orchestrator | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
livepeer-combined-orchestrator | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
livepeer-combined-orchestrator | E1111 14:19:41.659207 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process
livepeer-combined-orchestrator | I1111 14:19:41.659258 1 lb.go:223] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=716616a3_0
livepeer-combined-orchestrator | I1111 14:19:41.659270 1 lb.go:146] manifestID=bbbm3u8_8309bb435a6d7913cf23_0_0 seqNo=0 orchSessionID=716616a3 clientIP=212.102.38.92 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Deleted transcode session for key=716616a3_0
livepeer-combined-orchestrator | panic: Unrecoverable state, restart process
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | goroutine 71280 [running]:
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc000642820, {0x89ffd0, 0xc00056bd70}, {{0x8a66d8?, 0xc00093d200?}, {0x8a66d8?, 0xc00093d200?}}, 0xc00093d1c0, 0xc000aa9810)
livepeer-combined-orchestrator | /src/core/orchestrator.go:588 +0xb50
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
livepeer-combined-orchestrator | /src/core/orchestrator.go:680 +0x9b
livepeer-combined-orchestrator | created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
livepeer-combined-orchestrator | /src/core/orchestrator.go:665 +0x47a
livepeer-combined-orchestrator exited with code 0
Titan-Node
titan-node-pool | DEBUG:root:Killing livepeer process
titan-node-pool | DEBUG:root:Killing livepeer process
titan-node-pool | data [[Source File /tmp/_MEITRuvmF/bbb/source.m3u8] [Transcoding Options /tmp/_MEITRuvmF/transcodingOptions.json] [Concurrent Sessions 1] [Live Mode true] [MPEG-7 Sign Mode false] [Nvidia GPU IDs 0]]
titan-node-pool | timestamp,session,segment,seg_dur,transcode_time,frames
titan-node-pool | in.Device 0
titan-node-pool |
titan-node-pool | I1111 13:22:03.726094 4036 livepeer_bench.go:88] log level is: 24
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | | Source File | /tmp/_MEITRuvmF/bbb/source.m3u8 |
titan-node-pool | | Transcoding Options | /tmp/_MEITRuvmF/transcodingOptions.json |
titan-node-pool | | Concurrent Sessions | 1 |
titan-node-pool | | Live Mode | true |
titan-node-pool | | MPEG-7 Sign Mode | false |
titan-node-pool | | Nvidia GPU IDs | 0 |
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | [AVHWDeviceContext @ 0x7f23b48b3bc0] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1111 13:22:03.741182 4036 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 6 [running]:
titan-node-pool | github.com/livepeer/lpms/ffmpeg.(*Transcoder).Transcode(0xc000cc7e58, 0xc000cc7e80, {0xc0001f2900?, 0x4, 0x4})
titan-node-pool | /github/home/go/pkg/mod/github.com/livepeer/[email protected]/ffmpeg/ffmpeg.go:993 +0xe5d
titan-node-pool | main.main.func1(0x0, 0x0?)
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:225 +0x6a7
titan-node-pool | created by main.main
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:165 +0x2274
titan-node-pool |
titan-node-pool | Error with Benchmarking - closing down
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 15
titan-node-pool | DEBUG:root:Retrieving settings from database
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 725
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 320
titan-node-pool | DEBUG:root:List of pool node endpoints: ['FRA.titan-node-orch.com', 'JOH.titan-node-orch.com', 'LA.titan-node-orch.com', 'LON.titan-node-orch.com', 'MUM.titan-node-orch.com', 'NY.titan-node-orch.com', 'SAO.titan-node-orch.com', 'SIN.titan-node-orch.com', 'SYD.titan-node-orch.com', 'TOK.titan-node-orch.com', 'TOR.titan-node-orch.com']
titan-node-pool | DEBUG:root:Got local machine IP from http://v4.ident.me (option 1): 86.94.162.140
titan-node-pool | Your GPU has run out of memory and needs to be patched to continue - Would you like to try patching automatically? (y/n): Error: EOF when reading a line
titan-node-pool | An error has occurred, restarting in 10 minutes...
titan-node-pool | Using ETH address: 0x2390f4d31cB118fB23De052754549F1f325497C1
titan-node-pool | Nickname: pools.transcode.eth
titan-node-pool | Using the following Nvidia cards: all
titan-node-pool | Titan Node Pool Version: 1.34
titan-node-pool | Join the Titan Node Pool Discord for help and support: https://discord.gg/FbB89GDgkC
titan-node-pool | DEBUG:root:Checking if machine is blacklisted
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 32
titan-node-pool | DEBUG:root:Checking for updates...
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 151
titan-node-pool | DEBUG:root:Current strikes on Mac Address: {'NY': '0', 'FRA': '0.0', 'LA': '0', 'LON': '0', 'SIN': '0', 'TOR': '0', 'SAO': '0', 'TOK': '0', 'JOH': '0', 'MUM': '0', 'SYD': '0'}
titan-node-pool | Locating all nodes, please wait...
titan-node-pool | FRA.titan-node-orch.com = 21.73 milliseconds
titan-node-pool | JOH.titan-node-orch.com = 177.33 milliseconds
titan-node-pool | LA.titan-node-orch.com = 153.51 milliseconds
titan-node-pool | LON.titan-node-orch.com = 16.76 milliseconds
titan-node-pool | MUM.titan-node-orch.com = 176.9 milliseconds
titan-node-pool | NY.titan-node-orch.com = 95.79 milliseconds
titan-node-pool | SAO.titan-node-orch.com = 193.93 milliseconds
titan-node-pool | SIN.titan-node-orch.com = 181.09 milliseconds
titan-node-pool | SYD.titan-node-orch.com = 261.57 milliseconds
titan-node-pool | TOK.titan-node-orch.com = 797.65 milliseconds
titan-node-pool | TOR.titan-node-orch.com = 93.52 milliseconds
titan-node-pool | Selected Nodes: FRA.titan-node-orch.com LON.titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 22
titan-node-pool | Previous Internet Speed Check: Passed - Skipping
titan-node-pool | Max sessions set to 21
titan-node-pool | Checking driver patch status
titan-node-pool | data [[Source File /tmp/_MEITRuvmF/bbb/source.m3u8] [Transcoding Options /tmp/_MEITRuvmF/transcodingOptions.json] [Concurrent Sessions 21] [Live Mode true] [MPEG-7 Sign Mode false] [Nvidia GPU IDs 0]]
titan-node-pool | timestamp,session,segment,seg_dur,transcode_time,frames
titan-node-pool | in.Device 0
titan-node-pool |
titan-node-pool | I1111 13:32:17.201116 4055 livepeer_bench.go:88] log level is: 24
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | | Source File | /tmp/_MEITRuvmF/bbb/source.m3u8 |
titan-node-pool | | Transcoding Options | /tmp/_MEITRuvmF/transcodingOptions.json |
titan-node-pool | | Concurrent Sessions | 21 |
titan-node-pool | | Live Mode | true |
titan-node-pool | | MPEG-7 Sign Mode | false |
titan-node-pool | | Nvidia GPU IDs | 0 |
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | [AVHWDeviceContext @ 0x7fa7ec848440] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1111 13:32:17.216462 4055 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 36 [running]:
titan-node-pool | github.com/livepeer/lpms/ffmpeg.(*Transcoder).Transcode(0xc000c71e58, 0xc000c71e80, {0xc000410000?, 0x4, 0x4})
titan-node-pool | /github/home/go/pkg/mod/github.com/livepeer/[email protected]/ffmpeg/ffmpeg.go:993 +0xe5d
titan-node-pool | main.main.func1(0x0, 0x0?)
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:225 +0x6a7
titan-node-pool | created by main.main
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:165 +0x2274
titan-node-pool |
titan-node-pool | Error with patch test - failed to launch
All the Livepeer streams you've posted are ~20 mins after the hour, the same time you receive test streams from Livepeer. If Titan's test is checking your T's ability to transcode 21 sessions at the same time you have another job running, that seems like it might be the issue.
I would either lower your maxSessions for the pool (very unlikely to receive anywhere near 21 sessions) or try running just the O without the pool software running for a few days and see if the issue persists.
Because all the logs posted are from Livepeer test streams, it's not likely that the session profile is causing the issue, as I previously suspected.
@papabear99 I appreciate your help in troubleshooting the issue together! Your theory makes sense, especially considering the sporadic nature of the error. I'll start by lowering the Titan sessions to 16 and monitor whether the problem persists. If it does, my next step will be running for a week without the pool software. Thanks again!
@papabear99, I encountered the error again with Titan's pool configured to 18 sessions, despite having a maximum of 21 available sessions. This occurred when Livepeer had only one session active, which left ample headroom for the benchmark. To troubleshoot, I'll run my system without Titan's binary for a few days and monitor the results. I attached the logs below in case you see something interesting that could help us debug the problem.
LivePeer Logs
livepeer-combined-orchestrator | I1114 20:27:38.616268 1 census.go:1149] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentEmerged... duration=1.02
livepeer-combined-orchestrator | I1114 20:27:38.616524 1 orchestrator.go:387] Setting fixed price=303/1 for session=da4a5022
livepeer-combined-orchestrator | I1114 20:27:38.616574 1 orchestrator.go:177] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=195.181.174.186 Receiving ticket sessionID=da4a5022 faceValue=0.012 ETH winProb=0.0000833333 ev=1000000000000.00
livepeer-combined-orchestrator | I1114 20:27:38.792521 1 segment_rpc.go:134] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Downloaded segment dur=175.71138ms
livepeer-combined-orchestrator | I1114 20:27:38.792565 1 census.go:1211] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentDownloaded... dur=175.71138ms
livepeer-combined-orchestrator | I1114 20:27:38.846546 1 orchestrator.go:507] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting to transcode segment
livepeer-combined-orchestrator | I1114 20:27:38.846558 1 orchestrator.go:495] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Creating new segment chan
livepeer-combined-orchestrator | I1114 20:27:38.846566 1 orchestrator.go:643] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting transcode segment loop for manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 sessionID=da4a5022
livepeer-combined-orchestrator | I1114 20:27:38.846576 1 orchestrator.go:516] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Submitted segment to transcode loop
livepeer-combined-orchestrator | I1114 20:27:38.854340 1 lb.go:106] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Creating transcode session for job=da4a5022
livepeer-combined-orchestrator | I1114 20:27:38.854392 1 lb.go:154] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Created transcode session for key=da4a5022_0
livepeer-combined-orchestrator | I1114 20:27:38.854404 1 lb.go:240] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Transcode submitted for key=da4a5022_0
livepeer-combined-orchestrator | [AVHWDeviceContext @ 0x7f4e20b93140] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
livepeer-combined-orchestrator | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
livepeer-combined-orchestrator | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
livepeer-combined-orchestrator | E1114 20:27:38.887287 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process
livepeer-combined-orchestrator | I1114 20:27:38.887369 1 lb.go:223] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=da4a5022_0
livepeer-combined-orchestrator | I1114 20:27:38.887383 1 lb.go:146] manifestID=4a8d5e04-1ac5-4331-bf59-b97bd5bdf025 seqNo=13 orchSessionID=da4a5022 clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Deleted transcode session for key=da4a5022_0
livepeer-combined-orchestrator | panic: Unrecoverable state, restart process
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | goroutine 74019 [running]:
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc00064a9c0, {0x89ffd0, 0xc000452c00}, {{0x8a66d8?, 0xc0123ec100?}, {0x8a66d8?, 0xc0123ec100?}}, 0xc0123ec0c0, 0xc0151c8bb0)
livepeer-combined-orchestrator | /src/core/orchestrator.go:588 +0xb50
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
livepeer-combined-orchestrator | /src/core/orchestrator.go:680 +0x9b
livepeer-combined-orchestrator | created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
livepeer-combined-orchestrator | /src/core/orchestrator.go:665 +0x47a
Titan Binary Logs
titan-node-pool | DEBUG:root:Running background benchmark
titan-node-pool | DEBUG:root:Killing livepeer process
titan-node-pool | data [[Source File /tmp/_MEIt9vqH9/bbb/source.m3u8] [Transcoding Options /tmp/_MEIt9vqH9/transcodingOptions.json] [Concurrent Sessions 1] [Live Mode true] [MPEG-7 Sign Mode false] [Nvidia GPU IDs 0]]
titan-node-pool | timestamp,session,segment,seg_dur,transcode_time,frames
titan-node-pool | in.Device 0
titan-node-pool |
titan-node-pool | I1114 20:25:28.733113 4097 livepeer_bench.go:88] log level is: 24
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | | Source File | /tmp/_MEIt9vqH9/bbb/source.m3u8 |
titan-node-pool | | Transcoding Options | /tmp/_MEIt9vqH9/transcodingOptions.json |
titan-node-pool | | Concurrent Sessions | 1 |
titan-node-pool | | Live Mode | true |
titan-node-pool | | MPEG-7 Sign Mode | false |
titan-node-pool | | Nvidia GPU IDs | 0 |
titan-node-pool | *---------------------*-----------------------------------------*
titan-node-pool | [AVHWDeviceContext @ 0x7fd8a48b3bc0] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
titan-node-pool | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
titan-node-pool | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
titan-node-pool | E1114 20:25:28.757496 4097 ffmpeg.go:990] Transcoder Return : Unrecoverable state, restart process
titan-node-pool | panic: Unrecoverable state, restart process
titan-node-pool |
titan-node-pool | goroutine 15 [running]:
titan-node-pool | github.com/livepeer/lpms/ffmpeg.(*Transcoder).Transcode(0xc000c67e58, 0xc000c67e80, {0xc0000ec900?, 0x4, 0x4})
titan-node-pool | /github/home/go/pkg/mod/github.com/livepeer/[email protected]/ffmpeg/ffmpeg.go:993 +0xe5d
titan-node-pool | main.main.func1(0x0, 0x0?)
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:225 +0x6a7
titan-node-pool | created by main.main
titan-node-pool | /__w/go-livepeer/go-livepeer/cmd/livepeer_bench/livepeer_bench.go:165 +0x2274
titan-node-pool |
titan-node-pool | Error with Benchmarking - closing down
titan-node-pool | DEBUG:root:Killing livepeer process
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | /bin/sh: 1: killall: not found
titan-node-pool | DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): titan-node-orch.com
titan-node-pool | DEBUG:urllib3.connectionpool:https://titan-node-orch.com:5010 "POST /query HTTP/1.1" 200 15
@papabear99, I agree with your assessment of the benchmarks in @titan-node's binary maxing out my GPU when I have other sessions on my main orchestrator. My issue, therefore, is different from the one experienced by you and @AuthorityNull. Since turning off the titan-node binary, I haven't encountered any errors, maintaining a mean session count of 1.2 over the past few days. Interestingly, there's no direct 1-1 correlation between my GPU's max number of streams (21) and the total number of sessions plus test streams. The titan-node binary was set to 18 in the latest crash incident, but I only had two sessions then. However, I think the pool also requested sessions during the crash, maxing out the available 21 sessions.
I'll inform @titan-node about the issues with the benchmark and suggest that he check the livepeer_current_sessions_total Prometheus metric and subtract it from the benchmark session count. Hopefully, that adjustment will resolve the problem.
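The adjustment suggested above could look roughly like the sketch below. It is only an illustration, not Titan's actual code: it assumes the metrics are available as Prometheus text exposition output, and the helper names `current_sessions` and `benchmark_sessions` are hypothetical. Summing every sample of the metric is also an assumption about how the orchestrator labels it.

```python
METRIC = "livepeer_current_sessions_total"  # metric name taken from this thread


def current_sessions(metrics_text: str, name: str = METRIC) -> float:
    """Sum all samples of `name` found in Prometheus text exposition output."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if line.startswith(name + "{"):
            rest = line[line.rfind("}") + 1:]  # value follows the label set
        elif line.startswith(name + " "):
            rest = line[len(name):]  # sample without labels
        else:
            continue  # a different metric
        fields = rest.split()
        if fields:
            total += float(fields[0])  # an optional timestamp may follow the value
    return total


def benchmark_sessions(max_sessions: int, active: float) -> int:
    """Sessions left for the benchmark after subtracting the active ones."""
    return max(0, max_sessions - int(active))
```

With 21 maximum sessions and 2 active, `benchmark_sessions(21, 2)` leaves 19 for the benchmark instead of the full 21.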
I caught a mistake in my previous message: I mixed up some details. With its single test stream running for 60 seconds, Titan's benchmark probably isn't the culprit. However, the problem may arise from running the titan-node binary and the go-livepeer Docker container concurrently. When high-resolution streams kick in, the GPU memory maxes out rapidly. The speed at which this happens might be why it doesn't show up on the Grafana dashboard. If it keeps happening, I will run a more detailed GPU monitoring tool.
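One way to catch a short-lived memory spike like this is to poll `nvidia-smi` at a sub-second interval. The sketch below is only a rough illustration under that assumption: the function names, the 0.5 s interval, and the 90% threshold are hypothetical, and it cannot confirm that memory exhaustion is actually what triggers the CUDA error.

```python
import subprocess
import time

# nvidia-smi prints one CSV row per GPU: index, used MiB, total MiB
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text: str) -> list[tuple[int, int, int]]:
    """Turn the CSV output into (gpu_index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def near_limit(rows, threshold: float = 0.9) -> list[int]:
    """Return the indices of GPUs whose memory use is at or above the threshold."""
    return [idx for idx, used, total in rows if used >= threshold * total]


def watch(interval_s: float = 0.5, threshold: float = 0.9) -> None:
    """Poll nvidia-smi forever and print a line whenever a GPU nears its limit."""
    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for idx in near_limit(parse_memory(out), threshold):
            print(f"{time.strftime('%H:%M:%S')} GPU {idx} above {threshold:.0%} memory")
        time.sleep(interval_s)
```

Running `watch()` on the host next to the containers and comparing its timestamps with the orchestrator's panic time would show whether the two line up.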
Here is another update for the thread. It looks like @Titan-Node's binary is not involved in the crash. After seven days of running without problems, the orchestrator crashed again:
livepeer-combined-orchestrator | I1120 22:37:23.227303 1 player.go:105] LPMS got HTTP request @ /stream/3ed0f9cd/480p/7016.ts
livepeer-combined-orchestrator | I1120 22:37:23.241308 1 player.go:105] LPMS got HTTP request @ /stream/3ed0f9cd/720p/7016.ts
livepeer-combined-orchestrator | I1120 22:37:24.851365 1 segment_rpc.go:94] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Received segment dur=2s
livepeer-combined-orchestrator | I1120 22:37:24.851409 1 census.go:1149] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentEmerged... duration=2
livepeer-combined-orchestrator | I1120 22:37:24.852366 1 orchestrator.go:219] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Payment tickets processed sessionID=3ed0f9cd faceValue=0.036 ETH winProb=0.0000020000 ev=24000000000.00
livepeer-combined-orchestrator | I1120 22:37:24.884558 1 segment_rpc.go:134] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Downloaded segment dur=32.065876ms
livepeer-combined-orchestrator | I1120 22:37:24.884593 1 census.go:1211] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentDownloaded... dur=32.065876ms
livepeer-combined-orchestrator | I1120 22:37:24.886366 1 orchestrator.go:511] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting to transcode segment
livepeer-combined-orchestrator | I1120 22:37:24.886400 1 orchestrator.go:520] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Submitted segment to transcode loop
livepeer-combined-orchestrator | I1120 22:37:24.887331 1 lb.go:74] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Using existing transcode session for key=3ed0f9cd_0
livepeer-combined-orchestrator | I1120 22:37:24.887366 1 lb.go:240] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=143.244.61.193 LB: Transcode submitted for key=3ed0f9cd_0
livepeer-combined-orchestrator | I1120 22:37:24.997694 1 segment_rpc.go:94] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.177.138 Received segment dur=2s
livepeer-combined-orchestrator | I1120 22:37:24.997715 1 census.go:1149] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentEmerged... duration=2
livepeer-combined-orchestrator | I1120 22:37:24.997912 1 orchestrator.go:391] Setting fixed price=303/1 for session=09a233a9
livepeer-combined-orchestrator | I1120 22:37:24.998148 1 orchestrator.go:219] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Payment tickets processed sessionID=09a233a9 faceValue=0.012 ETH winProb=0.0000006667 ev=8000000000.00
livepeer-combined-orchestrator | I1120 22:37:25.025451 1 segment_rpc.go:134] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Downloaded segment dur=27.191262ms
livepeer-combined-orchestrator | I1120 22:37:25.025467 1 census.go:1211] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentDownloaded... dur=27.191262ms
livepeer-combined-orchestrator | I1120 22:37:25.027335 1 orchestrator.go:511] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting to transcode segment
livepeer-combined-orchestrator | I1120 22:37:25.027347 1 orchestrator.go:499] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Creating new segment chan
livepeer-combined-orchestrator | I1120 22:37:25.027359 1 orchestrator.go:647] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting transcode segment loop for manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd sessionID=09a233a9
livepeer-combined-orchestrator | I1120 22:37:25.027375 1 orchestrator.go:520] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Submitted segment to transcode loop
livepeer-combined-orchestrator | I1120 22:37:25.028022 1 lb.go:106] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.177.138 LB: Creating transcode session for job=09a233a9
livepeer-combined-orchestrator | I1120 22:37:25.028063 1 lb.go:154] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Created transcode session for key=09a233a9_0
livepeer-combined-orchestrator | I1120 22:37:25.028076 1 lb.go:240] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Transcode submitted for key=09a233a9_0
livepeer-combined-orchestrator | I1120 22:37:25.033844 1 orchestrator.go:606] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=143.244.61.193 Transcoding of segment took=146.514657ms
livepeer-combined-orchestrator | I1120 22:37:25.033868 1 census.go:1334] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=143.244.61.193 Logging SegmentTranscode nonce=0 seqNo=7017 dur=146.514657ms trusted=true verified=true
livepeer-combined-orchestrator | I1120 22:37:25.033892 1 orchestrator.go:625] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Transcoded segment profile=480p bytes=75012
livepeer-combined-orchestrator | I1120 22:37:25.034226 1 orchestrator.go:625] manifestID=33827ae9-d223-41e5-86a1-7fc5d18ffd46 seqNo=7017 orchSessionID=3ed0f9cd clientIP=143.244.61.193 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Transcoded segment profile=720p bytes=259816
livepeer-combined-orchestrator | [AVHWDeviceContext @ 0x7f2c54b148c0] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
livepeer-combined-orchestrator | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
livepeer-combined-orchestrator | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
livepeer-combined-orchestrator | E1120 22:37:25.090606 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process
livepeer-combined-orchestrator | I1120 22:37:25.090657 1 lb.go:223] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=09a233a9_0
livepeer-combined-orchestrator | I1120 22:37:25.090672 1 lb.go:146] manifestID=f62effff-0ea9-4402-b7ee-47c2b4f5bfdd seqNo=811 orchSessionID=09a233a9 clientIP=89.187.177.138 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Deleted transcode session for key=09a233a9_0
livepeer-combined-orchestrator | panic: Unrecoverable state, restart process
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | goroutine 3635167 [running]:
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc0000e8d00, {0x8a0110, 0xc00436f0b0}, {{0x8a6818?, 0xc001e08640?}, {0x8a6818?, 0xc001e08640?}}, 0xc001e08600, 0xc0098cc420)
livepeer-combined-orchestrator | /src/core/orchestrator.go:592 +0xb50
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
livepeer-combined-orchestrator | /src/core/orchestrator.go:684 +0x9b
livepeer-combined-orchestrator | created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
livepeer-combined-orchestrator | /src/core/orchestrator.go:669 +0x47a
livepeer-combined-orchestrator exited with code 0
System info
The crash occurred at 22:37.
I created a post in the Livepeer developer discord channel to ask how I can best debug this (i.e. https://discord.com/channels/423160867534929930/1176444392635121685).
I had another crash today. Since it looks like a permission problem because of a missing mount point, I will try with the privileged flag for a while.
See crash logs
livepeer-combined-orchestrator | I1122 09:21:28.839133 1 segment_rpc.go:94] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Received segment dur=1.999s
livepeer-combined-orchestrator | I1122 09:21:28.839189 1 census.go:1149] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentEmerged... duration=1.999
livepeer-combined-orchestrator | I1122 09:21:28.839551 1 orchestrator.go:391] Setting fixed price=303/1 for session=bef1902c
livepeer-combined-orchestrator | I1122 09:21:28.840798 1 orchestrator.go:219] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Payment tickets processed sessionID=bef1902c faceValue=0.06 ETH winProb=0.0000033333 ev=40000000000.00
livepeer-combined-orchestrator | I1122 09:21:28.898291 1 segment_rpc.go:134] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Downloaded segment dur=57.346216ms
livepeer-combined-orchestrator | I1122 09:21:28.898323 1 census.go:1211] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Logging SegmentDownloaded... dur=57.346216ms
livepeer-combined-orchestrator | I1122 09:21:28.900615 1 orchestrator.go:511] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting to transcode segment
livepeer-combined-orchestrator | I1122 09:21:28.900642 1 orchestrator.go:499] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Creating new segment chan
livepeer-combined-orchestrator | I1122 09:21:28.900661 1 orchestrator.go:647] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Starting transcode segment loop for manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 sessionID=bef1902c
livepeer-combined-orchestrator | I1122 09:21:28.900689 1 orchestrator.go:520] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Submitted segment to transcode loop
livepeer-combined-orchestrator | I1122 09:21:28.901846 1 lb.go:106] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Creating transcode session for job=bef1902c
livepeer-combined-orchestrator | I1122 09:21:28.901921 1 lb.go:154] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=195.181.174.186 LB: Created transcode session for key=bef1902c_0
livepeer-combined-orchestrator | I1122 09:21:28.901940 1 lb.go:240] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Transcode submitted for key=bef1902c_0
livepeer-combined-orchestrator | [AVHWDeviceContext @ 0x7eff58c88700] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
livepeer-combined-orchestrator | ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
livepeer-combined-orchestrator | ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
livepeer-combined-orchestrator | E1122 09:21:28.958632 1 ffmpeg.go:1012] Transcoder Return : Unrecoverable state, restart process
livepeer-combined-orchestrator | I1122 09:21:28.958704 1 lb.go:223] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=bef1902c_0
livepeer-combined-orchestrator | I1122 09:21:28.958730 1 lb.go:146] manifestID=bbbm3u8_6f9c5c2d3941a84b51ac_0_0 seqNo=0 orchSessionID=bef1902c clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Deleted transcode session for key=bef1902c_0
livepeer-combined-orchestrator | panic: Unrecoverable state, restart process
livepeer-combined-orchestrator |
livepeer-combined-orchestrator | goroutine 514840 [running]:
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc000642820, {0x8a0110, 0xc002bb6a20}, {{0x8a6818?, 0xc0006d6540?}, {0x8a6818?, 0xc0006d6540?}}, 0xc0006d6500, 0xc018bdeb00)
livepeer-combined-orchestrator | /src/core/orchestrator.go:592 +0xb50
livepeer-combined-orchestrator | github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
livepeer-combined-orchestrator | /src/core/orchestrator.go:684 +0x9b
livepeer-combined-orchestrator | created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
livepeer-combined-orchestrator | /src/core/orchestrator.go:669 +0x47a
livepeer-combined-orchestrator exited with code 0
It appears that applying the privileged flag resolves the issue. This suggests the error is thrown because not all necessary mount points are available in my container. If anyone knows the specific mount points needed to run the container without the privileged setting, I'd greatly appreciate the guidance 🙏🏻. For now, I am content with the privileged workaround, but I might research this later.
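To check whether the workaround keeps holding, a small script along these lines could count crashes in saved container logs. It is only a sketch based on the log lines pasted above: it treats a CUDA_ERROR_NOT_PERMITTED line followed by a glog error line containing "Unrecoverable state" as one crash, and `find_crashes` is a hypothetical helper name.

```python
import re

# glog error prefix as seen in the logs above, e.g. "E1120 22:37:25.090606":
# severity letter, month and day, then the wall-clock time.
GLOG_ERROR = re.compile(
    r"\bE(\d{2})(\d{2}) (\d{2}:\d{2}:\d{2})\.\d+ .*Unrecoverable state"
)
CUDA_DENIED = "CUDA_ERROR_NOT_PERMITTED"


def find_crashes(log_lines):
    """Return 'MM-DD HH:MM:SS' for each CUDA permission error that led to a crash."""
    crashes = []
    saw_cuda_denied = False
    for line in log_lines:
        if CUDA_DENIED in line:
            saw_cuda_denied = True
            continue
        match = GLOG_ERROR.search(line)
        if match and saw_cuda_denied:
            month, day, clock = match.groups()
            crashes.append(f"{month}-{day} {clock}")
            saw_cuda_denied = False
    return crashes


# Usage, assuming the container logs were saved to a file first:
#   with open("orchestrator.log") as fh:
#       print(find_crashes(fh))
```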