[BUG] Possible regression: `pipeline frozen`
Describe the bug
Using v1.9.0, we observed a recording fail with `pipeline frozen` in our deployment while recording a room for about 1.5 hours.
It may be related to:
- https://github.com/livekit/egress/issues/158
- https://github.com/livekit/egress/pull/713
Egress Version
v1.9.0
Egress Request
{
  "nodeID": "NE_dBp3vF6dmhrm",
  "clusterID": "",
  "egressID": "EG_XyqjWAb54HUa",
  "requestType": "room_composite",
  "outputType": "file",
  "room": "martha-peru-copper-hedgehog",
  "request": {
    "RoomComposite": {
      "room_name": "martha-peru-copper-hedgehog",
      "layout": "speaker",
      "Output": {
        "File": {
          "filepath": "openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781",
          "disable_manifest": true,
          "Output": null
        }
      },
      "Options": null,
      "file_outputs": [
        {
          "filepath": "openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781",
          "disable_manifest": true,
          "Output": null
        }
      ]
    }
  }
}
Additional context
The machine was quite modest: 2 CPUs and 4 GB of RAM. The session had 6 participants, one of whom was sharing their screen.
During the session, I observed that CPU usage was very high. Can the `pipeline frozen` error appear in that situation?
To run recordings on low-CPU machines, we use the following egress config:
redis:
  address: 127.0.0.1:7000
  username: ""
  password: xxxxxx
  db: 0
  use_tls: false
api_key: xxxxxxx
api_secret: xxxxxxx
ws_url: ws://127.0.0.1:7880
health_port: 9093
backup:
  prefix: /home/egress/backup_storage
# Storage for recordings
storage:
  s3:
    access_key: xxxxxx
    secret: xxxxxx
    # Default region for minio
    region: us-east-1
    endpoint: http://127.0.0.1:9100
    bucket: app-data
    force_path_style: true
# ---------------
# This allows us to run recordings on small machines
# ---------------
cpu_cost:
  max_cpu_utilization: 0.80
  room_composite_cpu_cost: 0.01
  audio_room_composite_cpu_cost: 0.01
  web_cpu_cost: 0.01
  audio_web_cpu_cost: 0.01
  participant_cpu_cost: 0.01
  track_composite_cpu_cost: 0.01
  track_cpu_cost: 0.01
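To make our reasoning about these values concrete, here is a simplified, hypothetical Python model of CPU-cost admission. It assumes the node's budget is the number of CPUs multiplied by `max_cpu_utilization`, and that each request reserves its configured cost against that budget; `CpuCostConfig`, `capacity_cores` and `can_accept` are illustrative names, not egress's actual implementation. Under these assumptions, a cost of 0.01 reserves a negligible share of the 1.6-core budget on a 2-CPU machine, and the check is pure bookkeeping: it does not limit how much CPU Chrome and GStreamer actually consume once the recording is running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuCostConfig:
    """Subset of the cpu_cost section above (hypothetical type, not egress code)."""
    max_cpu_utilization: float
    room_composite_cpu_cost: float


def capacity_cores(cfg: CpuCostConfig, num_cpus: int) -> float:
    """CPU budget (in cores) the node may reserve under this simplified model."""
    return num_cpus * cfg.max_cpu_utilization


def can_accept(cfg: CpuCostConfig, num_cpus: int, reserved_cores: float) -> bool:
    """Hypothetical admission check: accept if the new reservation fits the budget.

    This only compares reserved numbers; it does not cap the CPU that the
    recording really uses after it has been accepted.
    """
    return reserved_cores + cfg.room_composite_cpu_cost <= capacity_cores(cfg, num_cpus)


# Values taken from the config above, on the 2-CPU machine described in the issue.
cfg = CpuCostConfig(max_cpu_utilization=0.80, room_composite_cpu_cost=0.01)
budget = capacity_cores(cfg, num_cpus=2)
print(f"budget: {budget:.2f} cores")
print(f"one room composite reserves {cfg.room_composite_cpu_cost:.2f} cores "
      f"({100 * cfg.room_composite_cpu_cost / budget:.3f}% of the budget)")
print("accepted with 1.5 cores already reserved:", can_accept(cfg, 2, 1.5))
```

In this model a single room composite with 6 participants and a screenshare "costs" the same 0.01 cores as an audio-only track export, so the scheduler would never treat the machine as busy, even if the real load saturates both CPUs.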
Logs
2025-03-27T20:12:43.016Z INFO egress server/server.go:148 service ready {"nodeID": "NE_dBp3vF6dmhrm", "clusterID": ""}
2025-04-01T09:23:10.741Z INFO egress server/server_rpc.go:58 request received {"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:10.791Z INFO egress server/server_rpc.go:68 request validated {"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "room": "martha-peru-copper-hedgehog", "request": {"RoomComposite":{"room_name":"martha-peru-copper-hedgehog","layout":"speaker","Output":{"File":{"filepath":"openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781","disable_manifest":true,"Output":null}},"Options":null,"file_outputs":[{"filepath":"openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781","disable_manifest":true,"Output":null}]}}}
2025-04-01T09:23:11.503Z INFO egress redis/redis.go:142 connecting to redis {"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "simple": true, "addr": "127.0.0.1:7000"}
2025-04-01T09:23:16.304Z INFO egress source/web.go:150 xvfb: The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning: Could not resolve keysym XF86CameraAccessEnable
> Warning: Could not resolve keysym XF86CameraAccessDisable
> Warning: Could not resolve keysym XF86CameraAccessToggle
> Warning: Could not resolve keysym XF86NextElement
{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:16.308Z INFO egress source/web.go:150 xvfb: > Warning: Could not resolve keysym XF86PreviousElement
> Warning: Could not resolve keysym XF86AutopilotEngageToggle
> Warning: Could not resolve keysym XF86MarkWaypoint
> Warning: Could not resolve keysym XF86Sos
> Warning: Could not resolve keysym XF86NavChart
> Warning: Could not resolve keysym XF86FishingChart
> Warning: Could not resolve keysym XF86SingleRangeRadar
> Warning: Could not resolve keysym XF86DualRangeRadar
> Warning: Could not resolve keysym XF86RadarOverlay
> Warning: Could not resolve keysym XF86TraditionalSonar
> Warning: Could not resolve keysym XF86ClearvuSonar
> Warning: Could not resolve keysym XF86SidevuSonar
> Warning: Could not resolve keysym XF86NavInfo
Errors from xkbcomp are not fatal to the X server
{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
0:00:09.728309404 32 0x648e43eaf200 WARN cudaloader gstcudaloader.c:169:gst_cuda_load_library: Could not open library libcuda.so.1, libcuda.so.1: cannot open shared object file: No such file or directory
0:00:09.728378962 32 0x648e43eaf200 WARN nvcodec plugin.c:94:plugin_init: Failed to load cuda library
2025-04-01T09:23:23.958Z INFO egress source/web.go:288 chrome: START_RECORDING {"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:24.030Z INFO egress source/web.go:288 chrome: START_RECORDING {"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:24.171Z INFO egress source/web.go:288 chrome: START_RECORDING {"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:24.180Z INFO egress source/web.go:288 chrome: START_RECORDING {"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
0:00:13.926125233 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x120000: 'AVR (Audio Visual Research)' is not mapped
0:00:13.930651791 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x180000: 'CAF (Apple Core Audio File)' is not mapped
0:00:13.930675246 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x100000: 'HTK (HMM Tool Kit)' is not mapped
0:00:13.930719414 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0xc0000: 'MAT4 (GNU Octave 2.0 / Matlab 4.2)' is not mapped
0:00:13.930747707 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0xd0000: 'MAT5 (GNU Octave 2.1 / Matlab 5.0)' is not mapped
0:00:13.930763931 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x210000: 'MPC (Akai MPC 2k)' is not mapped
0:00:13.930778439 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x230000: 'MPEG-1/2 Audio' is not mapped
0:00:13.930806762 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0xe0000: 'PVF (Portable Voice Format)' is not mapped
0:00:13.930825741 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x160000: 'SD2 (Sound Designer II)' is not mapped
0:00:13.930874205 32 0x648e43eaf200 WARN default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x190000: 'WVE (Psion Series 3)' is not mapped
2025-04-01T09:23:30.583Z INFO egress pipeline/watch.go:252 pipeline playing {"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:30.627Z INFO egress info/io.go:178 egress_active {"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "error": "", "code": 0, "details": ""}
2025-04-01T11:37:24.160Z INFO egress info/io.go:178 egress_ending {"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "error": "", "code": 0, "details": "End reason: StopEgress API"}
2025-04-01T11:37:54.933Z INFO egress source/web.go:150 xvfb: The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning: Could not resolve keysym XF86CameraAccessEnable
> Warning: Could not resolve keysym XF86CameraAccessDisable
> Warning: Could not resolve keysym XF86CameraAccessToggle
{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T11:37:54.937Z INFO egress source/web.go:150 xvfb: > Warning: Could not resolve keysym XF86NextElement
> Warning: Could not resolve keysym XF86PreviousElement
> Warning: Could not resolve keysym XF86AutopilotEngageToggle
> Warning: Could not resolve keysym XF86MarkWaypoint
> Warning: Could not resolve keysym XF86Sos
> Warning: Could not resolve keysym XF86NavChart
> Warning: Could not resolve keysym XF86FishingChart
> Warning: Could not resolve keysym XF86SingleRangeRadar
> Warning: Could not resolve keysym XF86DualRangeRadar
> Warning: Could not resolve keysym XF86RadarOverlay
> Warning: Could not resolve keysym XF86TraditionalSonar
> Warning: Could not resolve keysym XF86ClearvuSonar
> Warning: Could not resolve keysym XF86SidevuSonar
> Warning: Could not resolve keysym XF86NavInfo
Errors from xkbcomp are not fatal to the X server
{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T11:38:11.235Z INFO egress info/io.go:178 egress_failed {"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "error": "pipeline frozen", "code": 500, "details": "End reason: StopEgress API"}
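To make the timeline in these logs easier to follow, here is a small Python sketch that computes the time between key log events. The timestamps are copied from the logs above; the event labels and the `parse`/`elapsed` helpers are our own. Based on those timestamps, the pipeline took about 20 s to start, the egress was active for about 2 h 14 min before the StopEgress call, and roughly 47 s passed between `egress_ending` and the `egress_failed` entry reporting `pipeline frozen`.

```python
from datetime import datetime, timedelta

# Timestamps copied from the log excerpt above; the keys are our own labels.
EVENTS = {
    "request received": "2025-04-01T09:23:10.741Z",
    "pipeline playing": "2025-04-01T09:23:30.583Z",
    "egress_active": "2025-04-01T09:23:30.627Z",
    "egress_ending": "2025-04-01T11:37:24.160Z",
    "egress_failed": "2025-04-01T11:38:11.235Z",
}


def parse(ts: str) -> datetime:
    """Parse an egress log timestamp such as 2025-04-01T09:23:10.741Z as UTC."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so replace it with an explicit offset to support older versions too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def elapsed(start: str, end: str) -> timedelta:
    """Time between two labelled events in EVENTS."""
    return parse(EVENTS[end]) - parse(EVENTS[start])


phases = {
    "startup (request received -> pipeline playing)": elapsed("request received", "pipeline playing"),
    "recording (egress_active -> egress_ending)": elapsed("egress_active", "egress_ending"),
    "shutdown (egress_ending -> egress_failed)": elapsed("egress_ending", "egress_failed"),
}
for name, duration in phases.items():
    print(f"{name}: {duration}")
```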
No other recording was running before this one. Could limited available CPU increase the likelihood of a `pipeline frozen` error?