
[BUG] Possible regression `pipeline frozen`

Open cruizba opened this issue 9 months ago • 0 comments

Describe the bug

Using v1.9.0 in our deployment, we observed a recording fail with `pipeline frozen` while recording a room for about 1.5 hours.

It may be related to the following:

  • https://github.com/livekit/egress/issues/158
  • https://github.com/livekit/egress/pull/713

Egress Version v1.9.0

Egress Request

{
  "nodeID": "NE_dBp3vF6dmhrm",
  "clusterID": "",
  "egressID": "EG_XyqjWAb54HUa",
  "requestType": "room_composite",
  "outputType": "file",
  "room": "martha-peru-copper-hedgehog",
  "request": {
    "RoomComposite": {
      "room_name": "martha-peru-copper-hedgehog",
      "layout": "speaker",
      "Output": {
        "File": {
          "filepath": "openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781",
          "disable_manifest": true,
          "Output": null
        }
      },
      "Options": null,
      "file_outputs": [
        {
          "filepath": "openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781",
          "disable_manifest": true,
          "Output": null
        }
      ]
    }
  }
}

Additional context

The machine was quite modest: 2 CPUs and 4 GB of RAM. The session had 6 participants, one of whom was sharing their screen.

I observed that CPU usage was very high during the session. Can the `pipeline frozen` error appear in those situations?

To be able to run recordings on machines with little CPU, we use the following egress config:

redis:
    address: 127.0.0.1:7000
    username: ""
    password: xxxxxx
    db: 0
    use_tls: false
api_key: xxxxxxx
api_secret: xxxxxxx
ws_url: ws://127.0.0.1:7880
health_port: 9093

backup:
    prefix: /home/egress/backup_storage

# Storage for recordings
storage:
    s3:
        access_key: xxxxxx
        secret: xxxxxx
        # Default region for minio
        region: us-east-1
        endpoint: http://127.0.0.1:9100
        bucket: app-data
        force_path_style: true

# ---------------
# This allows us to run recordings on small machines
# ---------------
cpu_cost:
    max_cpu_utilization: 0.80
    room_composite_cpu_cost: 0.01
    audio_room_composite_cpu_cost: 0.01
    web_cpu_cost: 0.01
    audio_web_cpu_cost: 0.01
    participant_cpu_cost: 0.01
    track_composite_cpu_cost: 0.01
    track_cpu_cost: 0.01
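
For context on what these values imply, here is a minimal Python sketch of a CPU admission check. The rule itself is an assumption made for illustration and is not taken from the egress source: a request is accepted when the CPU already in use plus the request's configured cost fits within `total_cores * max_cpu_utilization`. The function name `can_accept` and the 2.0-core comparison cost are hypothetical.

```python
def can_accept(total_cores, max_cpu_utilization, used_cores, request_cost):
    """Assumed admission rule (hypothetical, not copied from egress):
    accept when current usage plus the request's cost fits in the budget."""
    budget = total_cores * max_cpu_utilization
    return used_cores + request_cost <= budget


# Values from the machine and config described in this issue.
TOTAL_CORES = 2.0
MAX_CPU_UTILIZATION = 0.80
ROOM_COMPOSITE_CPU_COST = 0.01

# Hypothetical larger cost, only to show how a higher value would behave.
HYPOTHETICAL_HIGH_COST = 2.0

for used in (0.0, 1.0, 1.5, 1.7):
    low = can_accept(TOTAL_CORES, MAX_CPU_UTILIZATION, used, ROOM_COMPOSITE_CPU_COST)
    high = can_accept(TOTAL_CORES, MAX_CPU_UTILIZATION, used, HYPOTHETICAL_HIGH_COST)
    print(f"used={used:.2f} cores  cost=0.01 -> {low}  cost=2.0 -> {high}")
```

Under this assumed rule, a cost of 0.01 means a request is accepted almost regardless of load, so the setting lets the recording start on a small machine but does not reserve any CPU for it while it runs.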

Logs

2025-03-27T20:12:43.016Z	INFO	egress	server/server.go:148	service ready	{"nodeID": "NE_dBp3vF6dmhrm", "clusterID": ""}
2025-04-01T09:23:10.741Z	INFO	egress	server/server_rpc.go:58	request received	{"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:10.791Z	INFO	egress	server/server_rpc.go:68	request validated	{"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "room": "martha-peru-copper-hedgehog", "request": {"RoomComposite":{"room_name":"martha-peru-copper-hedgehog","layout":"speaker","Output":{"File":{"filepath":"openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781","disable_manifest":true,"Output":null}},"Options":null,"file_outputs":[{"filepath":"openvidu-call/recordings/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy/martha-peru-copper-hedgehog-RM_HQmWFfgxY4Wy-1743499389781","disable_manifest":true,"Output":null}]}}}
2025-04-01T09:23:11.503Z	INFO	egress	redis/redis.go:142	connecting to redis	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "simple": true, "addr": "127.0.0.1:7000"}
2025-04-01T09:23:16.304Z	INFO	egress	source/web.go:150	xvfb: The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Could not resolve keysym XF86CameraAccessEnable
> Warning:          Could not resolve keysym XF86CameraAccessDisable
> Warning:          Could not resolve keysym XF86CameraAccessToggle
> Warning:          Could not resolve keysym XF86NextElement
	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:16.308Z	INFO	egress	source/web.go:150	xvfb: > Warning:          Could not resolve keysym XF86PreviousElement
> Warning:          Could not resolve keysym XF86AutopilotEngageToggle
> Warning:          Could not resolve keysym XF86MarkWaypoint
> Warning:          Could not resolve keysym XF86Sos
> Warning:          Could not resolve keysym XF86NavChart
> Warning:          Could not resolve keysym XF86FishingChart
> Warning:          Could not resolve keysym XF86SingleRangeRadar
> Warning:          Could not resolve keysym XF86DualRangeRadar
> Warning:          Could not resolve keysym XF86RadarOverlay
> Warning:          Could not resolve keysym XF86TraditionalSonar
> Warning:          Could not resolve keysym XF86ClearvuSonar
> Warning:          Could not resolve keysym XF86SidevuSonar
> Warning:          Could not resolve keysym XF86NavInfo
Errors from xkbcomp are not fatal to the X server
	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
0:00:09.728309404    32 0x648e43eaf200 WARN              cudaloader gstcudaloader.c:169:gst_cuda_load_library: Could not open library libcuda.so.1, libcuda.so.1: cannot open shared object file: No such file or directory
0:00:09.728378962    32 0x648e43eaf200 WARN                 nvcodec plugin.c:94:plugin_init: Failed to load cuda library
2025-04-01T09:23:23.958Z	INFO	egress	source/web.go:288	chrome: START_RECORDING	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:24.030Z	INFO	egress	source/web.go:288	chrome: START_RECORDING	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:24.171Z	INFO	egress	source/web.go:288	chrome: START_RECORDING	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:24.180Z	INFO	egress	source/web.go:288	chrome: START_RECORDING	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
0:00:13.926125233    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x120000: 'AVR (Audio Visual Research)' is not mapped
0:00:13.930651791    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x180000: 'CAF (Apple Core Audio File)' is not mapped
0:00:13.930675246    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x100000: 'HTK (HMM Tool Kit)' is not mapped
0:00:13.930719414    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0xc0000: 'MAT4 (GNU Octave 2.0 / Matlab 4.2)' is not mapped
0:00:13.930747707    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0xd0000: 'MAT5 (GNU Octave 2.1 / Matlab 5.0)' is not mapped
0:00:13.930763931    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x210000: 'MPC (Akai MPC 2k)' is not mapped
0:00:13.930778439    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x230000: 'MPEG-1/2 Audio' is not mapped
0:00:13.930806762    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0xe0000: 'PVF (Portable Voice Format)' is not mapped
0:00:13.930825741    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x160000: 'SD2 (Sound Designer II)' is not mapped
0:00:13.930874205    32 0x648e43eaf200 WARN                 default gstsfelement.c:97:gst_sf_create_audio_template_caps: format 0x190000: 'WVE (Psion Series 3)' is not mapped
2025-04-01T09:23:30.583Z	INFO	egress	pipeline/watch.go:252	pipeline playing	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T09:23:30.627Z	INFO	egress	info/io.go:178	egress_active	{"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "error": "", "code": 0, "details": ""}
2025-04-01T11:37:24.160Z	INFO	egress	info/io.go:178	egress_ending	{"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "error": "", "code": 0, "details": "End reason: StopEgress API"}
2025-04-01T11:37:54.933Z	INFO	egress	source/web.go:150	xvfb: The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Could not resolve keysym XF86CameraAccessEnable
> Warning:          Could not resolve keysym XF86CameraAccessDisable
> Warning:          Could not resolve keysym XF86CameraAccessToggle
	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T11:37:54.937Z	INFO	egress	source/web.go:150	xvfb: > Warning:          Could not resolve keysym XF86NextElement
> Warning:          Could not resolve keysym XF86PreviousElement
> Warning:          Could not resolve keysym XF86AutopilotEngageToggle
> Warning:          Could not resolve keysym XF86MarkWaypoint
> Warning:          Could not resolve keysym XF86Sos
> Warning:          Could not resolve keysym XF86NavChart
> Warning:          Could not resolve keysym XF86FishingChart
> Warning:          Could not resolve keysym XF86SingleRangeRadar
> Warning:          Could not resolve keysym XF86DualRangeRadar
> Warning:          Could not resolve keysym XF86RadarOverlay
> Warning:          Could not resolve keysym XF86TraditionalSonar
> Warning:          Could not resolve keysym XF86ClearvuSonar
> Warning:          Could not resolve keysym XF86SidevuSonar
> Warning:          Could not resolve keysym XF86NavInfo
Errors from xkbcomp are not fatal to the X server
	{"nodeID": "NE_dBp3vF6dmhrm", "handlerID": "EGH_uebztvDVHpNg", "clusterID": "", "egressID": "EG_XyqjWAb54HUa"}
2025-04-01T11:38:11.235Z	INFO	egress	info/io.go:178	egress_failed	{"nodeID": "NE_dBp3vF6dmhrm", "clusterID": "", "egressID": "EG_XyqjWAb54HUa", "requestType": "room_composite", "outputType": "file", "error": "pipeline frozen", "code": 500, "details": "End reason: StopEgress API"}
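
To make the timing in these logs easier to read, the following small Python sketch computes the intervals between the `egress_active`, `egress_ending`, and `egress_failed` events. The timestamps are copied from the log lines above; the helper names `parse_ts` and `seconds_between` are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the egress log lines in this issue.
EVENTS = {
    "egress_active": "2025-04-01T09:23:30.627Z",
    "egress_ending": "2025-04-01T11:37:24.160Z",
    "egress_failed": "2025-04-01T11:38:11.235Z",
}


def parse_ts(ts):
    """Parse an egress log timestamp such as 2025-04-01T09:23:30.627Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def seconds_between(start, end):
    """Elapsed seconds between two named events in EVENTS."""
    return (parse_ts(EVENTS[end]) - parse_ts(EVENTS[start])).total_seconds()


recording_s = seconds_between("egress_active", "egress_ending")
shutdown_s = seconds_between("egress_ending", "egress_failed")

print(f"active -> ending: {recording_s:.3f} s")
print(f"ending -> failed: {shutdown_s:.3f} s")
```

By these timestamps the egress was active for about 8033.5 s (roughly 2 h 14 min), and `egress_failed` was logged about 47 s after `egress_ending`, that is, after the StopEgress API call rather than in the middle of the recording.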

No other recording was running before this one. Could limited available CPU increase the likelihood of a recording failing with a `pipeline frozen` error?

cruizba · Apr 01 '25 12:04