COMET DEBUG: No relevant cgroup controllers mounted.

Open mhnazeri opened this issue 10 months ago • 4 comments

Describe the Bug

Running Comet ML on Pop!_OS 22.04 produces the debug output shown below instead of the normal experiment logging. The same code runs without a problem on Fedora 39. I'm not using Docker, just a Python venv.

Expected behavior

Running experiment logging.

Where is the issue?

  • [x] Comet Python SDK
  • [ ] Comet UI
  • [ ] Third Party Integrations (Hugging Face, TensorboardX, PyTorch Lightning etc)

To Reproduce

Steps to reproduce the behavior:

  1. Integrate Comet with PyTorch code to log data
  2. See error

Stack Trace

At first it shows these warnings:

2024-04-16 13:48:39,758 COMET DEBUG: Reading cgroups info from: /proc/cgroups
2024-04-16 13:48:39,758 COMET DEBUG: #subsys_name	hierarchy	num_cgroups	enabled

2024-04-16 13:48:39,758 COMET DEBUG: cpuset	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: cpu	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: cpuacct	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: blkio	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: memory	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: devices	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: freezer	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: net_cls	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: perf_event	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: net_prio	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: hugetlb	0	224	1

2024-04-16 13:48:39,758 COMET DEBUG: pids	0	224	1

2024-04-16 13:48:39,759 COMET DEBUG: rdma	0	224	1

2024-04-16 13:48:39,759 COMET DEBUG: misc	0	224	1

2024-04-16 13:48:39,759 COMET DEBUG: is_cgroupsV2=True
2024-04-16 13:48:39,759 COMET DEBUG: Reading self cgroups info from: /proc/self/cgroup
2024-04-16 13:48:39,759 COMET DEBUG: 0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-bffe3c2b-c664-439c-b74f-dde8231f07ae.scope

2024-04-16 13:48:39,759 COMET DEBUG: Reading mountinfo from: /proc/self/mountinfo
2024-04-16 13:48:39,759 COMET DEBUG: 25 32 0:23 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 26 32 0:24 / /proc rw,nosuid,nodev,noexec,relatime shared:13 - proc proc rw

2024-04-16 13:48:39,759 COMET DEBUG: 27 32 0:5 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=32518228k,nr_inodes=8129557,mode=755,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 28 27 0:25 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000

2024-04-16 13:48:39,759 COMET DEBUG: 29 32 0:26 / /run rw,nosuid,nodev,noexec,relatime shared:5 - tmpfs tmpfs rw,size=6512340k,mode=755,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 30 25 0:27 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:8 - efivarfs efivarfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 32 1 259:3 / / rw,noatime shared:1 - ext4 /dev/nvme0n1p3 rw,errors=remount-ro

2024-04-16 13:48:39,759 COMET DEBUG: 33 25 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:9 - securityfs securityfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 34 27 0:29 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 35 29 0:30 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 36 25 0:31 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot

2024-04-16 13:48:39,759 COMET DEBUG: 37 25 0:32 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:11 - pstore pstore rw

2024-04-16 13:48:39,759 COMET DEBUG: 38 25 0:33 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:12 - bpf bpf rw,mode=700

2024-04-16 13:48:39,759 COMET DEBUG: 39 26 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:14 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=20163

2024-04-16 13:48:39,759 COMET DEBUG: 40 27 0:20 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:15 - mqueue mqueue rw

2024-04-16 13:48:39,759 COMET DEBUG: 41 27 0:35 / /dev/hugepages rw,relatime shared:16 - hugetlbfs hugetlbfs rw,pagesize=2M

2024-04-16 13:48:39,759 COMET DEBUG: 42 25 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:17 - debugfs debugfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 43 25 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:18 - tracefs tracefs rw

2024-04-16 13:48:39,759 COMET DEBUG: 44 25 0:36 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:19 - fusectl fusectl rw

2024-04-16 13:48:39,759 COMET DEBUG: 45 25 0:21 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:20 - configfs configfs rw

2024-04-16 13:48:39,760 COMET DEBUG: 68 29 0:37 / /run/credentials/systemd-sysusers.service ro,nosuid,nodev,noexec,relatime shared:21 - ramfs ramfs rw,mode=700

2024-04-16 13:48:39,760 COMET DEBUG: 93 32 259:2 / /recovery rw,relatime shared:31 - vfat /dev/nvme0n1p2 rw,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro

2024-04-16 13:48:39,760 COMET DEBUG: 96 32 259:1 / /boot/efi rw,relatime shared:47 - vfat /dev/nvme0n1p1 rw,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro

2024-04-16 13:48:39,760 COMET DEBUG: 99 39 0:38 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:49 - binfmt_misc binfmt_misc rw

2024-04-16 13:48:39,760 COMET DEBUG: 1023 29 0:55 / /run/user/1000 rw,nosuid,nodev,relatime shared:573 - tmpfs tmpfs rw,size=6512336k,nr_inodes=1628084,mode=700,uid=1000,gid=1000,inode64

2024-04-16 13:48:39,760 COMET DEBUG: 830 1023 0:57 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:546 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000

2024-04-16 13:48:39,760 COMET DEBUG: 1082 1023 0:58 / /run/user/1000/doc rw,nosuid,nodev,relatime shared:582 - fuse.portal portal rw,user_id=1000,group_id=1000

2024-04-16 13:48:39,760 COMET DEBUG: No relevant cgroup controllers mounted.
2024-04-16 13:48:39,760 COMET DEBUG: CGROUP container detection failed, exception=Required cgroup subsystem files not found
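
For readers following the log: the mountinfo dump above contains a single `cgroup2` mount at /sys/fs/cgroup and no per-controller cgroup v1 mounts, which matches `is_cgroupsV2=True`. The snippet below is a minimal sketch of how such mounts can be picked out of /proc/self/mountinfo text. It is not Comet's detection code; the function name `cgroup_mounts` and the `SAMPLE` string are hypothetical, and the sample lines are copied from the log above.

```python
def cgroup_mounts(mountinfo_text):
    """Return (mount_point, fs_type) for every cgroup or cgroup2 mount.

    Hypothetical helper for illustration only; not part of the Comet SDK.
    """
    mounts = []
    for line in mountinfo_text.splitlines():
        # mountinfo layout: per-mount fields, then " - ", then
        # filesystem type, mount source, and super options.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or not right_fields:
            continue
        fs_type = right_fields[0]
        if fs_type in ("cgroup", "cgroup2"):
            # The fifth per-mount field is the mount point.
            mounts.append((left_fields[4], fs_type))
    return mounts


# Two lines taken from the debug log above.
SAMPLE = (
    "26 32 0:24 / /proc rw,nosuid,nodev,noexec,relatime shared:13 - proc proc rw\n"
    "36 25 0:31 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10 "
    "- cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot\n"
)

mounts = cgroup_mounts(SAMPLE)
has_v1 = any(fs_type == "cgroup" for _, fs_type in mounts)
has_v2 = any(fs_type == "cgroup2" for _, fs_type in mounts)
print(mounts, has_v1, has_v2)
```

On a live Linux host the same function could be fed the contents of /proc/self/mountinfo. Whether the absence of v1 controller mounts is what triggers the "No relevant cgroup controllers mounted" message is an assumption; the thread does not confirm the cause.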

Screenshots or GIFs

After that warning, all I see is this: Screenshot from 2024-04-16 13-39-19

mhnazeri avatar Apr 16 '24 17:04 mhnazeri

Interesting... @mhnazeri can you provide a small bit of code that demonstrates this? Or provide a link to a Comet experiment?

dsblank avatar Apr 22 '24 16:04 dsblank

I made this repo public; it produces that specific output on Pop!_OS 22.04. I should mention that this code runs fine on Fedora 39. I suspected a missing package (maybe related to cgroups), but installing everything related to cgroups didn't help. I also don't know why Comet needs something like that.

To run the code from the repo, just put a few images in the data folder and run python run.py. Also make sure that the debug flag in the config file is False; otherwise it disables Comet. All the config files for Comet reside here.

mhnazeri avatar Apr 23 '24 03:04 mhnazeri

@mhnazeri thank you for the reproducible example! I'll pass this on to the engineering team.

dsblank avatar Apr 24 '24 12:04 dsblank

This is being tracked as CM-10253.

dsblank avatar Jul 05 '24 14:07 dsblank

The fix for this is scheduled to be made soon.

dsblank avatar Jul 26 '24 11:07 dsblank

Hi @mhnazeri, would it be possible for you to add one of our Comet engineers to your repo? He is @yaricom

dsblank avatar Jul 30 '24 12:07 dsblank

Hi. Done. Thanks for the follow up.

I think I had to explicitly set the console logging level to info through an environment variable to suppress this output:

export COMET_LOGGING_CONSOLE=info

I'm not sure about that, though. I didn't change anything in the .comet.config file.
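
The same workaround can be applied from inside a script instead of the shell. This is only a sketch: `COMET_LOGGING_CONSOLE` is the variable named in this thread, while the helper name `force_comet_console_level` is hypothetical, and it assumes the SDK reads the variable when it sets up logging, so the call has to happen before `import comet_ml`.

```python
import os


def force_comet_console_level(level="info"):
    """Set COMET_LOGGING_CONSOLE for the current process and return the value.

    Hypothetical helper; it overwrites any existing value (for example
    "debug") rather than only supplying a default.
    """
    os.environ["COMET_LOGGING_CONSOLE"] = level
    return os.environ["COMET_LOGGING_CONSOLE"]


# Call this before `import comet_ml` (assumption: the SDK reads the
# variable when it configures its console logger).
current_level = force_comet_console_level("info")
print(current_level)
```

Direct assignment is used rather than `os.environ.setdefault` so that a stray `debug` value inherited from the shell is replaced.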

mhnazeri avatar Jul 30 '24 21:07 mhnazeri

@mhnazeri, yes, I think you are correct. We are going to prevent COMET_LOGGING_CONSOLE from being set to "debug".

Please let us know if you have any further questions or issues. I'll close this ticket.

dsblank avatar Aug 01 '24 13:08 dsblank