COMET DEBUG: No relevant cgroup controllers mounted.
Describe the Bug
Running CometML on Pop!_OS 22.04 produces the error below. The same code runs without a problem on Fedora 39. I'm not using Docker, just a Python venv.
Expected behavior
Running experiment logging.
Where is the issue?
- [x] Comet Python SDK
- [ ] Comet UI
- [ ] Third Party Integrations (Huggingface, TensorboardX, PyTorch Lightning etc)
To Reproduce
Steps to reproduce the behavior:
- Integrate comet with pytorch code to log the data
- See error
Stack Trace
At first it shows these warnings:
2024-04-16 13:48:39,758 COMET DEBUG: Reading cgroups info from: /proc/cgroups
2024-04-16 13:48:39,758 COMET DEBUG: #subsys_name hierarchy num_cgroups enabled
2024-04-16 13:48:39,758 COMET DEBUG: cpuset 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: cpu 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: cpuacct 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: blkio 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: memory 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: devices 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: freezer 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: net_cls 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: perf_event 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: net_prio 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: hugetlb 0 224 1
2024-04-16 13:48:39,758 COMET DEBUG: pids 0 224 1
2024-04-16 13:48:39,759 COMET DEBUG: rdma 0 224 1
2024-04-16 13:48:39,759 COMET DEBUG: misc 0 224 1
2024-04-16 13:48:39,759 COMET DEBUG: is_cgroupsV2=True
2024-04-16 13:48:39,759 COMET DEBUG: Reading self cgroups info from: /proc/self/cgroup
2024-04-16 13:48:39,759 COMET DEBUG: 0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-bffe3c2b-c664-439c-b74f-dde8231f07ae.scope
2024-04-16 13:48:39,759 COMET DEBUG: Reading mountinfo from: /proc/self/mountinfo
2024-04-16 13:48:39,759 COMET DEBUG: 25 32 0:23 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
2024-04-16 13:48:39,759 COMET DEBUG: 26 32 0:24 / /proc rw,nosuid,nodev,noexec,relatime shared:13 - proc proc rw
2024-04-16 13:48:39,759 COMET DEBUG: 27 32 0:5 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=32518228k,nr_inodes=8129557,mode=755,inode64
2024-04-16 13:48:39,759 COMET DEBUG: 28 27 0:25 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
2024-04-16 13:48:39,759 COMET DEBUG: 29 32 0:26 / /run rw,nosuid,nodev,noexec,relatime shared:5 - tmpfs tmpfs rw,size=6512340k,mode=755,inode64
2024-04-16 13:48:39,759 COMET DEBUG: 30 25 0:27 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:8 - efivarfs efivarfs rw
2024-04-16 13:48:39,759 COMET DEBUG: 32 1 259:3 / / rw,noatime shared:1 - ext4 /dev/nvme0n1p3 rw,errors=remount-ro
2024-04-16 13:48:39,759 COMET DEBUG: 33 25 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:9 - securityfs securityfs rw
2024-04-16 13:48:39,759 COMET DEBUG: 34 27 0:29 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw,inode64
2024-04-16 13:48:39,759 COMET DEBUG: 35 29 0:30 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k,inode64
2024-04-16 13:48:39,759 COMET DEBUG: 36 25 0:31 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot
2024-04-16 13:48:39,759 COMET DEBUG: 37 25 0:32 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:11 - pstore pstore rw
2024-04-16 13:48:39,759 COMET DEBUG: 38 25 0:33 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:12 - bpf bpf rw,mode=700
2024-04-16 13:48:39,759 COMET DEBUG: 39 26 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:14 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=20163
2024-04-16 13:48:39,759 COMET DEBUG: 40 27 0:20 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:15 - mqueue mqueue rw
2024-04-16 13:48:39,759 COMET DEBUG: 41 27 0:35 / /dev/hugepages rw,relatime shared:16 - hugetlbfs hugetlbfs rw,pagesize=2M
2024-04-16 13:48:39,759 COMET DEBUG: 42 25 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:17 - debugfs debugfs rw
2024-04-16 13:48:39,759 COMET DEBUG: 43 25 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:18 - tracefs tracefs rw
2024-04-16 13:48:39,759 COMET DEBUG: 44 25 0:36 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:19 - fusectl fusectl rw
2024-04-16 13:48:39,759 COMET DEBUG: 45 25 0:21 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:20 - configfs configfs rw
2024-04-16 13:48:39,760 COMET DEBUG: 68 29 0:37 / /run/credentials/systemd-sysusers.service ro,nosuid,nodev,noexec,relatime shared:21 - ramfs ramfs rw,mode=700
2024-04-16 13:48:39,760 COMET DEBUG: 93 32 259:2 / /recovery rw,relatime shared:31 - vfat /dev/nvme0n1p2 rw,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
2024-04-16 13:48:39,760 COMET DEBUG: 96 32 259:1 / /boot/efi rw,relatime shared:47 - vfat /dev/nvme0n1p1 rw,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
2024-04-16 13:48:39,760 COMET DEBUG: 99 39 0:38 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:49 - binfmt_misc binfmt_misc rw
2024-04-16 13:48:39,760 COMET DEBUG: 1023 29 0:55 / /run/user/1000 rw,nosuid,nodev,relatime shared:573 - tmpfs tmpfs rw,size=6512336k,nr_inodes=1628084,mode=700,uid=1000,gid=1000,inode64
2024-04-16 13:48:39,760 COMET DEBUG: 830 1023 0:57 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:546 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
2024-04-16 13:48:39,760 COMET DEBUG: 1082 1023 0:58 / /run/user/1000/doc rw,nosuid,nodev,relatime shared:582 - fuse.portal portal rw,user_id=1000,group_id=1000
2024-04-16 13:48:39,760 COMET DEBUG: No relevant cgroup controllers mounted.
2024-04-16 13:48:39,760 COMET DEBUG: CGROUP container detection failed, exception=Required cgroup subsystem files not found
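For readers trying to interpret the dump above: every controller in /proc/cgroups reports hierarchy 0, and the only cgroup entry in the mountinfo listing is a single cgroup2 mount at /sys/fs/cgroup. The sketch below pulls the cgroup mounts out of mountinfo-style lines to make that visible. It is only an illustration; `find_cgroup_mounts` is a hypothetical helper, not the Comet SDK's detection logic.

```python
def find_cgroup_mounts(mountinfo_lines):
    """Return (mount_point, fs_type) for every cgroup or cgroup2 mount.

    Hypothetical helper for illustration; not part of the Comet SDK.
    """
    mounts = []
    for line in mountinfo_lines:
        # Each mountinfo line has the form:
        #   <id> <parent> <major:minor> <root> <mount point> <options> [optional...] - <fs type> <source> <super options>
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # malformed line, skip it
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or not right_fields:
            continue
        fs_type = right_fields[0]
        if fs_type in ("cgroup", "cgroup2"):
            mounts.append((left_fields[4], fs_type))
    return mounts


# Two lines copied from the log above (timestamps and the "COMET DEBUG:" prefix removed).
sample = [
    "25 32 0:23 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw",
    "36 25 0:31 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot",
]
print(find_cgroup_mounts(sample))  # [('/sys/fs/cgroup', 'cgroup2')]
```

To check a live machine, the same function can be fed the lines of /proc/self/mountinfo, for example `find_cgroup_mounts(open("/proc/self/mountinfo"))`.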
Screenshots or GIFs
After that warning, all I see is this:
Interesting... @mhnazeri can you provide a small bit of code that demonstrates this? Or provide a link to a Comet experiment?
I made this repo public; it produces that specific output on Pop!_OS 22.04. I should mention that this code runs fine on Fedora 39. I suspect it might be an issue with a package (maybe related to cgroups), but I installed everything related to cgroups and it didn't help. I also don't know why it needs something like that.
To run the code from the repo, just put a few images in the data folder and run python run.py. Also make sure that the debug flag in the config file is False, otherwise it disables Comet. All the config files for Comet reside here.
@mhnazeri thank you for the reproducible info! I'll pass this on to the engineering team.
This is being tracked as CM-10253.
The fix for this is scheduled to be made soon.
Hi @mhnazeri, would it be possible for you to add one of our Comet engineers to your repo? He is @yaricom
Hi. Done. Thanks for the follow up.
I think I had to set the logging level explicitly to info with the env variable to suppress this behavior.
export COMET_LOGGING_CONSOLE=info
But I'm not sure about it. I didn't change anything in the .comet.config file.
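For anyone who would rather apply the same workaround from inside the script than from the shell, here is a minimal sketch. It assumes the SDK reads COMET_LOGGING_CONSOLE from the process environment, so the variable is set before comet_ml is imported. The try/except guard is only there so the snippet runs on machines without the SDK installed.

```python
import os

# Assumption: comet_ml picks up COMET_LOGGING_CONSOLE from the environment,
# so set it before the import to mirror the shell `export` shown above.
os.environ["COMET_LOGGING_CONSOLE"] = "info"

try:
    import comet_ml  # noqa: F401  (imported only after the variable is set)
except ImportError:
    comet_ml = None  # SDK not installed; the environment variable is still set

print(os.environ["COMET_LOGGING_CONSOLE"])
```

A value already exported in the shell is overwritten by this assignment; use `os.environ.setdefault` instead if the shell setting should win.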
@mhnazeri, yes, I think you are correct. We are going to prevent COMET_LOGGING_CONSOLE from being set to "debug".
Please let us know if you have any further questions or issues. I'll close this ticket.