OSCAR
Docker error.
Hi! I tried to set this up on a server without internet access. I modified the Dockerfile and solved some problems, but this one is too hard for me. Could you help me? Thank you.
Current default time zone: 'Etc/UTC'
Local time is now: Sun Nov 20 22:20:26 UTC 2022.
Universal Time is now: Sun Nov 20 22:20:26 UTC 2022.
Run 'dpkg-reconfigure tzdata' if you wish to change it.
Setting up systemd-sysv (245.4-4ubuntu3.18) ...
Setting up libelf1:amd64 (0.176-1.1build1) ...
Setting up libicu66:amd64 (66.1-2ubuntu2.1) ...
Setting up libglib2.0-0:amd64 (2.64.6-1~ubuntu20.04.4) ...
Setting up libtinfo6:amd64 (6.2-0ubuntu2) ...
Setting up libproxy1v5:amd64 (0.4.15-10ubuntu1.2) ...
Setting up glib-networking-services (2.64.2-1ubuntu0.1) ...
Setting up distro-info-data (0.43ubuntu1.11) ...
Setting up cmake-data (3.16.3-1ubuntu1) ...
Setting up libstemmer0d:amd64 (0+svn585-2) ...
Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2build1) ...
Setting up libpackagekit-glib2-18:amd64 (1.1.13-2ubuntu1.1) ...
Setting up libbsd0:amd64 (0.10.0-1) ...
Setting up libkrb5support0:amd64 (1.17-6ubuntu4.1) ...
Setting up ucf (3.0038+nmu1) ...
Setting up libgirepository-1.0-1:amd64 (1.64.1-1~ubuntu20.04.1) ...
Setting up libxml2:amd64 (2.9.10+dfsg-5ubuntu0.20.04.4) ...
Setting up libmagic-mgc (1:5.38-4) ...
Setting up uuid-runtime (2.34-0.1ubuntu9.3) ...
Adding group `uuidd' (GID 105) ...
Done.
Warning: The home dir /run/uuidd you specified can't be accessed: No such file or directory
Adding system user `uuidd' (UID 104) ...
Adding new user `uuidd' (UID 104) with group `uuidd' ...
Not creating home directory `/run/uuidd'.
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Created symlink /etc/systemd/system/sockets.target.wants/uuidd.socket → /lib/systemd/system/uuidd.socket.
Setting up libmagic1:amd64 (1:5.38-4) ...
Setting up librhash0:amd64 (1.3.9-1) ...
Setting up libcbor0.6:amd64 (0.6.0-0ubuntu1) ...
Setting up libyaml-0-2:amd64 (0.2.2-1) ...
Setting up gir1.2-glib-2.0:amd64 (1.64.1-1~ubuntu20.04.1) ...
Setting up libglib2.0-data (2.64.6-1~ubuntu20.04.4) ...
Setting up krb5-locales (1.17-6ubuntu4.1) ...
Setting up publicsuffix (20200303.0012-1) ...
Setting up libfido2-1:amd64 (1.3.1-1ubuntu2) ...
Setting up wget (1.20.3-1ubuntu2) ...
Setting up libdconf1:amd64 (0.36.0-1) ...
Setting up libcrypt-dev:amd64 (1:4.4.10-10ubuntu4) ...
Setting up dmsetup (2:1.02.167-1ubuntu1) ...
Setting up shared-mime-info (1.15-1) ...
Setting up gir1.2-packagekitglib-1.0 (1.1.13-2ubuntu1.1) ...
Setting up libc-dev-bin (2.31-0ubuntu9.9) ...
Setting up libxdmcp6:amd64 (1:1.1.3-0ubuntu1) ...
Setting up libkeyutils1:amd64 (1.6-6ubuntu1.1) ...
Setting up libglib2.0-bin (2.64.6-1~ubuntu20.04.4) ...
Setting up libc6-dev:amd64 (2.31-0ubuntu9.9) ...
Setting up xdg-user-dirs (0.17-2ubuntu1) ...
Setting up libx11-data (2:1.6.9-2ubuntu1.2) ...
Setting up libxau6:amd64 (1:1.0.9-0ubuntu1) ...
Setting up libmpdec2:amd64 (2.4.2-3) ...
Setting up libpolkit-gobject-1-0:amd64 (0.105-26ubuntu1.3) ...
Setting up libdbus-1-3:amd64 (1.12.16-2ubuntu2.3) ...
Setting up libreadline8:amd64 (8.0-4) ...
Setting up libjsoncpp1:amd64 (1.7.4-3.1ubuntu2) ...
Setting up libedit2:amd64 (3.1-20191231-1) ...
Setting up libk5crypto3:amd64 (1.17-6ubuntu4.1) ...
Setting up less (551-1ubuntu0.1) ...
Setting up libgstreamer1.0-0:amd64 (1.16.3-0ubuntu1.1) ...
Setcap worked! gst-ptp-helper is not suid!
Setting up libarchive13:amd64 (3.4.0-2ubuntu1.2) ...
Setting up libpolkit-agent-1-0:amd64 (0.105-26ubuntu1.3) ...
Setting up libncursesw6:amd64 (6.2-0ubuntu2) ...
Setting up file (1:5.38-4) ...
Setting up libkrb5-3:amd64 (1.17-6ubuntu4.1) ...
Setting up dbus (1.12.16-2ubuntu2.3) ...
Setting up libxcb1:amd64 (1.14-2) ...
Setting up libpython3.8-stdlib:amd64 (3.8.10-0ubuntu1~20.04.5) ...
Setting up libpython3-stdlib:amd64 (3.8.2-0ubuntu2) ...
Setting up libpam-systemd:amd64 (245.4-4ubuntu3.18) ...
Setting up policykit-1 (0.105-26ubuntu1.3) ...
Setting up python3.8 (3.8.10-0ubuntu1~20.04.5) ...
Setting up libx11-6:amd64 (2:1.6.9-2ubuntu1.2) ...
Setting up libxmuu1:amd64 (2:1.1.3-0ubuntu1) ...
Setting up dbus-user-session (1.12.16-2ubuntu2.3) ...
Setting up libgssapi-krb5-2:amd64 (1.17-6ubuntu4.1) ...
Setting up libssh-4:amd64 (0.9.3-2ubuntu2.2) ...
Setting up openssh-client (1:8.2p1-4ubuntu0.5) ...
Setting up libxext6:amd64 (2:1.3.4-0ubuntu1) ...
Setting up python3 (3.8.2-0ubuntu2) ...
Setting up dconf-service (0.36.0-1) ...
Setting up libcurl3-gnutls:amd64 (7.68.0-1ubuntu2.14) ...
Setting up python3-idna (2.8-1) ...
Setting up libcurl4:amd64 (7.68.0-1ubuntu2.14) ...
Setting up python3-six (1.14.0-2) ...
Setting up python3-certifi (2019.11.28-1) ...
Setting up python3-pkg-resources (45.2.0-1) ...
Setting up python3-gi (3.36.0-1) ...
Setting up lsb-release (11.1.0ubuntu2) ...
Setting up xauth (1:1.1-0ubuntu1) ...
Setting up python3-chardet (3.0.4-4build1) ...
Setting up python3-urllib3 (1.25.8-2ubuntu0.1) ...
Setting up cmake (3.16.3-1ubuntu1) ...
Setting up dconf-gsettings-backend:amd64 (0.36.0-1) ...
Setting up git (1:2.25.1-1ubuntu3.6) ...
Setting up python3-distro-info (0.23ubuntu1) ...
Setting up python3-apt (2.0.0ubuntu0.20.04.8) ...
Setting up python3-dbus (1.2.16-1build1) ...
Setting up gsettings-desktop-schemas (3.36.0-1ubuntu1) ...
Setting up glib-networking:amd64 (2.64.2-1ubuntu0.1) ...
Setting up unattended-upgrades (2.3ubuntu0.3) ...
Creating config file /etc/apt/apt.conf.d/20auto-upgrades with new version
Creating config file /etc/apt/apt.conf.d/50unattended-upgrades with new version
Created symlink /etc/systemd/system/multi-user.target.wants/unattended-upgrades.service → /lib/systemd/system/unattended-upgrades.service.
Setting up python3-requests (2.22.0-2ubuntu1) ...
Setting up python3-software-properties (0.99.9.8) ...
Setting up networkd-dispatcher (2.1-2~ubuntu20.04.3) ...
Created symlink /etc/systemd/system/multi-user.target.wants/networkd-dispatcher.service → /lib/systemd/system/networkd-dispatcher.service.
Setting up python3-requests-unixsocket (0.2.0-2) ...
Setting up libsoup2.4-1:amd64 (2.70.0-1) ...
Setting up libappstream4:amd64 (0.12.10-2) ...
Setting up packagekit (1.1.13-2ubuntu1.1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of force-reload.
Failed to open connection to "system" message bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Created symlink /etc/systemd/user/sockets.target.wants/pk-debconf-helper.socket → /usr/lib/systemd/user/pk-debconf-helper.socket.
Setting up software-properties-common (0.99.9.8) ...
Setting up packagekit-tools (1.1.13-2ubuntu1.1) ...
Processing triggers for systemd (245.4-4ubuntu3.18) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Processing triggers for dbus (1.12.16-2ubuntu2.3) ...
The command '/bin/sh -c apt-get update && apt-get install -y git cmake uuid-runtime lsb-release wget software-properties-common && wget --quiet https://golang.org/dl/go1.16.6.linux-amd64.tar.gz -O ~/go.tar.gz && tar xzf ~/go.tar.gz -C /opt/ && ln -s /opt/go/bin/go /usr/local/bin/go && rm ~/go.tar.gz' returned a non-zero code: 4
Here is the new Dockerfile. The main modification is in the ENV lines.
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
ENV DEBIAN_FRONTEND noninteractive
ENV http_proxy "http://59.69.106.68:808"
ENV https_proxy "http://59.69.106.68:808"
ENV ftp_proxy "http://59.69.106.68:808"
ADD sources.list /etc/apt
RUN apt-get update && \
    apt-get install -y git cmake uuid-runtime lsb-release wget software-properties-common && \
    wget --quiet https://golang.org/dl/go1.16.6.linux-amd64.tar.gz -O ~/go.tar.gz && \
    tar xzf ~/go.tar.gz -C /opt/ && \
    ln -s /opt/go/bin/go /usr/local/bin/go && \
    rm ~/go.tar.gz
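A note on the exit code reported for the failing RUN step: in an `&&` chain the shell returns the status of the first command that fails, and the log shows the apt-get install output running to completion, so the code 4 most likely came from the wget step. That is an inference from the log, not something the build output states directly. The small lookup below lists the exit statuses documented in the GNU Wget manual; the helper name `describe_wget_exit` is made up for illustration.

```python
# Exit statuses as listed in the "Exit Status" section of the GNU Wget manual.
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def describe_wget_exit(code):
    """Return the documented meaning of a wget exit status (hypothetical helper)."""
    return WGET_EXIT_STATUS.get(code, "unknown status")


if __name__ == "__main__":
    # Assumption: the code 4 from the Docker build belongs to wget.
    print(describe_wget_exit(4))
```

If that reading is right, status 4 would be consistent with a build host that cannot reach golang.org without a proxy.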
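It has been solved. But I have another question; could you help me?
import torch
import torch.distributed as dist

def compute_loss_with_global_feature(feature, loss_func, enable_grad=True):
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    # One slot per rank; every slot except this rank's own starts as zeros.
    shape = (world_size, ) + feature.shape
    all_feature = torch.zeros(shape, device=feature.device, dtype=feature.dtype)
    all_feature[rank] = feature.detach().clone()
    # Summing across ranks fills each slot with the owning rank's feature.
    dist.all_reduce(all_feature)
    # Put the local, non-detached feature back so gradients flow through it.
    all_feature[rank] = feature
    loss = loss_func(all_feature)
    return loss
The computation hangs here. Maybe it is because my machine has only two GPUs. Could you help me? Thank you very much. This is really confusing.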
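For readers following the function above, here is a minimal single-process sketch of the zero-fill-and-sum idea it relies on. It uses plain Python lists instead of torch tensors and does not call `torch.distributed` at all, so it only illustrates the arithmetic; the name `simulate_all_reduce_gather` is hypothetical. It also makes one requirement explicit: every rank has to contribute a buffer of the same shape. In a real collective, a rank that never reaches the call leaves the other ranks waiting, which is one possible (unconfirmed) cause of a hang like the one described.

```python
def simulate_all_reduce_gather(per_rank_features):
    """Simulate the zero-fill + sum gather for all ranks in one process."""
    world_size = len(per_rank_features)
    dim = len(per_rank_features[0])
    # A collective needs every rank to contribute a buffer of the same shape.
    if any(len(f) != dim for f in per_rank_features):
        raise ValueError("all ranks must contribute features of the same shape")

    # Each rank allocates world_size zeroed slots and fills only its own slot.
    buffers = []
    for rank, feature in enumerate(per_rank_features):
        buf = [[0.0] * dim for _ in range(world_size)]
        buf[rank] = list(feature)
        buffers.append(buf)

    # Element-wise sum over ranks: what a sum all_reduce leaves on every rank.
    return [
        [sum(buf[slot][i] for buf in buffers) for i in range(dim)]
        for slot in range(world_size)
    ]


if __name__ == "__main__":
    # Two simulated ranks, each holding a 2-element feature.
    print(simulate_all_reduce_gather([[1.0, 2.0], [3.0, 4.0]]))
```

Because every slot is zero except on the rank that owns it, the sum reproduces each rank's feature in its own slot, so the result is the same as gathering the features from all ranks.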
Sorry for the late reply. I finished my internship at Microsoft last June and am no longer doing machine learning research. I also don't have an environment for debugging, so I'm afraid I can't help you.
That is really sad. Your work is truly excellent and I sincerely admire it. Finally, could you tell me whether the idea behind the loss is MoCo-style, comparing the four samples on each GPU? And it doesn't hang on your servers, right? Thanks.
I would suggest checking that your GPUs are fast enough. The program prints a log line only every 100 iterations, which may take a long time with only two GPUs.
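One way to tell a slow run from a real hang is Python's standard `faulthandler` module, which can dump the stack of every thread if a step has not finished after a chosen timeout. The sketch below is an illustration only: `run_with_hang_watchdog` and `slow_step` are hypothetical names, the timeout is arbitrary, and nothing here comes from the project's own code.

```python
import faulthandler
import sys
import time


def run_with_hang_watchdog(step, timeout_s, log_file=sys.stderr):
    """Run step(); if it takes longer than timeout_s, dump all thread stacks.

    log_file must be a real file (it needs a file descriptor) and must stay
    open until this function returns.
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
    try:
        return step()
    finally:
        # Disarm the watchdog once the step finishes or raises.
        faulthandler.cancel_dump_traceback_later()


def slow_step():
    # Hypothetical stand-in for a training iteration that stalls for a while.
    time.sleep(0.5)
    return "done"


if __name__ == "__main__":
    # The stack dump is written to stderr after 0.1 s; the step still finishes.
    print(run_with_hang_watchdog(slow_step, timeout_s=0.1))
```

In a training script, the wrapped step would be one iteration (or the block of 100 iterations between log lines), with a timeout well above the expected duration. The dump shows which Python line each thread is sitting in at that moment.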
Thank you, I will try it.
I tried debugging like this and found that it hangs in the gelu function, which is very alarming. I also noticed that, as time has passed, the versions of torch and Python installed by the Dockerfile have become much newer. So could you tell me which versions of torch and Python you used? Thank you. If you are busy, you don't need to reply; I will try it myself.