Docker build failing

Open francescosarno opened this issue 2 years ago • 7 comments

I run into a problem when running docker build . and get the following error:

Dockerfile:59
--------------------
  57 |     RUN SHA=ToUcHMe git clone https://github.com/NVIDIA/apex.git
  58 |     WORKDIR /tmp/unique_for_apex/apex
  59 | >>> RUN /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
  60 |     #install pytorch3d 
  61 |     # RUN /opt/miniconda3/envs/py37/bin/pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py37_cu102_pyt171/download.html
--------------------
ERROR: failed to solve: process "/bin/sh -c /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ." did not complete successfully: exit code: 1

Do you know how to solve this? It seems to be caused by apex.

francescosarno avatar May 03 '23 13:05 francescosarno

Have you solved this? We tested this part and no issue occurred. Our code can also run with torch's distributed data parallel instead of apex; you may just need to modify the code a little.

DecaYale avatar May 06 '23 08:05 DecaYale

Hi. I seem to be getting the same issue. Any idea what the problem might be and what I can do to fix it?

Kaladin-Syl-WR avatar May 08 '23 13:05 Kaladin-Syl-WR

This might be caused by an update to the apex repo. I suggest commenting out this step and installing apex manually later, or replacing the use of apex with torch's distributed data parallel. If you are only doing an evaluation, you can also run on a single GPU, which does not need apex.
I hope this helps.
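
The fallback described above could be sketched roughly as follows. This is only an illustration, not code from the RNNPose repo: the helper names pick_ddp_backend and load_ddp_class are hypothetical, and the sketch only assumes that apex exposes apex.parallel.DistributedDataParallel and torch exposes torch.nn.parallel.DistributedDataParallel.

```python
import importlib.util


def _available(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_ddp_backend(prefer_apex=True):
    """Hypothetical helper: decide which distributed wrapper to use.

    Returns "apex" if apex is installed (and preferred), "torch" if only
    PyTorch is installed, and "none" if neither is available.
    """
    if prefer_apex and _available("apex"):
        return "apex"
    if _available("torch"):
        return "torch"
    return "none"


def load_ddp_class(backend):
    """Hypothetical helper: import the wrapper class for the chosen backend."""
    if backend == "apex":
        from apex.parallel import DistributedDataParallel
    elif backend == "torch":
        from torch.nn.parallel import DistributedDataParallel
    else:
        raise RuntimeError("neither apex nor torch is installed")
    return DistributedDataParallel


if __name__ == "__main__":
    print("distributed backend:", pick_ddp_backend())
```

Note that the two wrappers do not take identical constructor arguments (torch's version usually needs device_ids for single-device modules), so the call site may still need a small change.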

DecaYale avatar May 09 '23 06:05 DecaYale

Thank you for your answer. It works for me if I comment out this command, then start my container and install apex manually inside it (git clone, then pip install). Also, apex.amp is deprecated, so I had to change from apex import amp to from torch.cuda import amp.

mqtjean avatar May 10 '23 12:05 mqtjean

Thank you for your answer, it seems to work fine for me commenting this command line then start my container and git clone, pip install apex. Indeed, apex.amp is deprecated I had to change from apex import amp to from torch.cuda import amp

I've been facing the same problem. You mentioned changing from apex import amp to from torch.cuda import amp; which file did you change? I can't find that line in the Dockerfile. Sorry if this is a basic question, I'm quite new to this.
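
One way to locate the files in question is to scan the Python sources rather than the Dockerfile, since the import lives in the project code. The following is a minimal sketch; the function name find_apex_imports and the default path are assumptions for illustration only.

```python
import re
from pathlib import Path

# Matches "from apex import ...", "from apex.parallel import ...", "import apex"
APEX_IMPORT = re.compile(
    r"^\s*(?:from\s+apex(?:\.\w+)*\s+import\b|import\s+apex\b)"
)


def find_apex_imports(root):
    """Return (path, line number, line) for every apex import under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if APEX_IMPORT.match(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # "." is a placeholder; point it at your checkout of the repo.
    for path, lineno, line in find_apex_imports("."):
        print(f"{path}:{lineno}: {line}")
```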

brian2lee avatar Jul 01 '23 10:07 brian2lee

Thank you for your answer, it seems to work fine for me commenting this command line then start my container and git clone, pip install apex. Indeed, apex.amp is deprecated I had to change from apex import amp to from torch.cuda import amp

Hey, I did as you mentioned, but it fails with the error message "module 'torch.cuda.amp' has no attribute 'float_function'".

Traceback (most recent call last):
  File "/home/RNNPose/tools/eval.py", line 26, in <module>
    from builder import (
  File "/home/RNNPose/builder/rnnpose_builder.py", line 1, in <module>
    from builder import losses_builder
  File "/home/RNNPose/builder/losses_builder.py", line 2, in <module>
    from model import losses
  File "/home/RNNPose/model/losses.py", line 22, in <module>
    class Loss(nn.Module):
  File "/home/RNNPose/model/losses.py", line 65, in Loss
    @amp.float_function
AttributeError: module 'torch.cuda.amp' has no attribute 'float_function'

Is there any workaround, or can I comment it out? Thanks in advance.
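
A possible workaround, sketched under assumptions: as far as I understand, apex's amp.float_function makes the decorated function run in FP32, and torch.cuda.amp has no decorator of that name. A small stand-in could cast floating-point tensor arguments to float32 and disable autocast inside the call. The float_function below is a hypothetical replacement, not part of PyTorch, and has not been checked against the RNNPose losses.

```python
import functools

try:
    import torch
except ImportError:  # keeps the sketch importable without PyTorch
    torch = None


def _to_float32(value):
    """Cast floating-point tensors to float32; leave everything else alone."""
    if torch is not None and isinstance(value, torch.Tensor) \
            and value.is_floating_point():
        return value.float()
    return value


def float_function(fn):
    """Hypothetical stand-in for apex's amp.float_function decorator."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        args = tuple(_to_float32(a) for a in args)
        kwargs = {k: _to_float32(v) for k, v in kwargs.items()}
        if torch is None:
            return fn(*args, **kwargs)
        # Run the body outside mixed precision so it computes in FP32.
        with torch.cuda.amp.autocast(enabled=False):
            return fn(*args, **kwargs)
    return wrapper
```

With this in a module of your own, @amp.float_function in model/losses.py would become @float_function. If you are not using mixed precision at all (for example, a single-GPU evaluation), removing the decorator should presumably be equivalent, but that is untested here.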

Nishanth21D avatar Aug 15 '24 10:08 Nishanth21D

Hello,

I'm sorry, but I don't work on this project anymore and no longer have access to this code.

I hope you will find the solution.

Good luck with your project.

Jean

mqtjean avatar Aug 15 '24 13:08 mqtjean