Docker build failing
I have a problem when running docker build . and I get the following error:
Dockerfile:59
--------------------
57 | RUN SHA=ToUcHMe git clone https://github.com/NVIDIA/apex.git
58 | WORKDIR /tmp/unique_for_apex/apex
59 | >>> RUN /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
60 | #install pytorch3d
61 | # RUN /opt/miniconda3/envs/py37/bin/pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py37_cu102_pyt171/download.html
--------------------
ERROR: failed to solve: process "/bin/sh -c /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ." did not complete successfully: exit code: 1
Do you know how to solve this? It seems to be caused by apex.
Have you solved this? We tested this part and no issue occurred. Our code can also run with torch's distributed data parallel, without apex. Maybe you just need to modify the code a little bit.
Hi. I seem to be getting the same issue. Any idea what the problem might be and what I can do to fix it?
This might be caused by an update to the apex repo. I suggest commenting out this step and trying to install apex manually later. Or just use torch's distributed data parallel in place of apex.
If you are doing an evaluation, you can also just run on a single GPU without the need for apex.
I hope this helps.
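For anyone reading along, the suggestion to fall back to torch's distributed data parallel could look roughly like the sketch below. The helper name wrap_for_training and the local_rank argument are hypothetical and not taken from the RNNPose code. It assumes the process group has already been initialized elsewhere (for example by the launcher), and it returns the model unchanged for a single-GPU or evaluation run.

```python
def wrap_for_training(model, local_rank=0):
    """Hypothetical helper: wrap `model` in torch's DDP when a process group exists."""
    try:
        import torch.distributed as dist
        from torch.nn.parallel import DistributedDataParallel
    except ImportError:
        # torch is not installed in this environment; nothing to wrap.
        return model

    if dist.is_available() and dist.is_initialized():
        # Multi-process run: move the model to this process's GPU and wrap it.
        model = model.cuda(local_rank)
        return DistributedDataParallel(model, device_ids=[local_rank])

    # Single-GPU or evaluation run: use the plain model, no apex needed.
    return model
```

With this shape, the same script can be used for a single-GPU evaluation and for a distributed run, since the wrapper is only applied when a process group is actually present.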
Thank you for your answer. It works fine for me if I comment out this command, then start my container and run git clone and pip install for apex manually.
Indeed, apex.amp is deprecated, so I had to change from apex import amp to from torch.cuda import amp.
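A guarded import is one way to express the change described above, so the same file works whether or not apex is installed. This is only a sketch: AMP_BACKEND is a hypothetical name introduced for illustration, and the two modules are not drop-in replacements for each other, so code that calls apex-only helpers will still need adjusting.

```python
# AMP_BACKEND is a hypothetical name used only for this illustration.
try:
    from apex import amp  # original import used by the project
    AMP_BACKEND = "apex"
except ImportError:
    try:
        from torch.cuda import amp  # replacement mentioned in this thread
        AMP_BACKEND = "torch"
    except ImportError:
        amp = None  # neither apex nor torch is available
        AMP_BACKEND = "none"

print("mixed-precision backend:", AMP_BACKEND)
```

Checking AMP_BACKEND before using apex-specific functions makes it easier to see which code paths still depend on apex.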
I've been facing the same problem. You mentioned changing from apex import amp to from torch.cuda import amp; which file did you change? I can't find that line in the Dockerfile. Sorry if this is a stupid question, I'm quite new to this.
Hey, I did as you mentioned, but it fails with the error message below: "module 'torch.cuda.amp' has no attribute 'float_function'".
Traceback (most recent call last):
File "/home/RNNPose/tools/eval.py", line 26, in
Is there any workaround, or can I comment it out? Thanks in advance.
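One possible workaround, offered as a sketch rather than a tested fix for RNNPose: apex's amp.float_function is a decorator that makes the wrapped function run in float32, and torch.cuda.amp has no decorator of that name. A small stand-in can be defined and used instead of amp.float_function. The stand-in below is hypothetical; it only casts top-level floating-point tensor arguments and disables autocast inside the call, which approximates but does not reproduce apex's behaviour. Without torch it degrades to a pass-through, which is also roughly what commenting out the decorator amounts to on a full-precision run.

```python
import functools

try:
    # Prefer apex's own decorator whenever apex.amp is importable.
    from apex.amp import float_function
except ImportError:
    try:
        import torch
    except ImportError:
        torch = None

    def float_function(fn):
        """Hypothetical stand-in for apex.amp.float_function."""
        if torch is None:
            return fn  # nothing to cast without torch; plain pass-through

        def to_fp32(value):
            # Cast only floating-point tensors; leave everything else untouched.
            if torch.is_tensor(value) and value.is_floating_point():
                return value.float()
            return value

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Disable autocast so the body really runs in float32.
            with torch.cuda.amp.autocast(enabled=False):
                cast_args = [to_fp32(a) for a in args]
                cast_kwargs = {k: to_fp32(v) for k, v in kwargs.items()}
                return fn(*cast_args, **cast_kwargs)

        return wrapper


@float_function
def add(a, b):
    return a + b
```

In a file that uses the decorator, the line @amp.float_function would then become @float_function, with the stand-in imported from wherever it is defined.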
Hello,
I'm sorry, but I don't work on this project anymore and I no longer have access to this code.
I hope you will find the solution.
Good luck with your project.
Jean
On Thu, 15 Aug 2024 at 12:06, Nishanth D @.***> wrote:
hey, I did as you mentioned, but it is failing with the below error message "module 'torch.cuda.amp' has no attribute 'float_function'".
Traceback (most recent call last):
  File "/home/RNNPose/tools/eval.py", line 26, in <module>
    from builder import (
  File "/home/RNNPose/builder/rnnpose_builder.py", line 1, in <module>
    from builder import losses_builder
  File "/home/RNNPose/builder/losses_builder.py", line 2, in <module>
    from model import losses
  File "/home/RNNPose/model/losses.py", line 22, in <module>
    class Loss(nn.Module):
  File "/home/RNNPose/model/losses.py", line 65, in Loss
    @amp.float_function
AttributeError: module 'torch.cuda.amp' has no attribute 'float_function'
any workaround or can I comment it? Thanks in advance