deepstack-trainer
Custom model training fails, need to downgrade torch (and setuptools)
Hi,
I am using the deepquestai/deepstack:gpu-2022.01.1 container for custom model training. It ships with torch built for CUDA 11.3, but train.py fails shortly after it starts (see the error below). The failure goes away when I downgrade to torch built for CUDA 11.0, as in the Colab notebook:
pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
The failing command and its traceback:
docker run --gpus all -it --rm -v /home/eouser/deepstack:/deepstack/code -w /deepstack/code/deepstack-trainer deepquestai/deepstack_updated:gpu python3 train.py --dataset-path /deepstack/code/data
Traceback (most recent call last):
File "train.py", line 530, in
By the way, I first have to downgrade setuptools inside the container, because otherwise train.py throws:
Traceback (most recent call last):
File "train.py", line 21, in
(Resolved with: pip install setuptools==59.5.0)
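In case it helps anyone reproduce this, here is a minimal sketch for checking, from inside the container, whether the packages match the pins used above before starting train.py. It is not part of deepstack-trainer. The helper names (`installed_version`, `find_mismatches`) are made up for illustration, and it assumes Python 3.8 or newer (for `importlib.metadata`) and that the installed wheels report the same version strings as the pip specifiers, including the `+cu110` suffix.

```python
from importlib import metadata

# Pins copied from the pip commands in this post.
PINS = {
    "torch": "1.7.0+cu110",
    "torchvision": "0.8.1+cu110",
    "torchaudio": "0.7.0",
    "setuptools": "59.5.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every package that is off its pin."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per package that differs from the pins above.
    for package, (expected, found) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

If it prints nothing, the environment matches the pins. The `lookup` parameter only exists so the comparison can be tried without the real packages installed.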
I am now happily training with the revised setup, so nothing too urgent, but maybe worth checking out.
Thx for this wonderful framework!
Guido