
Error in Colab training

Open SweetStripes74 opened this issue 3 years ago • 5 comments

I'm using your annotated data for training and haven't changed any of the code, since I just wanted to check the steps, but I'm not getting any results at all with your sample data. A results folder appears in my Drive, but it is empty, so I'm uncertain why the error below occurs under step nine.

Frame will be saved in /gdrive/result_folder/oriFrameFromVideo//sample_video/frame_folder/
extracting frames from video...
processing /gdrive/sample_video.mp4
read failed!make sure that the video format is supported by cv2.VideoCapture
0% 0/300 [00:00<?, ?it/s]read frame failed!
0% 0/300 [00:00<?, ?it/s]
getting demo image:
CUDA_VISIBLE_DEVICES='0' python3 demo.py
--nClasses 4
--indir /gdrive/result_folder/oriFrameFromVideo//sample_video/frame_folder/
--outdir /gdrive/result_folder
--yolo_model_path /gdrive/AlphaTracker/Tracking/AlphaTracker/train_yolo/darknet//backup/Trial/yolov3-mice_final.weights
--yolo_model_cfg /gdrive/AlphaTracker/Tracking/AlphaTracker/train_yolo/darknet//cfg/yolov3-mice.cfg
--pose_model_path /gdrive/AlphaTracker/Tracking/AlphaTracker/train_sppe/exp/coco/Trial/model_10.pkl
--use_boxGT 0
Loading YOLO model..
not using ground truth box to do the eval.
Traceback (most recent call last):
  File "demo.py", line 60, in <module>
    det_loader = DetectionLoader(data_loader, batchSize=args.detbatch,use_boxGT=args.use_boxGT,gt_json=args.gt_json).start()
  File "/content/drive/My Drive/AlphaTracker/Tracking/AlphaTracker/dataloader.py", line 338, in __init__
    self.det_model.load_weights(opt.yolo_model_path)
  File "/content/drive/My Drive/AlphaTracker/Tracking/AlphaTracker/yolo/darknet.py", line 407, in load_weights
    fp = open(weightfile, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/gdrive/AlphaTracker/Tracking/AlphaTracker/train_yolo/darknet//backup/Trial/yolov3-mice_final.weights'

tracking pose:
python ./PoseFlow/tracker-general-fixNum-newSelect-noOrb.py
--imgdir /gdrive/result_folder/oriFrameFromVideo//sample_video/frame_folder/
--in_json /gdrive/result_folder/alphapose-results.json
--out_json /gdrive/result_folder/alphapose-results-forvis-tracked.json
--visdir /gdrive/result_folder/pose_track_vis/ --vis 1
--image_format %s.png --max_pid_id_setting 2 --match 0 --weights 0 6 0 0 0 0
--out_video_path /gdrive/result_folder/Trial_2_0_060000.mp4
Traceback (most recent call last):
  File "./PoseFlow/tracker-general-fixNum-newSelect-noOrb.py", line 215, in <module>
    with open(notrack_json) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/gdrive/result_folder/alphapose-results.json'

SweetStripes74, Mar 23 '21 16:03

Hi, thanks for reaching out again!

  1. Can you confirm that there is a video called sample_video.mp4 directly under your My Drive folder in Google Drive?
  2. Could you attach the terminal output for the train.py step? It appears that the YOLO model was not saved, which may be another cause of the error.
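
Both points can be checked from a Colab cell before rerunning the tracking step. The following is only a minimal sketch: `missing_paths` is a hypothetical helper, not part of AlphaTracker, and the two paths are copied from the log above, so they may need adjusting to your own Drive layout.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in order."""
    return [p for p in paths if not os.path.exists(p)]


# Paths taken from the error log; these are assumptions about the Drive layout.
required = [
    "/gdrive/sample_video.mp4",
    "/gdrive/AlphaTracker/Tracking/AlphaTracker/train_yolo/darknet//backup/Trial/yolov3-mice_final.weights",
]

for path in missing_paths(required):
    print("missing:", path)
```

If the weights file is reported as missing, the problem lies in the training step rather than in the tracking step.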

aneeshbal, Mar 23 '21 16:03

  1. The sample video is in the sample data folder on My Drive. I didn't see any instruction to take it out of there and put it into my base Drive folder.

  2. I know this isn't what you're asking for, but this is the terminal output of step 7, where I noticed some other errors as well:

nvcc -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=[sm_50,compute_50] -gencode arch=compute_52,code=[sm_52,compute_52] -Iinclude/ -Isrc/ -DGPU -I/usr/local/cuda/include/ --compiler-options "-Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DGPU" -c ./src/convolutional_kernels.cu -o obj/convolutional_kernels.o
nvcc fatal : Unsupported gpu architecture 'compute_30'
Makefile:92: recipe for target 'obj/convolutional_kernels.o' failed
make: *** [obj/convolutional_kernels.o] Error 1
Collecting package metadata (current_repodata.json): done
Solving environment: - The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  • pytorch/linux-64::pytorch==1.4.0=py3.6_cuda10.1.243_cudnn7.6.3_0
  • pytorch/linux-64::torchvision==0.5.0=py36_cu10
done

## Package Plan ##

environment location: /usr/local

added / updated specs:
  - pytorch==1.4.0
  - torchvision==0.5.0

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
ca-certificates-2021.1.19  |       h06a4308_1         118 KB
certifi-2020.12.5          |   py36h06a4308_0         140 KB
openssl-1.0.2u             |       h7b6447c_0         2.2 MB
pytorch-1.0.0              |py3.6_cuda9.0.176_cudnn7.4.1_1       498.6 MB  pytorch
torchvision-0.2.2          |             py_3          44 KB  pytorch
------------------------------------------------------------
                                       Total:       501.1 MB

The following packages will be REMOVED:

cudatoolkit-8.0-3

The following packages will be UPDATED:

ca-certificates 2019.1.23-0 --> 2021.1.19-h06a4308_1
certifi 2019.3.9-py36_0 --> 2020.12.5-py36h06a4308_0
openssl 1.0.2r-h7b6447c_0 --> 1.0.2u-h7b6447c_0

The following packages will be SUPERSEDED by a higher-priority channel:

torchvision pytorch/linux-64::torchvision-0.5.0-p~ --> pytorch/noarch::torchvision-0.2.2-py_3

The following packages will be DOWNGRADED:

pytorch 1.4.0-py3.6_cuda10.1.243_cudnn7.6.3_0 --> 1.0.0-py3.6_cuda9.0.176_cudnn7.4.1_1

Downloading and Extracting Packages
pytorch-1.0.0 | 498.6 MB | : 100% 1.0/1 [01:30<00:00, 90.58s/it]
certifi-2020.12.5 | 140 KB | : 100% 1.0/1 [00:00<00:00, 6.52it/s]
torchvision-0.2.2 | 44 KB | : 100% 1.0/1 [00:01<00:00, 1.12s/it]
openssl-1.0.2u | 2.2 MB | : 100% 1.0/1 [00:00<00:00, 5.99it/s]
ca-certificates-2021 | 118 KB | : 100% 1.0/1 [00:00<00:00, 15.62it/s]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

  3. Here is the terminal output for Step 8:

*** training detector ***
train.sh: line 1: ./darknet: No such file or directory
training finished.

SweetStripes74, Mar 23 '21 17:03

After retraining and tracking again with the video in my main Drive folder, the frames are now extracted, but I believe I got exactly the same error afterwards (posted below).

Frame will be saved in /gdrive/result_folder/oriFrameFromVideo//sample_video/frame_folder/
extracting frames from video...
processing /gdrive/sample_video.mp4
100% 300/300 [01:05<00:00, 4.89it/s]
getting demo image:
CUDA_VISIBLE_DEVICES='0' python3 demo.py
--nClasses 4
--indir /gdrive/result_folder/oriFrameFromVideo//sample_video/frame_folder/
--outdir /gdrive/result_folder
--yolo_model_path /gdrive/AlphaTracker/Tracking/AlphaTracker/train_yolo/darknet//backup/Trial/yolov3-mice_final.weights
--yolo_model_cfg /gdrive/AlphaTracker/Tracking/AlphaTracker/train_yolo/darknet//cfg/yolov3-mice.cfg
--pose_model_path /gdrive/AlphaTracker/Tracking/AlphaTracker/train_sppe/exp/coco/Trial/model_10.pkl
--use_boxGT 0
Loading YOLO model..
not using ground truth box to do the eval.
Traceback (most recent call last):
  File "demo.py", line 60, in <module>
    det_loader = DetectionLoader(data_loader, batchSize=args.detbatch,use_boxGT=args.use_boxGT,gt_json=args.gt_json).start()
  File "/content/drive/My Drive/AlphaTracker/Tracking/AlphaTracker/dataloader.py", line 338, in __init__
    self.det_model.load_weights(opt.yolo_model_path)
  File "/content/drive/My Drive/AlphaTracker/Tracking/AlphaTracker/yolo/darknet.py", line 407, in load_weights
    fp = open(weightfile, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/gdrive/AlphaTracker/Tracking/AlphaTracker/train_yolo/darknet//backup/Trial/yolov3-mice_final.weights'

tracking pose:
python ./PoseFlow/tracker-general-fixNum-newSelect-noOrb.py
--imgdir /gdrive/result_folder/oriFrameFromVideo//sample_video/frame_folder/
--in_json /gdrive/result_folder/alphapose-results.json
--out_json /gdrive/result_folder/alphapose-results-forvis-tracked.json
--visdir /gdrive/result_folder/pose_track_vis/ --vis 1
--image_format %s.png --max_pid_id_setting 2 --match 0 --weights 0 6 0 0 0 0
--out_video_path /gdrive/result_folder/Trial_2_0_060000.mp4
Traceback (most recent call last):
  File "./PoseFlow/tracker-general-fixNum-newSelect-noOrb.py", line 215, in <module>
    with open(notrack_json) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/gdrive/result_folder/alphapose-results.json'

SweetStripes74, Mar 23 '21 17:03

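I see the error now: it is primarily a failure in the make step for YOLO. Because that build failed, the ./darknet binary was never created, so step 8 could not train the detector and no weights file was saved. It appears that support for compute_30 has been removed in newer CUDA versions, so I will need to edit the code to account for that. I will let you know when I have an updated version ready.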
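
To illustrate the kind of change involved, here is a rough sketch in Python that removes the offending `-gencode` pair from the flag string shown in the step 7 output. It is not the actual fix: `drop_gencode` is a hypothetical helper, and which architectures need to be dropped depends on the CUDA toolkit installed in the Colab runtime, so the set passed in is an assumption.

```python
import re

# Matches one "-gencode arch=compute_NN,code=..." pair as it appears in the nvcc command.
GENCODE = re.compile(r"-gencode\s+arch=compute_(\d+),code=\S+")


def drop_gencode(flags, unsupported):
    """Remove -gencode pairs whose compute capability number is in `unsupported`."""
    def keep_or_drop(match):
        return "" if match.group(1) in unsupported else match.group(0)

    cleaned = GENCODE.sub(keep_or_drop, flags)
    # Collapse the whitespace left behind by removed pairs.
    return " ".join(cleaned.split())


# Flag string copied from the failing nvcc command in the step 7 output.
arch_flags = (
    "-gencode arch=compute_30,code=sm_30 "
    "-gencode arch=compute_35,code=sm_35 "
    "-gencode arch=compute_50,code=[sm_50,compute_50] "
    "-gencode arch=compute_52,code=[sm_52,compute_52]"
)

# Assumption: only compute_30 is rejected by the installed toolkit.
print(drop_gencode(arch_flags, {"30"}))
```

The same edit can of course be made by hand in the darknet Makefile; the sketch only shows which part of the flag list is affected.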

Thanks!

aneeshbal, Mar 23 '21 17:03

Gotcha; thank you!

SweetStripes74, Mar 23 '21 17:03