
Training the model

dy1ngs0ul opened this issue 4 years ago · 10 comments

Hello Nvidia AI-IOT team,

First of all, thank you very much for your effort in creating this code. I am Zeyan, currently working on a real-time pose estimation implementation on the Jetson AGX Xavier. My goal is to use depth images (from an Intel RealSense camera) and check whether the depth information can help improve pose estimation performance.

Before I conduct my experiments, I first wish to train the model to serve as a baseline. From your training script it seems a config.json file is required to train the network. Since I wish to follow your parameters for this baseline training, it would be great if you could provide your config file so that I can follow your steps and parameters to train your model.

Thanks in advance for your help and support. I look forward to your reply. Please let me know if you have anything to add.

Thanks, Dr. Zeyan Oo

dy1ngs0ul avatar Jan 09 '20 03:01 dy1ngs0ul

Hi dy1ngs0ul,

Thanks for reaching out!

You can find a training configuration file here:

https://github.com/NVIDIA-AI-IOT/trt_pose/blob/master/tasks/human_pose/experiments/resnet18_baseline_att_224x224_A.json

Please let me know if you have any questions.

Best, John

jaybdub avatar Feb 04 '20 18:02 jaybdub

Thanks for your help

dy1ngs0ul avatar Feb 26 '20 06:02 dy1ngs0ul

@jaybdub, thanks for your excellent work! So far I understand that cmap_channels is the number of keypoints and paf_channels equals 2 × the number of connections. Can you explain what upsample_channels means?

"model": {
        "name": "densenet121_baseline_att",
        "kwargs": {
            "cmap_channels": 18,
            "paf_channels": 42,
            "upsample_channels": 256,
            "num_upsample": 3
        }
    },
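For what it's worth, in configs like this one upsample_channels typically sets the channel width of the upsampling stages between the backbone and the output heads, while num_upsample sets how many such stages there are. A minimal sketch of how those kwargs might map onto a head (illustrative PyTorch, assumed names, not the actual trt_pose implementation):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: each of `num_upsample` stages is a stride-2
# transposed convolution with `upsample_channels` output channels;
# 1x1 convs then produce the cmap and paf outputs.
def make_head(feature_channels, cmap_channels=18, paf_channels=42,
              upsample_channels=256, num_upsample=3):
    layers = []
    in_ch = feature_channels
    for _ in range(num_upsample):
        layers += [
            nn.ConvTranspose2d(in_ch, upsample_channels,
                               kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        ]
        in_ch = upsample_channels
    upsample = nn.Sequential(*layers)
    cmap_head = nn.Conv2d(upsample_channels, cmap_channels, kernel_size=1)
    paf_head = nn.Conv2d(upsample_channels, paf_channels, kernel_size=1)
    return upsample, cmap_head, paf_head

upsample, cmap_head, paf_head = make_head(feature_channels=1024)
x = torch.randn(1, 1024, 7, 7)   # e.g. a densenet121 backbone feature map
feat = upsample(x)               # each stage doubles the spatial resolution
```

So changing upsample_channels trades accuracy for speed by widening or narrowing the decoder, without affecting the number of outputs.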

kinglintianxia avatar Apr 26 '20 15:04 kinglintianxia

Hi guys! Has any of you successfully completed a training run using the script provided in the repo? I'm trying to prune the models, but I can't proceed with retraining using train.py because of an inconsistency between the paf tensors' sizes:

Traceback (most recent call last):
  File "provaTrain.py", line 150, in <module>
    paf_mse = torch.mean(mask * (paf_out - paf)**2)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/wrap.py", line 58, in wrapper
    return orig_fn(*new_args, **kwargs)
RuntimeError: The size of tensor a (42) must match the size of tensor b (38) at non-singleton dimension 1

NicolaGugole avatar Jun 23 '20 20:06 NicolaGugole

I solved it myself, thank you anyway!

NicolaGugole avatar Jun 25 '20 17:06 NicolaGugole

Hey all, I'm having a similar error to @NicolaGugole's, using the training dataset downloaded through the provided shell script. Any tips on how to fix this would be greatly appreciated!

Edit: never mind, one just has to edit the model attribute of the JSON file referenced earlier to match the tensor sizes.
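For anyone hitting the same 42-vs-38 mismatch: the model's paf_channels must equal 2 × the number of skeleton links in the topology file used to generate the training targets, and cmap_channels must equal the number of keypoints. A quick sanity check, sketched in Python (the inline topology below is a toy stand-in; the real tasks/human_pose/human_pose.json has 18 keypoints and 21 links):

```python
import json

# Toy stand-in for a pose topology file such as human_pose.json.
topology = json.loads("""
{
  "keypoints": ["nose", "neck", "right_shoulder"],
  "skeleton": [[1, 2], [2, 3]]
}
""")

expected_cmap = len(topology["keypoints"])    # one heatmap per keypoint
expected_paf = 2 * len(topology["skeleton"])  # an (x, y) field per link
print(expected_cmap, expected_paf)            # compare against the model kwargs
```

If these numbers disagree with the cmap_channels/paf_channels in the experiment JSON, the loss computation will fail with exactly this kind of size error.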

OliverGuy avatar Jul 01 '20 11:07 OliverGuy

> Hey all, I'm having a similar error as @NicolaGugole using the training dataset downloaded through the provided shell script. Any tips on how to fix this would be greatly appreciated! Edit: Nevermind, one just has to edit the model attribute of the json file referenced earlier to match tensor sizes.

In my case I had to change the annotation file, because I noticed a difference between the number of keypoints in the annotations (17) and the number in human_pose.json (18). This difference in tensor sizes is weird, in my opinion. Forcing the sizes to match did not produce a fruitful training in my case, I assume because the annotation files contain values created for 17 keypoints while we modified the model to expect 18.

I noticed that in this config file (https://github.com/NVIDIA-AI-IOT/trt_pose/blob/master/tasks/human_pose/experiments/resnet18_baseline_att_224x224_A.json) the devs used a "modified" version of the JSON file. I hope in the near future we'll have the opportunity to take a look at these modified JSON files (maybe the devs could upload them to this repo).

So I have a question, @OliverGuy: did you just change the cmap_channels and paf_channels kwargs in the JSON file referenced earlier? Did that do the job? I tried the same but ended up with other conflicts.

Sorry for bothering you all. Have a nice day!

NicolaGugole avatar Jul 06 '20 08:07 NicolaGugole

@NicolaGugole I only modified those in the JSON, but I'm having issues with cuDNN not finding a convolution algorithm (see #54).

OliverGuy avatar Jul 06 '20 09:07 OliverGuy

@NicolaGugole

You have to pre-process the COCO annotations. This adds the "neck" keypoint (the midpoint of the shoulders), giving you 18 keypoints. Use the command:

python3 preprocess_coco_person.py annotations/person_keypoints_train2017.json annotations/person_keypoints_train2017_modified.json
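Conceptually, the preprocessing step derives the extra keypoint from the two shoulders. A rough sketch of the idea (illustrative only, not the actual preprocess_coco_person.py logic):

```python
# COCO keypoints come as a flat list [x1, y1, v1, ..., x17, y17, v17],
# where v is the visibility flag. Indices 5 and 6 (0-based) are the
# left and right shoulder.
def add_neck(keypoints):
    lx, ly, lv = keypoints[15:18]  # left_shoulder
    rx, ry, rv = keypoints[18:21]  # right_shoulder
    if lv > 0 and rv > 0:
        # Neck = midpoint of the shoulders, visible only if both are.
        neck = [(lx + rx) / 2, (ly + ry) / 2, min(lv, rv)]
    else:
        neck = [0, 0, 0]           # unlabeled, as COCO does for missing points
    return keypoints + neck        # 17 keypoints in, 18 out
```

After this, the annotation keypoint count matches the 18 cmap_channels in the configs above.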

silent-code avatar Nov 04 '20 20:11 silent-code

> @jaybdub , Thanks for your excellent work! So far I think cmap_channels means keypoint numbers, paf_channels equals to 2*connections, Can you explain upsample_channels means?
>
> "model": {
>         "name": "densenet121_baseline_att",
>         "kwargs": {
>             "cmap_channels": 18,
>             "paf_channels": 42,
>             "upsample_channels": 256,
>             "num_upsample": 3
>         }
>     },

Did you figure out what upsample_channels means? I am struggling with the same question.

sinuku avatar Aug 26 '21 07:08 sinuku