How to load a custom skeleton-based dataset, then train & test it
I'm attempting to load a custom skeleton-based dataset. The custom dataset just has x,y,z coordinate data (there is no RGB video data, so no video processing with the OpenPose software is required). I have been trying to follow the instructions here, but am running into some issues.
I have several questions about loading the data, and then a few about training the model:
Question 1. I am re-formatting the custom dataset into the `*.json` file format. Since my data has no associated video, what do I set the `"video_name":` and `"resolution":` values to? Do I set them to null, or remove them completely?
Question 2. In the `*.json` file format example, there is an entry for `"version": "1.0"`. Do I just put the same in my custom dataset `*.json` files? What is this version number for?
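For reference, here is a minimal Python sketch of how I am generating one annotation file, following the structure from mmskeleton's skeleton-data docs. Setting `video_name` and `resolution` to null is exactly the guess I am asking about, and the field values are illustrative:

```python
import json

# One annotation file in the structure from mmskeleton's skeleton-data
# docs. Setting "video_name" and "resolution" to None (-> JSON null) is
# my guess, not confirmed behavior -- this is Question 1.
record = {
    "info": {
        "video_name": None,           # no source video for this dataset
        "resolution": None,           # no pixel resolution available
        "num_frame": 2,
        "num_keypoints": 10,
        "keypoint_channels": ["x", "y", "z"],
        "version": "1.0",             # copied verbatim -- this is Question 2
    },
    "annotations": [
        {
            "frame_index": i,
            "id": i,                  # running index of the annotation
            "person_id": None,        # one subject per sequence
            "keypoints": [[0.1, 0.2, 0.3]] * 10,  # one [x, y, z] per joint
        }
        for i in range(2)
    ],
    "category_id": 42,                # class label index for this sequence
}

with open("sequence_0001.json", "w") as f:
    json.dump(record, f, indent=2)
```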
Question 3. I'm assuming that I do not need to create a yaml file similar to `configs/utils/build_dataset_example.yaml`, since it simply creates the `*.json` files (which I will do separately for my custom dataset). Is this correct?
Question 4. I noticed files (`train_data.npy`, `val_data.npy`, `train_label.pkl`, `val_label.pkl`) in, for example, the `data/Kinetics/kinetics-skeleton` directory. Do these files get generated automatically when I train the model with `train.yaml`? Or do I have to do some other step when loading in the custom dataset?
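For context, my understanding is that those files are the older st-gcn pre-processed format rather than something `train.yaml` produces. A hedged sketch of that layout, where the (N, C, T, V, M) shape convention and the pickled (names, labels) pair are my assumptions from reading the st-gcn feeder code:

```python
import pickle
import numpy as np

# st-gcn style pre-processed files: one array shaped
# (N samples, C channels, T frames, V joints, M people), plus a pickled
# (sample_names, labels) pair. Shapes here are my assumptions; N would
# be ~5000 for a full dataset.
N, C, T, V, M = 4, 3, 300, 10, 1
data = np.zeros((N, C, T, V, M), dtype=np.float32)

sample_names = [f"sequence_{i:04d}" for i in range(N)]
labels = [0] * N                      # one class index per sequence

np.save("train_data.npy", data)
with open("train_label.pkl", "wb") as f:
    pickle.dump((sample_names, labels), f)
```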
Question 5. What are the best sample `train.yaml` and `test.yaml` files to use as a base for creating the train/test yaml files for my custom dataset? Are the ones in `mmskeleton/configs/recognition/st_gcn/dataset_example` the best to start with?
Question 6. Assuming I use the `train.yaml` from Question 5, I have some questions about the various configurations. I'll limit my question to the `dataset_cfg:` section, since that is where I assume I have to do some customization. There are two `- type: "datasets.DataPipeline"` subsections. Is just one `- type` section enough? Also, I'm unsure what `num_track:`, `num_keypoints:`, and `repeat:` are for. And for the `pipeline:` section, are those steps all related to RGB video data, meaning I can omit them?
Thank you in advance for reviewing and answering!
Further to my questions above, I am able to create an experiment where I load in a custom skeleton dataset and run the recognition training, but neither the loss value nor the validation accuracy improves from one epoch to the next (we get only about 2% accuracy, which is the same as randomly choosing a label for each test sequence).
The custom skeleton dataset has:
- one subject per sequence
- 10 upper-body joints (I believe you refer to them as keypoints)
- 73 classes
- roughly 5000 total sequences
- 3-dimensional x,y,z coordinates for each joint
- initially captured using the Kinect
The sample `train.yaml` file is here: train.txt
In the `graph_cfg:` section of the `train.yaml` file it appears as though we have to select a `layout:`. We have tried the values `'ntu-rgb+d'` and `'openpose'`, but those seem to assume that the skeleton has 25 or 18 joints, respectively. So we mapped our 10-joint custom skeleton to 25 joints, or 18 joints, so that we could use the `'ntu-rgb+d'` or `'openpose'` options and see what would happen.
Here is an example `*.json` file where we mapped our custom 10-joint skeleton to the 25-joint `'ntu-rgb+d'` skeleton layout: train10.txt
And here is an example `*.json` file where we mapped our custom 10-joint skeleton to the 18-joint `'openpose'` skeleton layout (we extracted just the x,y coordinate values, not the z values, from our custom dataset, since the `'openpose'` layout is 2-D): train24.txt
However, as mentioned above, training does not improve the validation accuracy as the epochs go by. Do we need to specify the `layout:`, and if so, does it have to be one of the pre-defined layout types? We were assuming that loading in a custom dataset meant that we could have different numbers/types of joints. Is this true, and are we just using the mmskeleton software incorrectly? One idea we are exploring, assuming a layout is just a hard-coded edge list in st-gcn's `net/utils/graph.py` (`Graph.get_edge`), is shown below.
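Under that assumption, instead of remapping to 18/25 joints we could add our own 10-joint branch to `Graph.get_edge`. The joint indices and bones below are illustrative only, not our actual Kinect ordering:

```python
def custom_10_joint_edges():
    """Edge list for a hypothetical 10-joint upper-body skeleton, in the
    same (self-links + neighbor-links) form that st-gcn's Graph builds
    for 'openpose' / 'ntu-rgb+d'. Indices are illustrative only."""
    num_node = 10
    self_link = [(i, i) for i in range(num_node)]
    neighbor_link = [
        (1, 0), (2, 1), (3, 2),   # pelvis -> spine -> neck -> head
        (4, 2), (5, 4), (6, 5),   # neck -> L shoulder -> L elbow -> L wrist
        (7, 2), (8, 7), (9, 8),   # neck -> R shoulder -> R elbow -> R wrist
    ]
    center = 2                    # joint treated as the body center (neck)
    return self_link + neighbor_link, center
```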
Also, one other question: we had to comment out the following step in the train and test pipelines: `- {type: "datasets.skeleton.normalize_by_resolution"}`. This is because, as you can see in the sample `json` files, we do not specify a resolution (our dataset is just the x/y/z skeletal coordinate data, not RGB video). Is this OK to do? Does it have unintended consequences?
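As a stopgap we sketched our own skeleton-space normalization (center on a root joint, scale by a reference bone length) to replace the resolution-based step. This is hand-rolled code, not an mmskeleton pipeline component, and the joint indices are illustrative:

```python
import numpy as np

def normalize_skeleton(keypoints, root=0, ref_pair=(0, 2)):
    """Center a (T, V, 3) coordinate array on a root joint and scale by a
    reference bone length. Root/reference indices are illustrative; pick
    stable joints (e.g. pelvis and neck) from your own layout."""
    keypoints = np.asarray(keypoints, dtype=np.float32)
    centered = keypoints - keypoints[:, root:root + 1, :]  # root at origin
    a, b = ref_pair
    bone = np.linalg.norm(keypoints[:, a] - keypoints[:, b], axis=-1)
    scale = max(float(bone.mean()), 1e-6)                  # avoid divide-by-zero
    return centered / scale
```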
Thanks!
Why are the joint coordinates less than 1, and what should be done with the coordinates?
Can you tell me what `"id"` is?
> Why are the joint coordinates less than 1, and what should be done with the coordinates?
The coordinates are normalized (divided by the resolution); the resolution in that file was set to null. I don't know if this is right; maybe you are supposed to specify a resolution, like [224, 224], and then use pixel coordinates like [100, 150]. Does anyone know the correct way to build and train a custom skeleton-based dataset? The instructions were unclear to me.
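For concreteness, "normalized" here just means the pixel coordinates are divided by the frame size. A tiny sketch of the [224, 224] example, where the exact centering convention mmskeleton applies afterwards is my assumption, not verified:

```python
# Pixel coordinates divided by a [224, 224] resolution land below 1.
# Whether mmskeleton then re-centers (e.g. to [-0.5, 0.5]) is my
# assumption to check in datasets/skeleton/normalize_by_resolution.
x, y = 100, 150
w, h = 224, 224
print(x / w, y / h)                  # ~0.446, ~0.670 -> values in [0, 1]
print(x / w - 0.5, y / h - 0.5)      # ~-0.054, ~0.170 -> centered variant
```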
@japfeifer Were you able to find out the answers to your layout and normalization questions? I have the same questions.
We were finally able to get things to run in the (older) st-gcn environment (but not the newer mmskeleton environment). I've enclosed two MATLAB files that show how we mapped our custom skeleton to the NTU RGB+D format, as well as some files that st-gcn requires for the data loading and training. Hope this helps a bit. sample_code.zip
Question 6:
- The two `datasets.DataPipeline` entries are related to `workflow: [['train', 5], ['val', 1]]`: it means that validation is performed after every five training iterations during the training process. The first pipeline generates training samples and the second pipeline generates validation samples. So you have to use two pipelines, or you can change the workflow config.
- `num_track` is the number of people (skeletons) you expect in every data frame; `num_keypoints` is the number of keypoints in the skeleton model you chose; `repeat` is used to augment your dataset. The final training dataset length is `len(your_training_data_samples) * repeat`, which means every skeleton file is loaded `repeat` times. Note the `simulate_camera_moving` config in the first pipeline: it applies a random change to every data sample, so in the end you get `len(your_training_data_samples) * repeat` different training samples. A minimal sketch of this mechanic follows.
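To make the `repeat` mechanics concrete, here is a minimal sketch (not mmskeleton's actual implementation) of what a repeat-style dataset with a random augmentation does:

```python
import random

class RepeatDataset:
    """Minimal sketch of the `repeat` mechanic (not mmskeleton's actual
    code): the same samples are served `repeat` times per epoch, and a
    random jitter stands in for simulate_camera_moving so each pass
    yields a different training sample."""

    def __init__(self, samples, repeat):
        self.samples = samples
        self.repeat = repeat

    def __len__(self):
        return len(self.samples) * self.repeat

    def __getitem__(self, idx):
        sample = self.samples[idx % len(self.samples)]
        jitter = random.uniform(-0.05, 0.05)  # stand-in augmentation
        return [v + jitter for v in sample]

ds = RepeatDataset(samples=[[0.0], [1.0]], repeat=5)
print(len(ds))        # 10 = 2 samples * repeat=5
print(ds[0], ds[2])   # same underlying sample, different jitter
```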
> Why are the joint coordinates less than 1, and what should be done with the coordinates?
I have the same problem. Have you solved it?