tao_toolkit_recipes
seems like SHAD dataset is no longer available?
It seems the SHAD dataset is no longer available.
The command below returns an error: wget -P ./ https://best.sjtu.edu.cn/Assets/userfiles/sys_eb538c1c-65ff-4e82-8e6a-a1ef01127fed/files/ZIP/Bend-train.rar
Do you have any other links for this?
Also, if I want to use a custom dataset to generate optical flow data, what is the procedure? Should I use the NVIDIA Optical Flow (NVOF) SDK?
- We only have the official link to the SHAD dataset.
- If you want to use the NVOF SDK to generate optical flow (and you have a Turing or Ampere device), you can download the NVOF SDK-based binary that comes with the action recognition notebook on NGC:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.3.0/files/action_recognition_net/AppOFCuda
The binary is called in the preprocess script like this: https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes/blob/main/tao_action_recognition/data_generation/preprocess_SHAD.sh#L40
./AppOFCuda --input=${RGB_PATH_LIST[i]}/"*.png" --output=${OF_PATH_LIST[i]}/"flow" --preset=slow --gridSize=1
It is also fine to generate optical flow with OpenCV. TAO Toolkit does not care how you obtain your optical flow vectors.
A reference implementation: https://github.com/yjxiong/temporal-segment-networks
Thanks for the quick follow-up!
Before following your suggestions above, I wanted to test my custom data with the 2D RGB setting for TAO training, and I ran into some issues. I would like to hear your insights on this.
I have previously trained multiple TAO models, but this is my first time training an action recognition model. My data is not of humans; it consists of 480p videos of cows performing the following classes of actions: Eating, Sitting, Walking, and Standing.
I realize my custom dataset differs from the HMDB dataset, but I had no problem running your preprocess_HMDB_RGB.sh script on my custom data.
But TAO training gets terminated immediately when I run:
tao action_recognition train -e /workspace/spec/action_recognition_cow.txt -r /workspace/results/cow_activity -k nvidia_tlt
The error message I get is this:
2021-12-20 16:42:05,000 [INFO] root: Registry: ['nvcr.io']
2021-12-20 16:42:05,047 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3
Error executing job with overrides: ['output_dir=/workspace/results/cow_activity', 'encryption_key=nvidia_tlt']
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 368, in <lambda>
lambda: hydra.run(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 110, in run
_ = ret.return_value
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/cv/action_recognition/scripts/train.py", line 70, in main
File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/cv/action_recognition/scripts/train.py", line 22, in run_experiment
File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/cv/action_recognition/model/pl_ar_model.py", line 29, in __init__
File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/cv/action_recognition/model/pl_ar_model.py", line 36, in _build_model
File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/cv/action_recognition/model/build_nn_model.py", line 14, in build_ar_model
AttributeError: 'NoneType' object has no attribute 'keys'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/cv/action_recognition/scripts/train.py", line 76, in <module>
File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/cv/super_resolution/scripts/configs/hydra_runner.py", line 99, in wrapper
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 367, in _run_hydra
run_and_report(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 251, in run_and_report
assert mdl is not None
AssertionError
2021-12-20 16:42:10,017 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
It looks like the problem has to do with the data (something being treated as NoneType?). Let me know if you would like to see my config file too; I cannot figure out what I have missed.
You can share your config here.
Sure.
I followed the official documentation and manually split the videos into train and test folders.
model_config:
  model_type: rgb
  backbone: resnet18
  rgb_seq_length: 3
  input_type: 2d
  sample_rate: 1
  dropout_ratio: 0.0
train_config:
  optim:
    lr: 0.01
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [5, 15, 25]
    lr_decay: 0.1
  epochs: 30
  checkpoint_interval: 1
dataset_config:
  train_dataset_dir: /workspace/dataset/cow_activity/train
  val_dataset_dir: /workspace/dataset/cow_activity/test
  label_map:
    eating: 0
    lying: 1
    standing: 2
    walking: 3
  output_shape:
  - 224
  - 224
  batch_size: 16
  workers: 8
  augmentation_config:
    train_crop_type: no_crop
    horizontal_flip_prob: 0.5
    rgb_input_mean: [0.5]
    rgb_input_std: [0.5]
    val_center_crop: False
Do you see anything off in the config itself? I assumed I cannot use any of the pretrained weights, since the available ones are 5-class samples based on human activities.
The config looks good, but you should save it as a .yaml file instead of .txt.
Ah, okay. I thought the file extension was .txt, as for other models. Thanks, changing it to .yaml works perfectly.
Sorry, one more question.
I am trying to export a test model and integrate it into my DeepStream pipeline for testing, and I am hitting an unusual error.
tao action_recognition export -k nvidia_tlt \
-e /workspace/spec/action_recognition_cow.yaml \
model=/workspace/cow_activity/ar_model_epoch=09-val_loss=2.15.tlt \
output_file=/workspace/cow_activity/test.etlt \
The above command should run without problems and produce an .etlt model, but I am getting:
2021-12-21 13:02:08,726 [INFO] root: Registry: ['nvcr.io']
2021-12-21 13:02:08,770 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3
mismatched input '=' expecting <EOF>
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2021-12-21 13:02:13,324 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
The official documentation says I need to use model=$some_model rather than flags like -m or -o, but that does not seem to be what the script wants. Any idea?
Remove the = characters from the model file name ar_model_epoch=09-val_loss=2.15.tlt.
It's a known issue that we mention in the notebook:
- "=" in the checkpoint file name should be removed before using the checkpoint in a command.
We will add this note to the docs.
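If you have many checkpoints, the renaming can be scripted. The sketch below is only an illustration of the workaround described above: the helper names safe_checkpoint_name and copy_without_equals are made up here, and copying (rather than renaming in place) is a choice made so the original checkpoint is left untouched.

```python
import shutil
from pathlib import Path


def safe_checkpoint_name(name):
    """Return the file name with every '=' removed."""
    return name.replace("=", "")


def copy_without_equals(checkpoint):
    """Copy a checkpoint next to the original under a name without '='.

    Returns the path to use on the command line. If the name already
    contains no '=', the original path is returned and nothing is copied.
    """
    src = Path(checkpoint)
    dst = src.with_name(safe_checkpoint_name(src.name))
    if dst != src:
        shutil.copy2(src, dst)
    return dst
```

For the checkpoint in this thread, safe_checkpoint_name("ar_model_epoch=09-val_loss=2.15.tlt") gives "ar_model_epoch09-val_loss2.15.tlt", which can then be passed as model=... without confusing the override parser.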
Ah, I see. I think I found another possible bug that should be fixed.
The export command now runs, but there is another error, this time coming from the config file, that I wanted to point out.
Error merging 'action_recognition_cow.yaml' with schema
Key 'train_config' not in 'ARExportExpConfig'
full_key: train_config
object_type=ARExportExpConfig
train_config is a default section of the spec file, but export wants it removed, which seems strange. I was able to comment the section out and export the model to .etlt, but I assume this is not the intended default behaviour.
Emmm, this is the default behavior. train_config is not needed in the export phase, so we just remove it. You can see there is an export_rgb.yaml for export in the notebook. But I think your suggestion is good: it would be friendlier if customers could export the model with training.yaml, since it contains everything export needs after all.
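Until then, an export spec can be derived from the training spec by stripping the train_config section. The sketch below does this on the raw text, so it needs no YAML library. The function name drop_top_level_key is made up for illustration, and the code assumes the section's nested lines are indented under the key. Only train_config is known from the error above to be rejected; whether the export schema accepts every remaining key is not confirmed here, so export_rgb.yaml from the notebook remains the reference.

```python
def drop_top_level_key(yaml_text, key):
    """Return yaml_text without the top-level section named `key`.

    Works line by line: a line that starts in column 0 opens a new
    top-level entry, and every line up to the next such entry belongs
    to it. Assumes nested content is indented under its key.
    """
    kept = []
    skipping = False
    for line in yaml_text.splitlines():
        is_top_level = bool(line.strip()) and not line[0].isspace()
        if is_top_level:
            # A new top-level entry ends (or starts) the skipped section.
            skipping = line.split(":", 1)[0] == key
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Reading the training spec, passing its text through drop_top_level_key(text, "train_config"), and writing the result to a separate file gives a spec to pass to export with -e.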
Got it, thanks.
I encountered some problems when training on the HMDB51 dataset: the file passed via -e <experiment_spec_file> cannot be found, and I always get an error saying the file does not exist. My command is: tao action_recognition train -e /root/tao/resnet18.yaml -r /root/tao/result