Questions about Multiview: separate data streams
This is a project I previously made with DeepLabCut. How should I convert it into a Lightning Pose project? I ran into the following error when I tried to train with Lightning Pose. Here is some of my configuration file; can you help me figure out what to change? I think the problem is with the .csv files, but I don't know how to fix them.
Below is my file structure and configuration file:
multiview-fish.zip
data:
  # resize dimensions to streamline model creation
  image_resize_dims:
    height: 384
    width: 384
  # ABSOLUTE path to data directory
  data_dir: /home/WYC/multiview-fish/
  # ABSOLUTE path to unlabeled videos' directory
  video_dir: videos
  # location of labels; this should be relative to data_dir
  csv_file:
    - view0.csv
    - view1.csv
    - view2.csv
  view_name:
    - view0
    - view1
    - view2
  # downsample heatmaps - 2 | 3
  downsample_factor: 2
  # total number of keypoints
  num_keypoints: 24
  # keypoint names
  keypoint_names:
    - fish_head
    - fish_eye_r
    - fish_eye_l
    - dorsal_fin0
    - dorsal_fin1
    - dorsal_fin2
    - dorsal_fin3
    - pectoral_tail_root_r
    - pectoral_tail_up_r
    - pectoral_tail_middle_r
    - pectoral_tail_down_r
    - pectoral_tail_root_l
    - pectoral_tail_up_l
    - pectoral_tail_middle_l
    - pectoral_tail_down_l
    - fish_body_r
    - fish_body_l
    - fish_tail
    - tail_fin_up
    - tail_fin_middle
    - tail_fin_down
    - x
    - y
    - z
  # for mirrored setups with all keypoints defined in same csv file, define matching
  # columns for different keypoints (assumes x-y-x-y interleaving)
  # each list corresponds to a single view, so in the example below there are 2 views
  # keypoint 0 is from view 0 and matches up with keypoint 8 from view 2
  # columns that correspond to keypoints only labeled in a single view are omitted
  # this info is only used for the multiview pca loss
  mirrored_column_matches:
    - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
  # list of indices of keypoints used for pca singleview loss (use order of labels file)
  columns_for_singleview_pca: NOT YET IMPLEMENTED
training:
  # select from one of several predefined image/video augmentation pipelines
  # default- resizing only
  # dlc- imgaug pipeline implemented in DLC 2.0 package
  # dlc-top-down- dlc augmentations plus vertical and horizontal flips
  imgaug: dlc
  # batch size of labeled data during training
  train_batch_size: 8
  # batch size of labeled data during validation
  val_batch_size: 8
  # batch size of labeled data during test
  test_batch_size: 8
  # fraction of labeled data used for training
  train_prob: 0.95
  # fraction of labeled data used for validation (remaining used for test)
  val_prob: 0.05
  # <=1 - fraction of total train frames (determined by train_prob) used for training
  # >1 - number of total train frames used for training
  train_frames: 1
  # number of gpus to train a single model
  num_gpus: 2
  # number of cpu workers for data loaders
  num_workers: 4
  # epoch at which backbone network weights begin updating
  unfreezing_epoch: 20
  # max training epochs; training may exit before due to early stopping
  min_epochs: 300
  max_epochs: 300
  # frequency to log training metrics for tensorboard (one step is one batch)
  log_every_n_steps: 10
  # frequency to log validation metrics for tensorboard
  check_val_every_n_epoch: 5
  # save model weights every n epochs; must be divisible by check_val_every_n_epoch above
  # if null, only best weights will be saved after training
  ckpt_every_n_epochs: null
  # perform early stopping; if this is false, the default is to train for the max number of epochs
  # and save out the best model according to validation loss
  early_stopping: false
  # epochs over which to assess validation metrics for early stopping
  early_stop_patience: 3
  # select gpu for training
  gpu_id: 0
  # rng seed for labeled batches
  rng_seed_data_pt: 0
  # rng seed for weight initialization
  rng_seed_model_pt: 0
  # learning rate scheduler
  # multisteplr | [todo - reducelronplateau]
  lr_scheduler: multisteplr
  lr_scheduler_params:
    multisteplr:
      milestones: [150, 200, 250]
      gamma: 0.5
model:
  # list of unsupervised losses
  # "pca_singleview" | "pca_multiview" | "temporal" | "unimodal_mse" | "unimodal_kl"
  losses_to_use: [temporal]
  # backbone network:
  # resnet18 | resnet34 | resnet50 | resnet101 | resnet152 | resnet50_contrastive
  # resnet50_animal_apose | resnet50_animal_ap10k
  # resnet50_human_jhmdb | resnet50_human_res_rle | resnet50_human_top_res | resnet50_human_hand
  # efficientnet_b0 | efficientnet_b1 | efficientnet_b2
  # vit_b_sam | vit_h_sam
  backbone: resnet50
  # prediction mode: regression | heatmap | heatmap_mhcrnn (context)
  model_type: heatmap
  # which heatmap loss to use
  # mse | kl | js
  heatmap_loss_type: mse
  # directory name for model saving
  model_name: test
  # load model from checkpoint
  checkpoint: null

dali:
  general:
    seed: 123456
  base:
    train:
      sequence_length: 16
    predict:
      sequence_length: 32
  context:
    train:
      batch_size: 8
    predict:
      sequence_length: 16
losses:
  # loss = projection onto the discarded eigenvectors
  pca_multiview:
    # weight in front of PCA loss
    log_weight: 5.0
    # predictions should lie within the low-d subspace spanned by these components
    components_to_keep: 3
    # absolute error (in pixels) below which pca loss is zeroed out; if null, an empirical
    # epsilon is computed using the labeled data
    epsilon: null
  # loss = projection onto the discarded eigenvectors
  pca_singleview:
    # weight in front of PCA loss
    log_weight: 5.0
    # predictions should lie within the low-d subspace spanned by components that describe this fraction of variance
    components_to_keep: 0.99
    # absolute error (in pixels) below which pca loss is zeroed out; if null, an empirical
    # epsilon is computed using the labeled data
    epsilon: null
  # loss = norm of distance between successive timepoints
  temporal:
    # weight in front of temporal loss
    log_weight: 5.0
    # for epsilon insensitive rectification
    # (in pixels; diffs below this are not penalized)
    epsilon: 10.0
    # nan removal value
    # (in prob; heatmaps with max prob values are removed)
    prob_threshold: 0.05
eval:
  # paths to the hydra config files in the output folder, OR absolute paths to such folders
  # used in scripts/predict_new_vids.py and scripts/create_fiftyone_dataset.py
  hydra_paths: [""]
  # predict? used in scripts/train_hydra.py
  predict_vids_after_training: false
  # save labeled .mp4? used in scripts/train_hydra.py and scripts/predict_new_vids.py
  save_vids_after_training: false
  fiftyone:
    # will be the name of the dataset (Mongo DB) created by FiftyOne. for video dataset, we will append dataset_name + "_video"
    dataset_name: test
    # if you want to manually provide a different model name to be displayed in FiftyOne
    model_display_names: ["test_model"]
    # whether to launch the app from the script (True), or from ipython (and have finer control over the outputs)
    launch_app_from_script: false
    remote: true # for LAI, must be False
    address: 127.0.0.1 # ip to launch the app on.
    port: 5151 # port to launch the app on.
  # str with an absolute path to a directory containing videos for prediction.
  # set to null to skip automatic video prediction from train_hydra.py script
  # used in scripts/train_hydra.py and scripts/predict_new_vids.py
  test_videos_directory: /home/WYC/multiview-fish/videos
  # confidence threshold for plotting a vid
  confidence_thresh_for_vid: 0.90

callbacks:
  anneal_weight:
    attr_name: total_unsupervised_importance
    init_val: 0.0
    increase_factor: 0.01
    final_val: 1.0
    freeze_until_epoch: 0

hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
@wyclearnpy it looks like there is an issue with how you have organized/labeled your data. The multiview option requires labels from all views at a given time point. So for example, you have a frame named labeled-data/koipose0_camA/img009.png in view0.csv. In order to use the multiview option, you would also need the corresponding frame labeled-data/koipose0_camB/img009.png in view1.csv and labeled-data/koipose0_camC/img009.png in view2.csv. Note that all three frames are img009.
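If it's helpful, here's a rough sanity check (not part of Lightning Pose itself) for whether each labeled frame has a counterpart in all views. It's just a sketch: it assumes your csvs use the DLC-style layout with three header rows and the image path in the first column, and it reuses the file names from your config, so adjust as needed.

```python
from pathlib import Path

import pandas as pd

data_dir = Path("/home/WYC/multiview-fish")
view_csvs = ["view0.csv", "view1.csv", "view2.csv"]

# collect the frame names (e.g. img009.png) labeled in each view,
# dropping the camera-specific labeled-data folder from the index
frames_per_view = {}
for csv_name in view_csvs:
    df = pd.read_csv(data_dir / csv_name, header=[0, 1, 2], index_col=0)
    frames_per_view[csv_name] = {Path(p).name for p in df.index}

# only frames labeled in every view are usable for multiview training
common = set.intersection(*frames_per_view.values())
for csv_name, frames in frames_per_view.items():
    print(
        f"{csv_name}: {len(frames)} labeled frames, "
        f"{len(frames - common)} without a match in all views"
    )
```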
In your case you don't have labels from corresponding views, so you can just train a "singleview" model that doesn't explicitly take into account the multiview nature of the data. To do so, just make a single csv file that contains all of your labeled frames from all views, and train a standard model. This will result in a view-invariant model that you can then use to run inference on videos from any view. This is what the DLC pipeline you were previously using would have been doing.
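As a concrete example of the single-csv route, something along these lines should work (assuming the three csvs are DLC-style, with three header rows and the image path in the first column, and share the same scorer/keypoint columns; the output file name below is just a placeholder):

```python
from pathlib import Path

import pandas as pd

data_dir = Path("/home/WYC/multiview-fish")
view_csvs = ["view0.csv", "view1.csv", "view2.csv"]

# stack the labels from all views into a single dataframe; the image paths
# already include their camera-specific labeled-data folders, so rows stay distinct
dfs = [pd.read_csv(data_dir / f, header=[0, 1, 2], index_col=0) for f in view_csvs]
combined = pd.concat(dfs)
combined.to_csv(data_dir / "CollectedData_all_views.csv")  # placeholder name
```

Then point data.csv_file at that single file (a plain string instead of a list).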
If you would like to use the multiview feature then you'll need to label corresponding frames - if you'd like to go down this route I'd recommend the Anivia image labeler, which allows you to easily label 3D datasets: https://allenneuraldynamics.github.io/anivia-docs/
Please let me know if you have any other questions!
Thank you for your reply. I will give it a try. The previous dataset's images were obtained by random sampling, so the label files from the different cameras do not match.
Great, you should still be able to get comparable performance to DLC even without the multiview losses. You might also try to extract the context frames for each of your labeled frames in order to train a context model. This might be beneficial with fish if there are brief occlusions due to fins moving around, or brief distortions due to the water.
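In case it's useful, here's a rough sketch of one way to pull the neighboring frames out of the original videos with OpenCV. It assumes the labeled image name encodes the video frame index (e.g. img009.png is frame 9) and that each labeled-data folder has a video of the same name; double-check the docs for exactly where the context frames need to live.

```python
import re
from pathlib import Path

import cv2

data_dir = Path("/home/WYC/multiview-fish")
session = "koipose0_camA"  # repeat for each labeled-data folder
labeled_dir = data_dir / "labeled-data" / session
video_file = data_dir / "videos" / f"{session}.mp4"  # assumed naming

cap = cv2.VideoCapture(str(video_file))
for img in sorted(labeled_dir.glob("img*.png")):
    idx = int(re.search(r"\d+", img.stem).group())
    # grab the two frames on either side of each labeled frame
    for offset in (-2, -1, 1, 2):
        frame_idx = idx + offset
        out_file = labeled_dir / f"img{frame_idx:03d}.png"
        if frame_idx < 0 or out_file.exists():
            continue
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(str(out_file), frame)
cap.release()
```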
It seems there are still some problems. I have corrected the dataset, but I am still getting errors.
@wyclearnpy I'm not able to open the image that you linked above. Can you copy/paste the error from the command line here?
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/WYC/lightning-pose/scripts/train_hydra.py", line 35, in train_model
    train(cfg)
  File "/home/WYC/.conda/envs/LP/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/home/WYC/lightning-pose/lightning_pose/train.py", line 66, in train
    dataset = get_dataset(cfg=cfg, data_dir=data_dir, imgaug_transform=imgaug_transform)
  File "/home/WYC/.conda/envs/LP/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/home/WYC/lightning-pose/lightning_pose/utils/scripts.py", line 114, in get_dataset
    dataset = HeatmapDataset(
  File "/home/WYC/lightning-pose/lightning_pose/data/datasets.py", line 245, in __init__
    super().__init__(
  File "/home/WYC/lightning-pose/lightning_pose/data/datasets.py", line 73, in __init__
    if os.path.isfile(csv_path):
  File "/home/WYC/.conda/envs/LP/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not ListConfig
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
It appears you are still using the 3 csv files (I think). You'll need to concatenate them into a single csv file and then use that name in the data.csv_file field of the config file.
Yes, but wouldn't that be a single view? I want to do multi-view training. In the Lightning Pose documentation on multi-view training, there is a separate csv file for each view.
yes, I think the nomenclature here is a bit confusing. The "multi-view" training referred to in the LP documentation specifically means training on labels present from all views at a single point in time. The standard "single view" model means the model does not take into account explicit correspondences between views (just like DLC and SLEAP do not take these correspondences into account). So in your case if you create a single csv file the resulting model will be view-invariant, i.e. you can feed a frame from any of your views into the model and it will (should) produce good predictions - just as you're doing with DLC now.
Okay, but I still want to know how to enable multi-view training, because I want to perform detection in 3D. This problem appeared in #120, but I don't know how to solve it.
There are two options:
- use your current labeled frames to train a "single view" model, which, because it is trained on frames from multiple cameras, will actually be a view-agnostic model. After the model is trained, you can then take videos from a session and process each of them individually. This will result in a set of predictions per view, which you can then fuse into 3D pose estimates using a tool like anipose. This is a pretty standard setup; for example, this is exactly what 3D SLEAP does. (There is a rough sketch of the per-view inference step at the end of this comment.)
- label a new set of frames (for example using the Anivia labeler I linked above) such that every view is labeled at a given point in time. At that point you can use the multi-view version of LP. I'll note that if you train a supervised multi-view LP model it is equivalent to the "single view" model that I described above; only when you turn on the unsupervised losses does the LP model start to use the correspondences between views during model training.
How many labeled frames do you have currently? The main question is whether or not you want to do another round of labeling.
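For option 1, once the view-agnostic model is trained you would run inference on each camera's videos separately and then hand the per-view prediction files to anipose. I haven't tested this exact snippet, and the model/video paths are placeholders, but it would look roughly like this (overriding the eval fields shown in your config):

```python
import subprocess

# hydra output folder of the trained model (placeholder path)
model_dir = "outputs/2024-06-01/12-00-00"

# one directory of videos per camera (placeholder paths)
video_dirs = [
    "/home/WYC/multiview-fish/videos_camA",
    "/home/WYC/multiview-fish/videos_camB",
    "/home/WYC/multiview-fish/videos_camC",
]

for video_dir in video_dirs:
    subprocess.run(
        [
            "python", "scripts/predict_new_vids.py",
            f"eval.hydra_paths=['{model_dir}']",
            f"eval.test_videos_directory={video_dir}",
        ],
        check=True,
    )
```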
@wyclearnpy just wanted to check in to see if you're all set here - if so I'll close the issue
