
Questions about Multiview: separate data streams

wyclearnpy opened this issue 1 year ago · 14 comments

This is a project I previously made with DeepLabCut. How should I convert it into a Lightning Pose project? I encountered the following error when I tried to use Lightning Pose for training. Here is some of my configuration file information; can you help me figure out how to change it? I think the problem is with the .csv files, but I don't know how to fix them. Uploading FIG.png… Below are my file structure and configuration file: multiview-fish.zip

```yaml
data:
  # resize dimensions to streamline model creation
  image_resize_dims:
    height: 384
    width: 384
  # ABSOLUTE path to data directory
  data_dir: /home/WYC/multiview-fish/
  # ABSOLUTE path to unlabeled videos' directory
  video_dir: videos
  # location of labels; this should be relative to data_dir
  csv_file:
    - view0.csv
    - view1.csv
    - view2.csv
  view_name:
    - view0
    - view1
    - view2
  # downsample heatmaps - 2 | 3
  downsample_factor: 2
  # total number of keypoints
  num_keypoints: 24
  # keypoint names
  keypoint_names:
    - fish_head
    - fish_eye_r
    - fish_eye_l
    - dorsal_fin0
    - dorsal_fin1
    - dorsal_fin2
    - dorsal_fin3
    - pectoral_tail_root_r
    - pectoral_tail_up_r
    - pectoral_tail_middle_r
    - pectoral_tail_down_r
    - pectoral_tail_root_l
    - pectoral_tail_up_l
    - pectoral_tail_middle_l
    - pectoral_tail_down_l
    - fish_body_r
    - fish_body_l
    - fish_tail
    - tail_fin_up
    - tail_fin_middle
    - tail_fin_down
    - x
    - y
    - z
  # for mirrored setups with all keypoints defined in same csv file, define matching
  # columns for different keypoints (assumes x-y-x-y interleaving)
  # each list corresponds to a single view, so in the example below there are 2 views
  # keypoint 0 is from view 0 and matches up with keypoint 8 from view 2
  # columns that correspond to keypoints only labeled in a single view are omitted
  # this info is only used for the multiview pca loss
  mirrored_column_matches:
    - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
  # list of indices of keypoints used for pca singleview loss (use order of labels file)
  columns_for_singleview_pca: NOT YET IMPLEMENTED

training:
  # select from one of several predefined image/video augmentation pipelines
  # default- resizing only
  # dlc- imgaug pipeline implemented in DLC 2.0 package
  # dlc-top-down- dlc augmentations plus vertical and horizontal flips
  imgaug: dlc
  # batch size of labeled data during training
  train_batch_size: 8
  # batch size of labeled data during validation
  val_batch_size: 8
  # batch size of labeled data during test
  test_batch_size: 8
  # fraction of labeled data used for training
  train_prob: 0.95
  # fraction of labeled data used for validation (remaining used for test)
  val_prob: 0.05
  # <=1 - fraction of total train frames (determined by train_prob) used for training
  # >1 - number of total train frames used for training
  train_frames: 1
  # number of gpus to train a single model
  num_gpus: 2
  # number of cpu workers for data loaders
  num_workers: 4
  # epoch at which backbone network weights begin updating
  unfreezing_epoch: 20
  # max training epochs; training may exit before due to early stopping
  min_epochs: 300
  max_epochs: 300
  # frequency to log training metrics for tensorboard (one step is one batch)
  log_every_n_steps: 10
  # frequency to log validation metrics for tensorboard
  check_val_every_n_epoch: 5
  # save model weights every n epochs; must be divisible by check_val_every_n_epoch above
  # if null, only best weights will be saved after training
  ckpt_every_n_epochs: null
  # perform early stopping; if this is false, the default is to train for the max number
  # of epochs and save out the best model according to validation loss
  early_stopping: false
  # epochs over which to assess validation metrics for early stopping
  early_stop_patience: 3
  # select gpu for training
  gpu_id: 0
  # rng seed for labeled batches
  rng_seed_data_pt: 0
  # rng seed for weight initialization
  rng_seed_model_pt: 0
  # learning rate scheduler: multisteplr | [todo - reducelronplateau]
  lr_scheduler: multisteplr
  lr_scheduler_params:
    multisteplr:
      milestones: [150, 200, 250]
      gamma: 0.5

model:
  # list of unsupervised losses
  # "pca_singleview" | "pca_multiview" | "temporal" | "unimodal_mse" | "unimodal_kl"
  losses_to_use: [temporal]
  # backbone network:
  # resnet18 | resnet34 | resnet50 | resnet101 | resnet152 | resnet50_contrastive
  # resnet50_animal_apose | resnet50_animal_ap10k
  # resnet50_human_jhmdb | resnet50_human_res_rle | resnet50_human_top_res | resnet50_human_hand
  # efficientnet_b0 | efficientnet_b1 | efficientnet_b2
  # vit_b_sam | vit_h_sam
  backbone: resnet50
  # prediction mode: regression | heatmap | heatmap_mhcrnn (context)
  model_type: heatmap
  # which heatmap loss to use: mse | kl | js
  heatmap_loss_type: mse
  # directory name for model saving
  model_name: test
  # load model from checkpoint
  checkpoint: null

dali:
  general:
    seed: 123456
  base:
    train:
      sequence_length: 16
    predict:
      sequence_length: 32
  context:
    train:
      batch_size: 8
    predict:
      sequence_length: 16

losses:
  # loss = projection onto the discarded eigenvectors
  pca_multiview:
    # weight in front of PCA loss
    log_weight: 5.0
    # predictions should lie within the low-d subspace spanned by these components
    components_to_keep: 3
    # absolute error (in pixels) below which pca loss is zeroed out; if null, an empirical
    # epsilon is computed using the labeled data
    epsilon: null
  # loss = projection onto the discarded eigenvectors
  pca_singleview:
    # weight in front of PCA loss
    log_weight: 5.0
    # predictions should lie within the low-d subspace spanned by components that
    # describe this fraction of variance
    components_to_keep: 0.99
    # absolute error (in pixels) below which pca loss is zeroed out; if null, an empirical
    # epsilon is computed using the labeled data
    epsilon: null
  # loss = norm of distance between successive timepoints
  temporal:
    # weight in front of temporal loss
    log_weight: 5.0
    # for epsilon insensitive rectification
    # (in pixels; diffs below this are not penalized)
    epsilon: 10.0
    # nan removal value
    # (in prob; heatmaps with max prob values below this are removed)
    prob_threshold: 0.05

eval:
  # paths to the hydra config files in the output folder, OR absolute paths to such folders
  # used in scripts/predict_new_vids.py and scripts/create_fiftyone_dataset.py
  hydra_paths: [""]
  # predict? used in scripts/train_hydra.py
  predict_vids_after_training: false
  # save labeled .mp4? used in scripts/train_hydra.py and scripts/predict_new_vids.py
  save_vids_after_training: false
  fiftyone:
    # will be the name of the dataset (Mongo DB) created by FiftyOne; for video dataset,
    # we will append dataset_name + "_video"
    dataset_name: test
    # if you want to manually provide a different model name to be displayed in FiftyOne
    model_display_names: ["test_model"]
    # whether to launch the app from the script (True), or from ipython
    # (and have finer control over the outputs)
    launch_app_from_script: false
    remote: true # for LAI, must be False
    address: 127.0.0.1 # ip to launch the app on
    port: 5151 # port to launch the app on
  # str with an absolute path to a directory containing videos for prediction;
  # set to null to skip automatic video prediction from train_hydra.py script;
  # used in scripts/train_hydra.py and scripts/predict_new_vids.py
  test_videos_directory: /home/WYC/multiview-fish/videos
  # confidence threshold for plotting a vid
  confidence_thresh_for_vid: 0.90

callbacks:
  anneal_weight:
    attr_name: total_unsupervised_importance
    init_val: 0.0
    increase_factor: 0.01
    final_val: 1.0
    freeze_until_epoch: 0

hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
```

wyclearnpy · Oct 08 '24 02:10

[attached screenshot: FIG.png]

wyclearnpy · Oct 08 '24 02:10

@wyclearnpy it looks like there is an issue with how you have organized/labeled your data. The multiview option requires labels from all views at a given time point. So for example, you have a frame named labeled-data/koipose0_camA/img009.png in view0.csv. In order to use the multiview option, you would also need the corresponding frame labeled-data/koipose0_camB/img009.png in view1.csv and labeled-data/koipose0_camC/img009.png in view2.csv. Note that all three frames are img009.
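A quick way to check that correspondence, as a minimal Python sketch (assuming DLC-style label files with three header rows and the image path in the first column; the `view*.csv` names are taken from your config above):

```python
# Sketch: check whether the same frame names are labeled in all three view CSVs.
# Assumes DLC-style label files: scorer/bodyparts/coords header rows and the
# image path as the index column.
import os
import pandas as pd

views = ["view0.csv", "view1.csv", "view2.csv"]
frame_sets = []
for csv in views:
    df = pd.read_csv(csv, header=[0, 1, 2], index_col=0)
    # keep only the file name (e.g. img009.png), dropping the per-camera folder
    frame_sets.append({os.path.basename(str(p)) for p in df.index})

common = set.intersection(*frame_sets)
print(f"{len(common)} frame names are labeled in all {len(views)} views")
```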

In your case you don't have labels from corresponding views, so you can just train a "singleview" model that doesn't explicitly take into account the multiview nature of the data. To do so, just make a single csv file that contains all of your labeled frames from all views, and train a standard model. This will result in a view-invariant model that you can then use to run inference on videos from any view. This is what the DLC pipeline you were previously using would have been doing.
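If it helps, something like the following should do that concatenation (a minimal sketch, again assuming DLC-style CSVs that share the same keypoint columns; the output name `CollectedData_all.csv` is just a placeholder):

```python
# Sketch: merge the per-view label files into one CSV for "singleview" training.
# Assumes all three files share the same scorer/bodyparts/coords header.
import pandas as pd

views = ["view0.csv", "view1.csv", "view2.csv"]
dfs = [pd.read_csv(csv, header=[0, 1, 2], index_col=0) for csv in views]

# stack the rows; the image paths (labeled-data/<session_camX>/imgNNN.png)
# already distinguish the views, so no renaming is needed
merged = pd.concat(dfs, axis=0)
merged.to_csv("CollectedData_all.csv")
```

Then point `data.csv_file` at that single file (a string, not a list) in the config.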

If you would like to use the multiview feature then you'll need to label corresponding frames - if you'd like to go down this route I'd recommend the Anivia image labeler, which allows you to easily label 3D datasets: https://allenneuraldynamics.github.io/anivia-docs/

Please let me know if you have any other questions!

themattinthehatt · Oct 08 '24 13:10

Thank you for your reply; I will give it a try. The previous dataset's images were obtained by random sampling, so the label files from the different cameras do not match.

wyclearnpy · Oct 08 '24 13:10

Great, you should still be able to get comparable performance to DLC even without the multiview losses. You might also try to extract the context frames for each of your labeled frames in order to train a context model. This might be beneficial with fish if there are brief occlusions due to fins moving around, or brief distortions due to the water.

themattinthehatt · Oct 08 '24 13:10

It seems that there are still some problems. I have corrected the dataset, but there are still errors.

As shown in the attached image.

wyclearnpy · Oct 09 '24 08:10

These are the three label files: view0.csv, view1.csv, view2.csv.

wyclearnpy · Oct 09 '24 08:10

@wyclearnpy I'm not able to open the image that you linked above. Can you copy/paste the error from the command line here?

themattinthehatt · Oct 09 '24 12:10

```
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/WYC/lightning-pose/scripts/train_hydra.py", line 35, in train_model
    train(cfg)
  File "/home/WYC/.conda/envs/LP/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/home/WYC/lightning-pose/lightning_pose/train.py", line 66, in train
    dataset = get_dataset(cfg=cfg, data_dir=data_dir, imgaug_transform=imgaug_transform)
  File "/home/WYC/.conda/envs/LP/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/home/WYC/lightning-pose/lightning_pose/utils/scripts.py", line 114, in get_dataset
    dataset = HeatmapDataset(
  File "/home/WYC/lightning-pose/lightning_pose/data/datasets.py", line 245, in __init__
    super().__init__(
  File "/home/WYC/lightning-pose/lightning_pose/data/datasets.py", line 73, in __init__
    if os.path.isfile(csv_path):
  File "/home/WYC/.conda/envs/LP/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not ListConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```

wyclearnpy · Oct 09 '24 12:10

It appears you are still using the 3 csv files (I think). You'll need to concatenate them into a single csv file and then use that name in the `data.csv_file` field of the config file.
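For anyone else hitting this: the traceback above can be reproduced in isolation. A list-valued `csv_file` field becomes an omegaconf ListConfig, which `os.path.isfile` cannot handle. A minimal sketch, assuming omegaconf is installed:

```python
# Minimal reproduction of the TypeError in the traceback above: when csv_file is
# a list in the YAML, Hydra/omegaconf delivers a ListConfig, and os.stat rejects it.
import os
from omegaconf import OmegaConf

cfg = OmegaConf.create({"csv_file": ["view0.csv", "view1.csv", "view2.csv"]})
os.path.isfile(cfg.csv_file)
# TypeError: stat: path should be string, bytes, os.PathLike or integer, not ListConfig
```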

themattinthehatt · Oct 09 '24 17:10

Yes, but wouldn't that be a single view? I want to do multi-view training; in the Lightning Pose documentation on multi-view training, there is a separate csv file for each view.

wyclearnpy · Oct 10 '24 01:10

Yes, I think the nomenclature here is a bit confusing. The "multi-view" training referred to in the LP documentation specifically means training on labels present from all views at a single point in time. The standard "single view" model does not take into account explicit correspondences between views (just as DLC and SLEAP do not take these correspondences into account). So in your case, if you create a single csv file, the resulting model will be view-invariant, i.e. you can feed a frame from any of your views into the model and it will (should) produce good predictions, just as you're doing with DLC now.

themattinthehatt · Oct 10 '24 17:10

Okay, but I still want to know how to enable multi-view training, because I want to do 3D pose estimation. This problem appeared in #120, but I don't know how to solve it.

wyclearnpy · Oct 11 '24 07:10

There are two options:

  1. use your current labeled frames to train a "single view" model, which, because it is trained on frames from multiple cameras, will actually be a view-agnostic model. After the model is trained, you can take the videos from a session and process each of them individually (see the sketch after this list). This will give you a set of predictions per view, which you can then fuse into 3D pose estimates using a tool like anipose. This is a pretty standard setup; for example, it is exactly what 3D SLEAP does.

  2. label a new set of frames (for example using the Anivia labeler I linked above) such that you have labels in every view at a given point in time. At that point you can use the multi-view version of LP. I'll note that a supervised multi-view LP model is equivalent to the "single view" model described above; only when you turn on the unsupervised losses does the LP model start to use the correspondences between views during model training.
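For option 1, the per-view inference could look roughly like the sketch below. This is an assumption-laden sketch, not a definitive recipe: the model output directory and the per-view video folders are hypothetical, and the Hydra override names simply mirror the `eval:` fields in the config above, so double-check them against the docs for your LP version.

```python
# Hypothetical sketch: run inference on each camera's videos separately with one
# trained view-invariant model, via the scripts/predict_new_vids.py entry point
# mentioned in the config comments. Run from the lightning-pose repo root.
import subprocess

model_dir = "outputs/2024-10-09/12-00-00"  # hypothetical trained-model directory
video_dirs = [  # hypothetical per-view video folders
    "/home/WYC/multiview-fish/videos/view0",
    "/home/WYC/multiview-fish/videos/view1",
    "/home/WYC/multiview-fish/videos/view2",
]

for vdir in video_dirs:
    subprocess.run(
        [
            "python", "scripts/predict_new_vids.py",
            f'eval.hydra_paths=["{model_dir}"]',
            f"eval.test_videos_directory={vdir}",
        ],
        check=True,  # raise if any per-view run fails
    )
```

The resulting per-view prediction files can then be fed to anipose (or a similar tool) for triangulation into 3D.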

How many labeled frames do you have currently? The main question is whether or not you want to do another round of labeling.

themattinthehatt · Oct 11 '24 13:10

@wyclearnpy just wanted to check in to see if you're all set here - if so I'll close the issue

themattinthehatt · Oct 24 '24 21:10