Possibly_relative_path
Hello,
I would like to try out Lightning Pose to see if it can improve our SLEAP model.
The installation went well; however, I was unable to run the basic hydra example. See the error below.
I tried changing the run and sweep directories in config_default.yaml from scripts/outputs/${now:%Y-%m-%d}/${now:%H-%M-%S} to /om2/user/gosztola/calibration/lightning-pose/scripts/outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}, but that didn't help.
Could you please help?
(lightning-pose) -bash-4.2$ python train_hydra.py --config-path=/om2/user/gosztola/calibration/lightning-pose/scripts/configs --config-name=config_default.yaml
[2024-09-02 12:03:27,497][HYDRA] /om2/user/gosztola/miniconda3/envs/lightning-pose/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
Our Hydra config file:
--------------------
data parameters
--------------------
image_resize_dims: {'height': None, 'width': None}
data_dir: None
video_dir: None
csv_file: CollectedData.csv
downsample_factor: 2
num_keypoints: None
keypoint_names: None
dynamic_crop: False
num_max_instances: 1
detector: {'image_resize_dims': {'height': None, 'width': None}, 'downsample_factor': 2, 'keypoints_for_crop': None, 'num_keypoints': None, 'keypoint_names': None}
mirrored_column_matches: None
columns_for_singleview_pca: None
--------------------
training parameters
--------------------
imgaug: dlc
train_batch_size: 16
val_batch_size: 32
test_batch_size: 32
train_prob: 0.95
val_prob: 0.05
train_frames: 1
num_gpus: 1
num_workers: 4
unfreezing_epoch: 20
min_epochs: 300
max_epochs: 300
log_every_n_steps: 10
check_val_every_n_epoch: 5
ckpt_every_n_epochs: None
early_stopping: False
early_stop_patience: 3
gpu_id: 0
rng_seed_data_pt: 0
rng_seed_model_pt: 0
lr_scheduler: multisteplr
lr_scheduler_params: {'multisteplr': {'milestones': [150, 200, 250], 'gamma': 0.5}}
--------------------
model parameters
--------------------
losses_to_use: []
backbone: resnet50_animal_ap10k
model_type: heatmap
heatmap_loss_type: mse
model_name: test
checkpoint: None
lightning_pose_version: 1.5.1
--------------------
dali parameters
--------------------
general: {'seed': 123456}
base: {'train': {'sequence_length': 32}, 'predict': {'sequence_length': 96}}
context: {'train': {'batch_size': 16}, 'predict': {'sequence_length': 96}}
--------------------
losses parameters
--------------------
pca_multiview: {'log_weight': 5.0, 'components_to_keep': 3, 'epsilon': None}
pca_singleview: {'log_weight': 5.0, 'components_to_keep': 0.99, 'epsilon': None}
temporal: {'log_weight': 5.0, 'epsilon': 20.0, 'prob_threshold': 0.05}
--------------------
callbacks parameters
--------------------
anneal_weight: {'attr_name': 'total_unsupervised_importance', 'init_val': 0.0, 'increase_factor': 0.01, 'final_val': 1.0, 'freeze_until_epoch': 0}
Error executing job with overrides: []
Traceback (most recent call last):
  File "/net/vast-storage/scratch/vast/jazayeri/gosztola/calibration/lightning-pose/scripts/train_hydra.py", line 35, in train_model
    train(cfg)
  File "/om2/user/gosztola/miniconda3/envs/lightning-pose/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/om2/user/gosztola/miniconda3/envs/lightning-pose/lib/python3.10/site-packages/lightning_pose/train.py", line 56, in train
    data_dir, video_dir = return_absolute_data_paths(data_cfg=cfg.data)
  File "/om2/user/gosztola/miniconda3/envs/lightning-pose/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/om2/user/gosztola/miniconda3/envs/lightning-pose/lib/python3.10/site-packages/lightning_pose/utils/io.py", line 168, in return_absolute_data_paths
    data_dir = return_absolute_path(data_cfg.data_dir, n_dirs_back=n_dirs_back)
  File "/om2/user/gosztola/miniconda3/envs/lightning-pose/lib/python3.10/site-packages/typeguard/__init__.py", line 1032, in wrapper
    check_argument_types(memo)
  File "/om2/user/gosztola/miniconda3/envs/lightning-pose/lib/python3.10/site-packages/typeguard/__init__.py", line 875, in check_argument_types
    raise TypeError(*exc.args) from None
TypeError: type of argument "possibly_relative_path" must be str; got NoneType instead
OK, I have found that the issue was that train_hydra.py can only be run from inside the main directory.
As a suggestion: would it make sense to overwrite the relative paths when they are provided as an argument?
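To make the suggestion concrete, here is a rough sketch of resolving a path against a base directory only when it is relative. The function name `resolve_path` and the `base_dir` parameter are hypothetical and are not part of lightning-pose; this only illustrates the idea and assumes POSIX-style paths.

```python
from pathlib import Path


def resolve_path(possibly_relative_path: str, base_dir: str) -> str:
    """Return an absolute path (hypothetical helper, not the library's API).

    Absolute inputs are returned unchanged; relative inputs are joined
    onto base_dir instead of depending on the current working directory.
    """
    path = Path(possibly_relative_path)
    if path.is_absolute():
        return str(path)
    return str(Path(base_dir) / path)
```

With something like this, the base directory could be taken from an explicit argument rather than from wherever the script happens to be launched.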
@agosztolai glad you were able to solve the issue. Yes, I'll take a look and see if I can overwrite the relative paths, though some other path issues may crop up instead. At the very least, we can throw a more descriptive error message that says you need to move into the lightning-pose directory.
@agosztolai Looked into this further and the reason for the error was that data_dir was null. I verified that after setting this parameter with an absolute path, the train_hydra.py script can be run from any location.
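For illustration, a check along these lines would surface the null data_dir before the typeguard TypeError shown in the traceback. This is a minimal sketch, not the code in lightning-pose: the function name `require_data_dir` and the error wording are assumptions.

```python
from pathlib import Path
from typing import Optional


def require_data_dir(data_dir: Optional[str]) -> Path:
    """Validate data_dir before it is used (hypothetical helper).

    Raises a descriptive error when the value is None (null in the YAML),
    otherwise returns the path in absolute form.
    """
    if data_dir is None:
        raise ValueError(
            "data_dir is null in the config; set it to the absolute path "
            "of your dataset before training"
        )
    # resolve() makes the path absolute; it does not require the path to exist
    return Path(data_dir).expanduser().resolve()
```

A call such as `require_data_dir(None)` would then fail with a message that names the missing parameter, rather than the argument name `possibly_relative_path`.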
@agosztolai somewhat related, #217 is out to allow train_hydra to run on the mirror-mouse-example from any directory.
Please reach out with any feedback!