jepa
Crashes after first epoch because of leaked semaphores
Hi, I am running an evaluation on a small dataset (a train set of 22 labeled videos and a val set of 2 labeled videos). It crashes after the first epoch, once my RAM is maxed out.
Error received:
/opt/conda/envs/jepa-p10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 88 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
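The leaked-semaphore warning is often a side effect rather than the root cause: it tends to show up at shutdown when multiprocessing workers are killed abruptly (for example by the OOM killer). To check whether memory really grows from epoch to epoch, one option is to log peak memory at the end of each epoch. Below is a minimal sketch using only the Python standard library; the helper name `peak_rss_mib` is hypothetical and not part of the jepa codebase.

```python
import resource
import sys


def peak_rss_mib(include_children: bool = False) -> float:
    """Return the peak resident set size (RSS) in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    With include_children=True, RUSAGE_CHILDREN reports the peak of the
    largest child that has already terminated and been waited for;
    worker processes that are still running are not counted.
    """
    who = resource.RUSAGE_CHILDREN if include_children else resource.RUSAGE_SELF
    peak = resource.getrusage(who).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # Hypothetical usage: call this at the end of every epoch in the training loop.
    for epoch in range(2):
        print(f"epoch {epoch}: peak RSS {peak_rss_mib():.1f} MiB")
```

Because this reports a peak, the value can only stay flat or rise; a large jump between epoch 0 and epoch 1 would point at something accumulating across epochs.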
Here is the config file:
{ 'data': { 'dataset_train': '/home/ubuntu/dev/jepa/val_dataset.csv',
            'dataset_type': 'VideoDataset',
            'dataset_val': '/home/ubuntu/dev/jepa/train_dataset.csv',
            'frame_step': 4,
            'frames_per_clip': 16,
            'num_classes': 2,
            'num_segments': 2,
            'num_views_per_segment': 3},
  'eval_name': 'video_classification_frozen',
  'nodes': 1,
  'optimization': { 'attend_across_segments': True,
                    'batch_size': 1,
                    'final_lr': 0.0,
                    'lr': 0.001,
                    'num_epochs': 20,
                    'resolution': 224,
                    'start_lr': 0.001,
                    'use_bfloat16': True,
                    'warmup': 0.0,
                    'weight_decay': 0.01},
  'pretrain': { 'checkpoint': 'vitl16.pth.tar',
                'checkpoint_key': 'target_encoder',
                'clip_duration': None,
                'folder': './',
                'frames_per_clip': 16,
                'model_name': 'vit_large',
                'patch_size': 16,
                'tight_silu': False,
                'tubelet_size': 2,
                'uniform_power': True,
                'use_sdpa': True,
                'use_silu': False,
                'write_tag': 'jepa'},
  'resume_checkpoint': False,
  'tag': 'ssv2-16x2x3',
  'tasks_per_node': 1}
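For scale, this config asks for num_segments × num_views_per_segment = 6 clips per video, each with frames_per_clip = 16 frames sampled every frame_step = 4 frames (covering roughly 64 source frames) at 224×224. The rough estimate below assumes each clip ends up as a tensor of shape 16×3×224×224, float32 by default; the actual dtype and layout inside the jepa data pipeline are assumptions here, and `clip_bytes` / `sample_bytes` are hypothetical helpers.

```python
def clip_bytes(frames_per_clip: int, resolution: int,
               channels: int = 3, bytes_per_elem: int = 4) -> int:
    """Bytes for one clip of shape frames x channels x H x W (float32 by default)."""
    return frames_per_clip * channels * resolution * resolution * bytes_per_elem


def sample_bytes(cfg: dict, bytes_per_elem: int = 4) -> int:
    """Rough input size of one sample: every segment/view pair is one clip."""
    data, opt = cfg["data"], cfg["optimization"]
    num_clips = data["num_segments"] * data["num_views_per_segment"]
    return num_clips * clip_bytes(data["frames_per_clip"], opt["resolution"],
                                  bytes_per_elem=bytes_per_elem)


# Only the fields used by the estimate, copied from the config above.
cfg = {
    "data": {"frames_per_clip": 16, "num_segments": 2, "num_views_per_segment": 3},
    "optimization": {"resolution": 224},
}

print(f"float32: {sample_bytes(cfg) / 2**20:.1f} MiB per sample")     # 55.1 MiB
print(f"uint8:   {sample_bytes(cfg, 1) / 2**20:.1f} MiB per sample")  # 13.8 MiB
```

Multiplying by batch_size (1 here) and by the number of samples held in memory at once (for example, any batches prefetched by data-loading workers) gives a rough per-step figure to compare against the observed growth.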
Please assist.