
GPU error

likezjuisee opened this issue 2 years ago • 9 comments

```
>>> from fastvqa import deep_end_to_end_vqa
>>> import torch
>>> dum_video = torch.randn((3, 240, 720, 1080))
>>> model_type = "fast"
>>> vqa = deep_end_to_end_vqa(True, model_type=model_type)
[True, True, True, False]
/home/saman/miniconda3/envs/fast_vqa/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Successfully loaded pretrained=[True] fast-vqa model from pretrained_path=[pretrained_weights/fast_vqa_v0_3.pth].
Please make sure the input is [torch.tensor] in [(C,T,H,W)] layout and with data range [0,1].
>>> vqa = deep_end_to_end_vqa(True, model_type=model_type, device="cuda:1")
[True, True, True, False]
/home/saman/miniconda3/envs/fast_vqa/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Successfully loaded pretrained=[True] fast-vqa model from pretrained_path=[pretrained_weights/fast_vqa_v0_3.pth].
Please make sure the input is [torch.tensor] in [(C,T,H,W)] layout and with data range [0,1].
>>> vqa(dum_video)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/saman/Projects/FAST-VQA/fastvqa/apis/fast_vqa_model.py", line 77, in __call__
    x = ((x.permute(1, 2, 3, 0) - self.mean) / self.std).permute(3, 0, 1, 2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu!
```

likezjuisee • Aug 01 '22 09:08
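An aside on the final RuntimeError above: the model (and its `mean`/`std` buffers) was moved to `cuda:1`, while `dum_video` was still on the CPU. A minimal sketch of the call with matching devices (the sm_86 warning itself is a separate problem, addressed in the replies below):

```python
import torch
from fastvqa import deep_end_to_end_vqa  # API exactly as used in the session above

vqa = deep_end_to_end_vqa(True, model_type="fast", device="cuda:1")

# a dummy (C, T, H, W) tensor; torch.rand samples in [0, 1),
# matching the data range the loader message asks for
dum_video = torch.rand((3, 240, 720, 1080))

# move the input onto the model's device before calling it
score = vqa(dum_video.to("cuda:1"))
```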

You may refer to https://discuss.pytorch.org/t/trouble-with-cuda-capability-sm-86/152974 to solve this problem.

teowu • Aug 03 '22 17:08

See if `conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch` works for you.

teowu • Aug 03 '22 17:08

Or `pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113`.

teowu • Aug 03 '22 17:08

My cudatoolkit version is 11.7; is that too high? Which version do you use?

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   43C    P0    43W / 170W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   42C    P0    43W / 170W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```

likezjuisee • Aug 12 '22 05:08

If I want to load an mp4 video file and run inference with your models, how do I do that? I am not familiar with PyTorch; could you show me the code?

likezjuisee • Aug 12 '22 05:08

You might need to re-install PyTorch to match your CUDA version, but to my knowledge there is no PyTorch build for CUDA 11.7 yet. You may try the PyTorch build for CUDA 11.6. As for the one-step inference code, we are revising it and will release the new version soon.

teowu • Aug 16 '22 05:08

Looking forward to your "one-step inference code"!

likezjuisee • Aug 16 '22 05:08

> Looking forward to your "one-step inference code"!

This is done. Please clone the newest version of the dev branch and run `python vqa.py` to use it.

teowu • Aug 16 '22 14:08
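For anyone who wants to score an mp4 by hand rather than through `vqa.py`, a rough sketch, assuming `torchvision` is installed and relying on the `(C, T, H, W)`, range-`[0, 1]` layout described by the loader message earlier in this thread (the file name is hypothetical):

```python
import torch
from torchvision.io import read_video
from fastvqa import deep_end_to_end_vqa  # API as used earlier in this thread

# torchvision decodes the video as (T, H, W, C) uint8 frames
frames, _, _ = read_video("my_video.mp4", pts_unit="sec")

# rearrange to (C, T, H, W) and scale to the expected [0, 1] range
video = frames.permute(3, 0, 1, 2).float() / 255.0

vqa = deep_end_to_end_vqa(True, model_type="fast", device="cuda:0")
score = vqa(video.to("cuda:0"))  # keep input and model on the same device
print(score)
```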

Core dump

```
>>> from fastvqa.models import DiViDeAddEvaluator
>>> device = "cuda"
>>> DiViDeAddEvaluator(**opt["model"]["args"]).to("cuda")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> config
'./options/fast/f3dvqa-b.yml'
>>> f = open(config, "r")
>>> opt = yaml.safe_load(f)
>>> opt
{'name': 'Space_Time_Unified_FAST(3D)_1*1', 'num_epochs': 30, 'l_num_epochs': 0, 'warmup_epochs': 2.5, 'ema': True, 'save_model': True, 'batch_size': 16, 'num_workers': 6, 'wandb': {'project_name': 'VQA_Experiments_2022'}, 'data': {'train': {'type': 'FusionDataset', 'args': {'phase': 'train', 'anno_file': './examplar_data_labels/train_labels.txt', 'data_prefix': '../datasets/LSVQ', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}}, 'val-livevqc': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/LIVE_VQC/labels.txt', 'data_prefix': '../datasets/LIVE_VQC/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}}, 'val-kv1k': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/KoNViD/labels.txt', 'data_prefix': '../datasets/KoNViD/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}}, 'val-ltest': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/LSVQ/labels_test.txt', 'data_prefix': '../datasets/LSVQ/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}}, 'val-l1080p': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/LSVQ/labels_1080p.txt', 'data_prefix': '../datasets/LSVQ/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}}}, 'model': {'type': 'DiViDeAddEvaluator', 'args': {'backbone': {'fragments': {'checkpoint': False, 'pretrained': None}}, 'backbone_size': 'swin_tiny_grpb', 'backbone_preserve_keys': 'fragments', 'divide_head': False, 'vqa_head': {'in_channels': 768, 'hidden_channels': 64}}}, 'optimizer': {'lr': 0.001, 'backbone_lr_mult': 0.1, 'wd': 0.05}, 'load_path': '../pretrained/swin_tiny_patch244_window877_kinetics400_1k.pth', 'test_load_path': './pretrained_weights/FAST_VQA_3D_1*1.pth'}
>>> opt["model"]["args"]
{'backbone': {'fragments': {'checkpoint': False, 'pretrained': None}}, 'backbone_size': 'swin_tiny_grpb', 'backbone_preserve_keys': 'fragments', 'divide_head': False, 'vqa_head': {'in_channels': 768, 'hidden_channels': 64}}
>>> DiViDeAddEvaluator(**opt["model"]["args"]).to("cuda")
(8, 7, 7)
/home/hzqard/miniconda3/envs/FAST-VQA/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
(8, 7, 7)
(8, 7, 7)
(8, 7, 7)
None
False
Setting backbone: fragments_backbone
Segmentation fault (core dumped)
```

likezjuisee • Aug 18 '22 01:08
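For reference, the `TypeError` at the top of that session is unrelated to the segfault: it comes from indexing the config path string before it has been parsed. The working order, exactly as the session itself then shows:

```python
import yaml
from fastvqa.models import DiViDeAddEvaluator

# parse the option file first; calling opt["model"] on the raw path string
# is what raises "TypeError: string indices must be integers"
with open("./options/fast/f3dvqa-b.yml", "r") as f:
    opt = yaml.safe_load(f)

model = DiViDeAddEvaluator(**opt["model"]["args"]).to("cuda")
```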

I'm having the same core dump issue.

GFiz • Sep 14 '22 13:09

Hi, this might be due to running out of memory. You may check your device's memory and decrease `num_workers` in the option file to avoid this.

teowu • Sep 28 '22 08:09

> Hi, this might be due to running out of memory. You may check your device's memory and decrease `num_workers` in the option file to avoid this.

I have changed `num_workers` to 1, and I still get the same error.

```yaml
name: Space_Time_Unified_FAST(3D)_1*1
num_epochs: 30
l_num_epochs: 0
warmup_epochs: 2.5
ema: true
save_model: true
batch_size: 16
num_workers: 1
```

likezjuisee • Sep 30 '22 05:09

Hi, may I know your device info? I am trying to replicate and locate the error.

Best,
Haoning

teowu • Oct 01 '22 13:10

[screenshot: device info]

Python 3.8.8

```
(FAST-VQA) hzqard@saman2:~/project/FAST-VQA-and-FasterVQA$ pip list | grep torch
torch          1.10.2+cu113
torchvision    0.11.3+cu113
```

likezjuisee • Oct 17 '22 09:10

Changing the batch size to 4 helped me.

dmumtaz • Jan 17 '23 11:01
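For reference, in the option file quoted earlier in the thread, that change corresponds to the following sketch (keys taken from the `f3dvqa-b.yml` excerpt above):

```yaml
# ./options/fast/f3dvqa-b.yml: lower memory pressure
batch_size: 4
num_workers: 1
```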