Scene-Graph-Benchmark.pytorch

Error when installing maskrcnn_benchmark

Open draym28 opened this issue 10 months ago • 18 comments

Some common problems & solutions when installing maskrcnn_benchmark.

1. THC.h: No such file or directory / THCCeilDiv undefined, see this

2. identifier "THCudaCheck" is undefined see this

3. torch.utils.cpp_extension.load stuck see this

draym28 avatar Apr 03 '24 05:04 draym28

Hi,

The code in this repo is very outdated and is indeed not up to date with current CUDA standards. I fixed all of those issues in my implementation; you can probably copy the csrc folder into your local path and compile without any issues (I tested it with CUDA 11+): https://github.com/Maelic/SGG-Benchmark/tree/main/sgg_benchmark/csrc

Best

Maelic avatar Apr 03 '24 13:04 Maelic

Thanks for your help! But after using your csrc, when I run SGDet on custom images following the instructions in README.md, another error still comes up:

D:\App\Anaconda3\envs\sgg\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: 'cp1' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
D:\App\Anaconda3\envs\sgg\lib\site-packages\apex\__init__.py:68: DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use [PyTorch AMP](https://pytorch.org/docs/stable/amp.html)
  warnings.warn(msg, DeprecatedFeatureWarning)
Traceback (most recent call last):
  File "tools/relation_test_net.py", line 11, in <module>
    from maskrcnn_benchmark.data import make_data_loader
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\__init__.py", line 2, in <module>
    from .build import make_data_loader, get_dataset_statistics
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\build.py", line 14, in <module>
    from . import datasets as D
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\datasets\__init__.py", line 2, in <module>
    from .coco import COCODataset
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\datasets\coco.py", line 39, in <module>
    class COCODataset(torchvision.datasets.coco.CocoDetection):
AttributeError: module 'torchvision' has no attribute 'datasets'

I am still stuck on this step. It is driving me crazy.

draym28 avatar Apr 04 '24 11:04 draym28

Which version of torchvision are you using?

Maelic avatar Apr 04 '24 12:04 Maelic

It works for me with torchvision 0.17 for cuda 12.1

Maelic avatar Apr 04 '24 13:04 Maelic

I am using pytorch=1.13 and torchvision=0.14. I can import torchvision.datasets as you did, but when I run the script for SGDet on custom images, the error comes up. It is confusing.

draym28 avatar Apr 04 '24 13:04 draym28

Then you may be running your code in another conda env or something like that. You can also try to clean and re-build the package with something like rm -rf ./build/ && python setup.py build develop
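
One way to check which environment is actually in use is to print the interpreter path and the location a module would be loaded from. This is only a minimal sketch: `describe_module` is a hypothetical helper name, not something from this repo, and it is demonstrated on the stdlib `json` module so that it runs anywhere. Pass "torchvision" to inspect the real case.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where `name` would be imported from, without importing it."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return {"found": False, "origin": None, "namespace_package": False}
    # A spec with search locations but no origin file is a namespace package,
    # for example a stray empty directory that shadows the real install.
    is_namespace = (
        spec.origin in (None, "namespace")
        and spec.submodule_search_locations is not None
    )
    return {"found": True, "origin": spec.origin, "namespace_package": is_namespace}


print("interpreter:", sys.executable)
info = describe_module("json")  # replace "json" with "torchvision"
print(info)
```

If the interpreter path is outside the expected conda env, or the origin points somewhere other than that env's site-packages, a different torchvision than the one you installed is probably being picked up.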

Maelic avatar Apr 04 '24 13:04 Maelic

I have cleaned and recreated the env many times, but the error still comes up. I also ran python setup.py build develop every time. Many people also have this problem, see this.

draym28 avatar Apr 04 '24 13:04 draym28

Can you post the outputs of pip freeze | grep torchvision and conda list | grep torchvision ? You may have different versions of torchvision installed at the same time.
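
The same check can be sketched from inside Python, which also guarantees that it inspects the interpreter that runs the scripts. This assumes Python 3.8+ (for `importlib.metadata`); `find_installed_copies` is a hypothetical helper name used only for illustration.

```python
from importlib import metadata


def find_installed_copies(dist_name):
    """Return (version, location) for every installed distribution named `dist_name`."""
    wanted = dist_name.lower().replace("_", "-")
    copies = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().replace("_", "-") == wanted:
            # locate_file("") resolves to the directory the distribution lives in.
            copies.append((dist.version, str(dist.locate_file(""))))
    return copies


copies = find_installed_copies("torchvision")
for version, location in copies:
    print(version, location)
```

More than one entry, or an entry in an unexpected location, would suggest that two installs are in conflict.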

Maelic avatar Apr 04 '24 13:04 Maelic

Output of pip freeze | grep torchvision:
torchvision==0.14.1

Output of conda list | grep torchvision:
torchvision 0.14.1 py38_cu117 pytorch

draym28 avatar Apr 04 '24 13:04 draym28

Hmm, I don't know. From your outputs I assume that you installed torchvision with conda; maybe try removing it and installing it with pip instead. On my machine, I installed it with the following command (for CUDA 12.1): pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121

Maelic avatar Apr 04 '24 13:04 Maelic

It still doesn't work. This time I created a new env and used pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 --index-url https://download.pytorch.org/whl/cu117, but the error still comes up.

draym28 avatar Apr 04 '24 15:04 draym28

I'm afraid I can't help you more here, sorry. I don't recall having this error ever, even when I was working with previous versions of pytorch for this codebase.

Maelic avatar Apr 04 '24 15:04 Maelic

It is OK, thanks for your help. I will keep looking for a solution.

draym28 avatar Apr 05 '24 00:04 draym28

Hi @Maelic, thank you for sharing your implementation. I'm encountering an issue with installing Apex due to CUDA compatibility. I was wondering if you could provide guidance on how to resolve this. Thanks!

Ali-Hatami avatar Apr 15 '24 17:04 Ali-Hatami

You don't need APEX anymore: it is deprecated, and mixed precision is built into recent versions of torch. Please consider removing all references to apex and this line https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/4b6b71a90d4198d9dae574d42b062a5e534da291/tools/relation_train_net.py#L159

And add this a little above:

with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=use_amp):
    loss_dict = model(images, targets)

    losses = sum(loss for loss in loss_dict.values())

And it should work, see:

https://github.com/Maelic/SGG-Benchmark/blob/cecf1bbe46f3d862704d9cf0ffccf2282fb00cfe/tools/relation_train_net.py#L51
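
For context, here is a rough sketch of how that autocast block could sit inside a full training step once apex is gone. It is not the code from either repository: `ToyDetector` and `train_step` are hypothetical stand-ins, the gradient scaler is an assumption about how the backward pass would be handled, and AMP is simply switched off when no GPU is present so the snippet also runs on CPU.

```python
import torch
from torch import nn


class ToyDetector(nn.Module):
    """Hypothetical stand-in for the detector: returns a dict of losses."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, images, targets):
        pred = self.linear(images)
        return {
            "loss_l1": (pred - targets).abs().mean(),
            "loss_l2": ((pred - targets) ** 2).mean(),
        }


def train_step(model, optimizer, scaler, images, targets, use_amp):
    """One optimisation step with native PyTorch AMP instead of apex.amp."""
    device_type = images.device.type
    # float16 autocast targets CUDA; bfloat16 is the CPU autocast dtype.
    amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
    optimizer.zero_grad()
    with torch.autocast(device_type=device_type, dtype=amp_dtype, enabled=use_amp):
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
    # The scaler is a no-op when it is created with enabled=False.
    scaler.scale(losses).backward()
    scaler.step(optimizer)
    scaler.update()
    return losses.item()


use_amp = torch.cuda.is_available()  # assumption: only use AMP on a GPU
device = "cuda" if use_amp else "cpu"
model = ToyDetector().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
images = torch.randn(8, 4, device=device)
targets = torch.randn(8, 2, device=device)
loss_value = train_step(model, optimizer, scaler, images, targets, use_amp)
print(loss_value)
```

The scaler takes over the role that apex's scaled-loss context played: it scales the loss before backward and unscales the gradients before the optimizer step.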

Maelic avatar Apr 15 '24 17:04 Maelic

Thank you for the prompt response. When following the step-by-step installation (https://github.com/Maelic/SGG-Benchmark/blob/main/INSTALL.md) I get an error. My CUDA version is 11.5, but 11.5 is not available in the nvidia channels. How can I solve this issue?

RuntimeError: The detected CUDA version (11.5) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions.

Ali-Hatami avatar Apr 15 '24 18:04 Ali-Hatami

Try upgrading your CUDA version or building torch from source. By the way, this issue is not directly related to this work; you will probably have more success asking on the dedicated PyTorch forum.
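
To see the mismatch concretely before choosing between the two options, the toolkit version reported by `nvcc --version` can be compared with the CUDA version torch was built against (`torch.version.cuda`). The sketch below works on plain strings so it runs without a GPU. The sample nvcc output is illustrative, the helper names are hypothetical, and comparing only the major version is an assumption: the actual check in PyTorch may be stricter or looser depending on the release.

```python
import re

# Illustrative `nvcc --version` output; the exact build string is made up.
SAMPLE_NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.5, V11.5.119
"""


def parse_nvcc_release(text):
    """Extract (major, minor) from nvcc's 'release X.Y' line, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_version(version):
    """Turn a string such as '12.1' into (12, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_major(toolkit, torch_cuda):
    """Assumption: extensions need the same CUDA major version as torch."""
    return toolkit[0] == torch_cuda[0]


toolkit = parse_nvcc_release(SAMPLE_NVCC_OUTPUT)
torch_cuda = parse_version("12.1")  # in practice: torch.version.cuda
print(toolkit, torch_cuda, same_major(toolkit, torch_cuda))
```

With the versions from the error message, 11.5 and 12.1, the major versions differ, which matches the reported RuntimeError.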

Maelic avatar Apr 16 '24 08:04 Maelic