CascadeTabNet

Remove GPU dependency for inference?

Open · luke4u opened this issue · 22 comments

Hi Guys,

First of all, thank you so much for sharing this amazing work. I ran the demo Colab notebook and got good results.

Just to confirm: is a CUDA-enabled GPU required to run inference?

As mentioned in #34, would you consider easing the dependency on a GPU? That would make the model more scalable.

Thanks again. Luke

luke4u · Aug 19 '20

+1 for this. Ensuring a GPU is available in a cloud production environment can be a real nuisance. Also, since MMDetection 2.0 there is support for a CPU-only mode. So if someone is able to reproduce the model in, or convert it to, an MMDetection 2.0-compatible format, it can be used for inference in a CPU-only environment. The nice part is that training can still be done on a GPU, while the resulting checkpoints will load and run in a CPU-only environment too.

See also this page on CPU-only mode and this page on upgrading from 1.x to 2.0. Unfortunately, I wasn't able to convert the model successfully myself using the provided conversion tool. Hopefully the creator can help out and provide trained models compatible with MMDetection 2.0.

iiLaurens · Aug 25 '20
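The snippets in this thread pass either `device='cuda:0'` or `device='cpu'` to mmdet 2.x's `init_detector`. A minimal sketch of choosing that string at runtime so the same script works with or without a GPU; `pick_device` is a hypothetical helper, not part of mmdet, and it treats a missing PyTorch install the same as "no CUDA".

```python
from typing import Optional


def pick_device(cuda_available: Optional[bool] = None, index: int = 0) -> str:
    """Return a device string suitable for init_detector (hypothetical helper).

    If cuda_available is None, ask PyTorch; if torch cannot be imported,
    fall back to "cpu" as well.
    """
    if cuda_available is None:
        try:
            import torch  # only needed for the auto-detect path
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    return f"cuda:{index}" if cuda_available else "cpu"


if __name__ == "__main__":
    # e.g. model = init_detector(config, checkpoint, device=pick_device())
    print(pick_device())
```

Passing `device='cpu'` explicitly has the same effect on a machine without a GPU; the helper only avoids editing the script when moving between environments.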

Since the creator of issue #77 mentioned he was able to convert the model (but unfortunately did not share his config or conversion steps), I decided to give it another shot myself, and it worked this time.

I would like to refer you all to my branch at iiLaurens/CascadeTabNet:mmdet2x. It includes a demo notebook showing how to run the model with MMDetection v2.3.0 in a CPU-only Colab environment. You can find that notebook here. All checkpoint files can be found under the releases on this page. Happy inferencing!

iiLaurens · Sep 09 '20

Hi @iiLaurens, thank you for sharing the workflow! I noticed you are using mmcv-full==1.0.5. There seems to be no Windows distribution at the link below, and does mmcv-full rely on CUDA? (Correct me if I am wrong.)

https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html

I had to install mmcv==1.0.5 instead, but ran into the error `ModuleNotFoundError: No module named 'mmcv._ext'`. By the way, did you manage to run the model on Windows with only a CPU?

luke4u · Sep 25 '20

As far as I know, there is no Windows version of mmcv-full. And as you noticed, plain mmcv simply doesn't work at all. I run it in a Linux environment.

iiLaurens · Sep 25 '20
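The `No module named 'mmcv._ext'` error is what you get when code needs mmcv's compiled ops but only the lite `mmcv` package is installed; those ops ship with `mmcv-full`. A quick way to check an environment before loading the model, sketched with the standard library only (`has_module` is a hypothetical helper name):

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` (possibly dotted, e.g. "mmcv._ext") can be found.

    find_spec imports the parent package of a dotted name, so a missing
    (or broken) parent raises ImportError; treat that as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    if not has_module("mmcv._ext"):
        print("mmcv._ext not found: install mmcv-full instead of mmcv")
```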

Hi @iiLaurens, did you only convert the models, or did you also train them for some epochs after converting? I am able to convert the model, but its output is not as good as your model's.

kbrajwani · Nov 02 '20

I did not do any further training, just converting. If my memory serves me correctly, I had to convert both the model and the config file. Did you convert both?

iiLaurens · Nov 03 '20

No, I have only converted the model, and I am using the MMDetection version 2 config file that is compatible with the model.

kbrajwani · Nov 04 '20
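One of the config changes covered by MMDetection's 1.x-to-2.0 upgrade notes is that `num_classes` no longer counts the background class (a v1 COCO config uses 81, a v2 one uses 80). A minimal sketch of applying just that change to a config already loaded as nested dicts and lists; `convert_num_classes` is hypothetical, and a real config upgrade involves other changes that this does not attempt.

```python
from typing import Any


def convert_num_classes(cfg: Any) -> Any:
    """Return a copy of a nested config with every `num_classes` reduced by 1.

    MMDetection 1.x counted the background class in num_classes; 2.x does
    not. Dicts, lists and tuples are walked recursively; other values are
    returned unchanged, and the input is not modified.
    """
    if isinstance(cfg, dict):
        out = {}
        for key, value in cfg.items():
            if key == "num_classes" and isinstance(value, int):
                out[key] = value - 1
            else:
                out[key] = convert_num_classes(value)
        return out
    if isinstance(cfg, (list, tuple)):
        return type(cfg)(convert_num_classes(v) for v in cfg)
    return cfg
```

Cascade models have one bbox head per stage, which is why the walk descends into lists rather than only touching a single key.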

How do I convert the model and the config file from MMDetection version 1 to version 2?

19debanjanbanerjee98 · Nov 04 '20

I did something like this (in a Colab notebook):

```
import torch

# Load the old (mmdet 1.x) checkpoint; map_location="cpu" lets this step run without a GPU
checkpoint = torch.load("/content/epoch_36.pth", map_location="cpu")

# Remove the config path that was causing an error during conversion
checkpoint['meta']['config'] = checkpoint['meta']['config'].replace("/content/drive/My Drive/chunk cascade_mask_rcnn_hrnetv2p_w32_20e.py\n", "")

torch.save(checkpoint, "/content/epoch_35.pth")

# Convert to the mmdet 2.x format (the `!` line is a notebook shell command)
!python mmdetection/tools/upgrade_model_version.py /content/epoch_35.pth /content/epoch_37.pth --num-classes 81

# Detection
from mmdet.apis import init_detector, inference_detector, show_result_pyplot
import mmcv

# Load the model (pass device='cpu' instead for CPU-only inference)
config_file = '/content/mmdetection/configs/hrnet/cascade_mask_rcnn_hrnetv2p_w32_20e_coco.py'
checkpoint_file = '/content/epoch_37.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Test a single image
img = "/content/5.29.2020 COI - Corvias Construction Partners, LLC_0001.jpg"

# Run inference
result = inference_detector(model, img)

# Visualize the results
show_result_pyplot(model, img, result, score_thr=0.85)
```

kbrajwani · Nov 09 '20
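To see what `show_result_pyplot` will draw at `score_thr=0.85`, one can count detections per class directly. For mask models in mmdet 2.x, `inference_detector` returns a `(bbox_result, segm_result)` tuple, where `bbox_result` holds one `(n, 5)` array per class with the score in the last column. A sketch under that assumption (`count_detections` is a hypothetical helper); it also raises if the number of classes in the result does not match the class names you expect.

```python
from typing import Any, Dict, List, Optional, Sequence


def count_detections(result: Any, score_thr: float = 0.85,
                     class_names: Optional[Sequence[str]] = None) -> Dict[str, int]:
    """Count boxes per class whose score (last column) is >= score_thr.

    Accepts a (bbox_result, segm_result) tuple, as returned for mask models,
    or a bare per-class bbox list. Rows only need to support indexing, so
    NumPy arrays and plain lists both work.
    """
    bbox_result = result[0] if isinstance(result, tuple) else result
    names: List[str] = (list(class_names) if class_names is not None
                        else [str(i) for i in range(len(bbox_result))])
    if len(names) != len(bbox_result):
        raise ValueError(f"{len(bbox_result)} classes in result, "
                         f"{len(names)} names given")
    return {name: sum(1 for row in boxes if row[-1] >= score_thr)
            for name, boxes in zip(names, bbox_result)}
```

Calling it with `score_thr=0.0` separates truly empty per-class arrays from detections that merely fall below the display threshold.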

@iiLaurens, thanks for this effort. Does that also mean I can use the CascadeTabNet architecture with my existing MMDetection v2.3 install, even though the network was trained on MMDetection v1.2?

ashish-kubade · Dec 18 '20

@iiLaurens, thank you so much for your work. The only thing I changed to get it working on my CPU was to run this:

`!pip install mmcv-full==1.0.5 -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.5.0/index.html`

instead of this:

`!pip install mmcv-full==1.0.5+torch1.5.0+cpu -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html`

Trinadhbabu · Feb 12 '21
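The working command points pip at a find-links index whose path encodes the build variant and the PyTorch version, as in `https://download.openmmlab.com/mmcv/dist/cpu/torch1.5.0/index.html`. A small sketch that builds the install arguments in that form; `mmcv_full_pip_args` is hypothetical, and only the `cpu` variant appears in this thread. It also maps the PyTorch patch version to `.0` (1.5.1 becomes 1.5.0), following mmcv's note that mmcv-full is compiled only against PyTorch 1.x.0 releases; treat that as an assumption to verify for your versions.

```python
from typing import List


def mmcv_full_pip_args(mmcv_version: str, torch_version: str,
                       variant: str = "cpu") -> List[str]:
    """Build pip arguments for installing mmcv-full from the OpenMMLab index.

    variant is the build tag in the URL path ("cpu" in this thread).
    Local version tags such as "+cpu" are dropped, and the torch patch
    number is replaced by 0, since wheels are published per 1.x.0 release.
    """
    base = torch_version.split("+", 1)[0]
    major, minor = base.split(".")[:2]
    url = (f"https://download.openmmlab.com/mmcv/dist/{variant}/"
           f"torch{major}.{minor}.0/index.html")
    return ["pip", "install", f"mmcv-full=={mmcv_version}", "-f", url]
```

For example, `subprocess.run([sys.executable, "-m", *mmcv_full_pip_args("1.0.5", torch.__version__)], check=True)` installs into the same interpreter that has torch.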

I fine-tuned a model and was able to upgrade it using mmdetection/tools/upgrade_model_version.py, then used @iiLaurens's config to run both init_detector and inference_detector with the following package setup:

```
mmcv-full==1.0.5
mmdet==2.3.0
numpy==1.21.3
opencv-python==4.5.4.58
pycocotools==2.0.2
torch==1.5.1+cpu
torchvision==0.6.1+cpu
```

However, when I run CPU inference from my checkpoint, I get back empty arrays for all 81 classes. The only difference is that I started from the General Model table detection checkpoint and trained with the original config.

If anyone has ideas about what to try or change, I would greatly appreciate it.

UPDATE: In case it helps anyone else who is fine-tuning their model: I can't take a model I fine-tuned in mmdet 1.2, upgrade it, and then either train it with mmdet 2.x or run CPU inference from it. I was able to upgrade their checkpoint and then train and run CPU inference from it (I used the General Model table detection epoch_24.pth). If upgrading a model fine-tuned in mmdet 1.2 is possible, please let me know.

hurshprasad · Nov 05 '21
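When comparing a working environment with one that misbehaves, it can help to check installed versions against a pinned list like the one in this comment. A sketch using only the standard library's `importlib.metadata` (Python 3.8+); `parse_pins`, `installed_version` and `check_pins` are hypothetical helpers, and the comparison is an exact string match including local tags such as `+cpu`.

```python
from importlib import metadata
from typing import Callable, Dict, Iterable, Optional


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'name==version' lines, skipping blank lines and # comments."""
    pins: Dict[str, str] = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version
               ) -> Dict[str, Optional[str]]:
    """Return {name: installed_version_or_None} for every pin that differs."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = found
    return mismatches
```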

@iiLaurens, thank you for your work. Is it possible to run your notebook or reproduce your results in a local Windows environment? I tried and failed to install the requirements, similar to @luke4u. If it is not possible to reproduce on Windows, could you share the Linux environment details, or suggest the packages needed to build a Dockerfile for it?

Thank you for your time, and thanks in advance if anyone can help out with some ideas.

anhhaibkhn · Nov 08 '21

I was able to get it running in a Docker container (for use in AWS Lambda). This is the Dockerfile:

```dockerfile
FROM public.ecr.aws/lambda/python:3.8

RUN yum -y install gcc mesa-libGL

RUN pip install \
  torch==1.6.0+cpu \
  torchvision==0.7.0+cpu \
  -f https://download.pytorch.org/whl/torch_stable.html \
  && rm -rf /root/.cache/pip

RUN pip install \
  mmdet==2.3.0 \
  pycocotools==2.0.2 \
  requests

RUN pip install mmcv-full==1.0.5 -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.6.0/index.html
```

You also need the converted checkpoint and config files, which you can find in my repo.

Then some code like this should make it work:

```python
from mmdet.apis import inference_detector, init_detector

config = '/pdfextract/cascadeTabNet/cascade_mask_rcnn_hrnetv2p_w32_20e.py'
checkpoint = '/pdfextract/cascadeTabNet/General.Model.table.detection.v2.pth'

# device='cpu' runs inference without CUDA
model = init_detector(config, checkpoint, device='cpu')

img = 'page.jpg'  # placeholder: path to the image to run detection on
results = inference_detector(model, img)
```

iiLaurens · Nov 08 '21
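For the AWS Lambda use case, a common pattern is to load the model once per container (on the cold start) and reuse it across invocations. A sketch under stated assumptions: the event carries a base64-encoded image under a hypothetical `image_b64` key, and `make_handler` is a hypothetical factory. The model loader and the inference function are injected, so the sketch runs without mmdet installed.

```python
import base64
import os
import tempfile
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any], Any], Dict[str, Any]]


def make_handler(load_model: Callable[[], Any],
                 run_inference: Callable[[Any, str], Any]) -> Handler:
    """Build a Lambda-style handler(event, context) that loads the model lazily.

    The model is created on the first invocation and cached in the closure,
    so warm invocations of the same container reuse it.
    """
    cache: Dict[str, Any] = {}

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        if "model" not in cache:
            cache["model"] = load_model()  # runs once per container
        # Assumed event shape: {"image_b64": "<base64-encoded image bytes>"}
        image_bytes = base64.b64decode(event["image_b64"])
        # tempfile defaults to /tmp on Linux, which is writable in Lambda
        fd, path = tempfile.mkstemp(suffix=".jpg")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(image_bytes)
            result = run_inference(cache["model"], path)
        finally:
            os.remove(path)
        return {"statusCode": 200, "result": result}

    return handler
```

In the container this might be wired up as `handler = make_handler(lambda: init_detector(config, checkpoint, device='cpu'), inference_detector)`. The raw mmdet result contains NumPy arrays, so it would need converting (for example with `.tolist()`) before Lambda can serialize it to JSON.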

Thanks so much for your suggestions. I will try to build a similar Ubuntu container to run it on my local Windows machine.

anhhaibkhn · Nov 09 '21

@iiLaurens,

Thanks a lot again. I just want to let you know that I was able to build a working Ubuntu container on Windows thanks to your suggestion.

Now I can get inference results without any problems on Windows with just the CPU. Awesome work!

anhhaibkhn · Nov 10 '21

@anhhaibkhn, can you please elaborate on your steps?

mohit-217 · Dec 20 '21

Hi folks, I admire the work of @iiLaurens and appreciate the team. However, I'm getting the error below, and I would highly appreciate it if any of you could help resolve it. I'm using a Colab notebook with CPU only:

```
ERROR: Could not find a version that satisfies the requirement torch==1.5.1+cpu (from versions: 1.11.0, 1.11.0+cpu, 1.11.0+cu102, 1.11.0+cu113, 1.11.0+cu115, 1.11.0+rocm4.3.1, 1.11.0+rocm4.5.2, 1.12.0, 1.12.0+cpu, 1.12.0+cu102, 1.12.0+cu113, 1.12.0+cu116, 1.12.0+rocm5.0, 1.12.0+rocm5.1.1, 1.12.1, 1.12.1+cpu, 1.12.1+cu102, 1.12.1+cu113, 1.12.1+cu116, 1.12.1+rocm5.0, 1.12.1+rocm5.1.1, 1.13.0, 1.13.0+cpu, 1.13.0+cu116, 1.13.0+cu117, 1.13.0+cu117.with.pypi.cudnn, 1.13.0+rocm5.1.1, 1.13.0+rocm5.2, 1.13.1, 1.13.1+cpu, 1.13.1+cu116, 1.13.1+cu117, 1.13.1+cu117.with.pypi.cudnn, 1.13.1+rocm5.1.1, 1.13.1+rocm5.2, 2.0.0, 2.0.0+cpu, 2.0.0+cpu.cxx11.abi, 2.0.0+cu117, 2.0.0+cu117.with.pypi.cudnn, 2.0.0+cu118, 2.0.0+rocm5.3, 2.0.0+rocm5.4.2, 2.0.1, 2.0.1+cpu, 2.0.1+cpu.cxx11.abi, 2.0.1+cu117, 2.0.1+cu117.with.pypi.cudnn, 2.0.1+cu118, 2.0.1+rocm5.3, 2.0.1+rocm5.4.2)
ERROR: No matching distribution found for torch==1.5.1+cpu
```

AGRocky · Jul 04 '23
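The error itself lists every torch build pip could find for that environment, and the oldest is 1.11.0, so no 1.5.x build is on offer there at all. A likely cause, which this thread does not confirm, is that the Colab runtime's Python is newer than the Python versions torch 1.5.x wheels were published for (they stop at 3.8). A standard-library sketch for pulling the version list out of such a message; `parse_available_versions`, `release_tuple` and `oldest_available` are hypothetical helpers, and they assume plain `X.Y.Z` release numbers with optional `+local` tags.

```python
import re
from typing import List, Optional, Tuple


def parse_available_versions(pip_error: str) -> List[str]:
    """Extract the '(from versions: ...)' list from a pip resolution error."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def release_tuple(version: str) -> Tuple[int, ...]:
    """'1.13.0+cu117' -> (1, 13, 0); local tags after '+' are ignored."""
    return tuple(int(part) for part in version.split("+", 1)[0].split("."))


def oldest_available(pip_error: str) -> Optional[str]:
    """Lowest release listed in the error, or None if no list is present."""
    versions = parse_available_versions(pip_error)
    return min(versions, key=release_tuple) if versions else None
```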

Hey Abhishek, this is more related to your dependencies.

mohit-217 · Jul 04 '23

Could you please elaborate, my friend?

AGRocky · Jul 04 '23

Please elaborate more

linkstatic12 · Jul 08 '23