Grid-Anchor-based-Image-Cropping-Pytorch roi_align_api.forward outputs are all zeros

Open yanivge1 opened this issue 4 years ago • 5 comments

I've built and installed the source code of roi_align_api and rod_align_api, but when running roi_align.py -> roi_align_api.forward(), the outputs are all zeros. All params are default; I didn't change anything.

aligned_height: 10
aligned_width: 10
spatial_scale: 0.0625
features.shape: torch.Size([1, 8, 16, 24])
features.dtype: torch.float32
rois.shape: torch.Size([83, 5])
rois.dtype: torch.float32

rois:
tensor([[ 0.0000, 16.0000, 10.6667, 304.0000, 181.3333],
        [ 0.0000, 16.0000, 10.6667, 336.0000, 181.3333],
        [ 0.0000, 16.0000, 10.6667, 272.0000, 202.6667],
        ...]

features:
...,
          [ 0.7069, 0.1061, 0.2352, ..., 0.2554, 0.2490, -0.6246],
          [-0.3563, 0.0466, -0.4040, ..., 0.3803, -0.1317, -1.2327],
          [-1.2773, -0.5382, -0.9917, ..., -0.6372, -0.9664, -2.5898]]]], grad_fn=<MkldnnConvolutionBackward>)
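
A quick arithmetic check on the reported inputs may help rule out bad ROIs as the cause. The sketch below assumes each rois row is laid out as [batch_index, x1, y1, x2, y2] in image pixels (an assumption; the thread does not state the layout) and scales the three printed rows by spatial_scale to see whether they land inside the 16 x 24 feature map. The helper names are hypothetical.

```python
SPATIAL_SCALE = 0.0625          # from the report (1/16)
FEATURE_H, FEATURE_W = 16, 24   # features.shape[2:] from the report

def roi_to_feature_coords(roi, spatial_scale=SPATIAL_SCALE):
    """Scale an assumed [batch_index, x1, y1, x2, y2] row to feature-map units."""
    batch_index, x1, y1, x2, y2 = roi
    return (int(batch_index),
            x1 * spatial_scale, y1 * spatial_scale,
            x2 * spatial_scale, y2 * spatial_scale)

def inside_feature_map(box, height=FEATURE_H, width=FEATURE_W):
    """True if the scaled box lies within the feature map bounds."""
    _, x1, y1, x2, y2 = box
    return 0 <= x1 <= x2 <= width and 0 <= y1 <= y2 <= height

# The three rows printed in the report.
rois = [
    [0.0, 16.0, 10.6667, 304.0, 181.3333],
    [0.0, 16.0, 10.6667, 336.0, 181.3333],
    [0.0, 16.0, 10.6667, 272.0, 202.6667],
]

for roi in rois:
    box = roi_to_feature_coords(roi)
    print(box, inside_feature_map(box))
```

Under that layout assumption, all three printed rows fall inside the feature map, which suggests the zeros come from somewhere other than out-of-range boxes.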

May 24 '20 08:05 yanivge1

It turns out this issue appears in CPU mode, because we did not implement the C++ (CPU) code.
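
If, as this reply suggests, only the CUDA path is usable, a guard before the call can turn silent zeros into an explicit error. A minimal sketch, assuming the inputs are torch tensors (it only reads their is_cuda attribute); require_cuda is a hypothetical helper, not part of this repo.

```python
def require_cuda(**tensors):
    """Raise if any named tensor is not on a CUDA device.

    Only the `is_cuda` attribute is read, which torch.Tensor provides.
    """
    on_cpu = sorted(
        name for name, t in tensors.items()
        if not getattr(t, "is_cuda", False)
    )
    if on_cpu:
        raise RuntimeError(
            "CUDA-only op received non-CUDA inputs: " + ", ".join(on_cpu)
        )
```

It could be called as require_cuda(features=features, rois=rois) just before roi_align_api.forward().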

May 24 '20 12:05 HuiZeng

@HuiZeng

When running demo_eval.py, is nn.DataParallel necessary? I commented out some code as follows:

    if args.cuda:
        # net = torch.nn.DataParallel(net, device_ids=[0])
        cudnn.benchmark = True
        net = net.cuda()

This error message was then generated (at net(img, rois)):

RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCReduceAll.cuh:327

Do you know where the problem is?

I am currently modifying your method to generate ROIs of any specified aspect ratio, and I eventually want to add a CPU version of roi/rod.

Thanks a lot

Jul 16 '20 08:07 lih627

You can try this version if you want to use the CPU mode. https://github.com/HuiZeng/Grid-Anchor-based-Image-Cropping-Pytorch

Jul 16 '20 08:07 HuiZeng

Hello, how many GPU cards are available? More than one? If so, could you please use only one GPU card first? I'm not sure whether the problem is caused by accessing memory allocated on a different device in our .cu code. Try "export CUDA_VISIBLE_DEVICES=0" before you run the code. I'm sure nn.DataParallel or nn.DistributedDataParallel is NOT required in our current implementation.
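
The same restriction can be applied from inside Python, as long as it happens before CUDA is initialized (in practice, before importing torch). A minimal sketch; pin_to_single_gpu is a hypothetical helper name.

```python
import os

def pin_to_single_gpu(index=0):
    """Expose only one GPU to this process via CUDA_VISIBLE_DEVICES.

    Must run before the CUDA runtime is initialized, so call it
    before importing torch.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]

pin_to_single_gpu(0)
# import torch  # import only after the variable is set
```

Inside the process, the selected card then shows up as device 0 regardless of its physical index.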

For a CPU implementation, it may be possible to change the setup code to compile the C++ implementation. The C++ code is only for reference. We have not thoroughly tested the C++ implementation, as training on CPU definitely takes a lot of time :(

Jul 16 '20 08:07 lld533

@HuiZeng @lld533 Hello, I only use the CPU for inference :), because the pre-trained model is perfect. I am designing an intelligent cropping project based on your model and method. The purpose is to generate cropping results with a specified aspect ratio according to the editor's needs. So I changed the ROI generation method and added face detection.

Currently, I want to encapsulate it as a module, for both CPU and GPU users.

I'm confused about nn.DataParallel. In fact, it works well with nn.DataParallel on my PC (1 GPU card), in both CPU and GPU modes. But when I comment out that line, the error occurs. This error is very confusing... I am a CUDA beginner and cannot locate the cause of the error.


def test():
    for epoch in range(0, 1):

        net = build_crop_model(scale='multi',  # scale='single',
                               alignsize=9, reddim=8, loadweight=False, model='mobilenetv2', downsample=4)
        net.load_state_dict(torch.load(args.net_path))
        net.eval()

        if args.cuda:
            # I commented out the next line
            # net = torch.nn.DataParallel(net, device_ids=[0])
            cudnn.benchmark = True
            net = net.cuda()

        data_loader = data.DataLoader(dataset, args.batch_size,
                                      num_workers=args.num_workers,
                                      collate_fn=naive_collate,
                                      shuffle=False)

        for id, sample in enumerate(data_loader):
            imgpath = sample['imgpath']
            image = sample['image']
            bboxes = sample['sourceboxes']
            resized_image = sample['resized_image']
            tbboxes = sample['tbboxes']

            if len(tbboxes['xmin']) == 0:
                continue

            roi = []
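
One thing that may be worth ruling out (a guess, not confirmed in this thread): nn.DataParallel copies its inputs to the GPUs in device_ids before calling the wrapped module, so once that line is commented out, any input still on the CPU reaches the CUDA kernels as-is. A minimal sketch of moving nested inputs explicitly; move_to_device is a hypothetical helper that relies only on the .to(device) method torch tensors provide.

```python
def move_to_device(obj, device):
    """Recursively move tensors inside dicts, lists and tuples to `device`.

    Anything exposing `.to(device)` (such as torch.Tensor) is moved;
    other values are returned unchanged.
    """
    if hasattr(obj, "to"):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: move_to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(v, device) for v in obj)
    return obj
```

It could be used as img, rois = move_to_device((img, rois), "cuda") right before net(img, rois).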

Can I contact you by mail? My E-mail: [email protected]

Thanks.

Jul 16 '20 09:07 lih627
