GenRe-ShapeHD

error in projection foward: no kernel image is available for execution on the device

Open royalhao3zZ opened this issue 6 years ago • 27 comments

I've been stuck on this issue for several days. While running test_genre.sh, I got the following error:

Traceback (most recent call last):
  File "test.py", line 95, in <module>
    model.test_on_batch(i, batch)
  File "/home/zhanghao/models/genre_full_model.py", line 182, in test_on_batch
    pred = self.forward_with_trimesh(batch)
  File "/home/zhanghao/models/genre_full_model.py", line 207, in forward_with_trimesh
    proj = self.net.depth_and_inpaint.proj_depth(pred_abs_depth)
  File "/media/zhanghao/娱乐/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhanghao/toolbox/cam_bp/cam_bp/modules/camera_backprojection_module.py", line 22, in forward
    df = CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/home/zhanghao/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
  File "/media/zhanghao/娱乐/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 202, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /data/vision/billf/scratch/ztzhang/shape_oneshot/ShapeRecon/toolbox/cam_bp/cam_bp/src/back_projection.c:14

Does anyone have a solution for this? Thanks.

royalhao3zZ avatar Apr 26 '19 09:04 royalhao3zZ

Thank you for making the code available, Xiuming.

I've hit the same error while trying to train MarrNet with ShapeNet examples. Is there a solution here?

Hao, did you ever figure this out?

Thanks again, Jeff

weeoooweeooo avatar Jun 26 '19 03:06 weeoooweeooo

@weeoooweeooo would you mind sharing your detailed error message? I have not been able to reproduce this. I suspect it might be caused by an improper install of the CUDA kernels; I'll update an install script for this.
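One way to test the "improperly installed CUDA kernels" hypothesis is to compare the GPU's compute capability with the architectures the kernels were compiled for. The sketch below is only an illustration: the function names are made up for this example, and the architecture list is copied from the nvcc command posted later in this thread (sm_30, sm_35, sm_52, sm_61), so it should be replaced with whatever your own setup.sh passes to -gencode. It assumes only binary code (code=sm_XY) is embedded, as in that command, and ignores PTX.

```python
# Architectures passed via -gencode in the nvcc command shown in this thread.
# Replace with the list from your own setup.sh (this default is an assumption).
BUILT_ARCHS = [(3, 0), (3, 5), (5, 2), (6, 1)]


def covers(device_cc, built_archs):
    """Return True if a kernel binary in built_archs can run on the device.

    A binary built for sm_XY runs on devices with the same major version X
    and a minor version greater than or equal to Y.
    """
    major, minor = device_cc
    return any(b_major == major and b_minor <= minor
               for b_major, b_minor in built_archs)


def gencode_flag(device_cc):
    """Build the -gencode flag that targets exactly this compute capability."""
    major, minor = device_cc
    return f"-gencode arch=compute_{major}{minor},code=sm_{major}{minor}"


if __name__ == "__main__":
    try:
        import torch
        cc = torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None
    except ImportError:
        cc = None

    if cc is None:
        print("No CUDA device visible from this Python environment.")
    elif covers(cc, BUILT_ARCHS):
        print(f"Compute capability {cc} is covered by the compiled kernels.")
    else:
        print(f"Compute capability {cc} is not covered; consider adding:")
        print("   ", gencode_flag(cc))
```

If the device is not covered, the missing flag would have to be added to the nvcc call in each setup.sh before cleaning and rebuilding the toolbox. Whether that resolves the error in this issue is not confirmed here.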

ztzhang avatar Jun 26 '19 04:06 ztzhang

@ztzhang Thank you for responding so quickly. I'm in the process of trying to install new kernels right now.

==> Training Epoch 1/1000 10000/10000 [==============================] - 188s - loss: 1549.6328 - depth: 614.3428 - silhou: 483.5301 - normal: 451.7600 - depth_minmax: 2138.4353
Eval 1/1000
error in projection foward: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "train.py", line 216, in <module>
    eval_at_start=opt.eval_at_start
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 287, in train_epoch
    _eval(epoch)
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 270, in _eval
    batch_log = self._vali_on_batch(epoch, i, data)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 69, in _vali_on_batch
    output = self.pack_output(pred, batch)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 94, in pack_output
    out['proj_depth'] = self.proj_depth(pred_abs_depth).cpu().numpy()
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 154, in proj_depth
    proj_depth = self.cam_bp(abs_depth)
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 174, in forward
    return CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/_ext/cam_bp_lib/__init__.py", line 175, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /data/vision/billf/scratch/ztzhang/shape_oneshot/ShapeRecon/toolbox/cam_bp/cam_bp/src/back_projection.c:14

However, the issue originally arose while I was trying to work around the deprecation of torch.utils.ffi in PyTorch 1.0. I'm using an RTX GPU, which requires PyTorch 1.0 and CUDA 10, but I don't understand _wrap_function or create_extension well enough to rewrite those sections. The original errors follow. It seems the replacement isn't a drop-in one. Do you have any ideas?

==> Parsing arguments
Traceback (most recent call last):
  File "train.py", line 18, in <module>
    opt, unique_opt_params = options_train.parse()
  File "/srv/git/GenRe-ShapeHD/options/options_train.py", line 118, in parse
    parser, unique_params_model = get_model(net_name).add_arguments(parser)
  File "/srv/git/GenRe-ShapeHD/models/__init__.py", line 5, in get_model
    module = importlib.import_module('models.' + alias)
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 994, in _gcd_import
  File "", line 971, in _find_and_load
  File "", line 955, in _find_and_load_unlocked
  File "", line 665, in _load_unlocked
  File "", line 678, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 8, in <module>
    from .marrnetbase import MarrnetBaseModel
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 7, in <module>
    from toolbox.cam_bp.cam_bp.functions import CameraBackProjection
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/__init__.py", line 1, in <module>
    from .cam_back_projection import CameraBackProjection
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 4, in <module>
    from .._ext import cam_bp_lib
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/_ext/cam_bp_lib/__init__.py", line 1, in <module>
    from torch.utils.ffi import _wrap_function
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
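Since the original build scripts depend on torch.utils.ffi and the ImportError above appears once that module is deprecated, it can help to check up front which build path an environment needs. The following is a minimal sketch; the helper name is hypothetical, and the 1.0 cut-off is taken from the deprecation mentioned in this comment.

```python
import re


def needs_cpp_extension(torch_version):
    """Return True if this PyTorch version no longer supports torch.utils.ffi.

    The threshold (1.0) follows the deprecation described in this thread.
    """
    # Keep only the leading "major.minor" part, e.g. "1.1.0+cu100" -> (1, 1).
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if needs_cpp_extension(torch.__version__):
            print("torch.utils.ffi is unavailable; the toolbox needs a cpp_extension build.")
        else:
            print("torch.utils.ffi is available; the original build scripts should apply.")
```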

weeoooweeooo avatar Jun 26 '19 05:06 weeoooweeooo

@weeoooweeooo I think your problem is different from the OP's. @royalhao3zZ I just pushed a fix with clean_toolbox_build.sh and build_toolbox.sh. Would you mind cleaning the previous build first and then rebuilding the toolbox? Thanks!

As for @weeoooweeooo, I think I might have a quick fix for that in hand, please stay tuned.

ztzhang avatar Jun 26 '19 05:06 ztzhang

@weeoooweeooo I figured out a quick fix to make it compile. However we would like to keep the original repo consistent so I do not plan to push this to the repo.

Here's what I did:

  1. Copy all .c files as .cpp files.
  2. In each setup.sh, comment out lines 34-42.
  3. Modify build.py as follows (only shown for calc_prob):
import os
import torch
from torch.utils.cpp_extension import CppExtension, BuildExtension, include_paths

# Directory containing this build.py; used to resolve the paths below.
this_file = os.path.dirname(os.path.realpath(__file__))
print(this_file)

# The CUDA kernels are linked in as a prebuilt object, so CUDA must be available.
assert torch.cuda.is_available()

extra_compile_args = []
sources = ['calc_prob/src/calc_prob.cpp']
headers = ['calc_prob/src/calc_prob.h']  # kept for reference; not passed on
defines = [('WITH_CUDA', True)]
with_cuda = True

# Object file produced by nvcc, linked into the extension.
extra_objects = ['calc_prob/src/calc_prob_kernel.cu.o']
extra_objects = [os.path.join(this_file, fname) for fname in extra_objects]

# The commented-out keys are options of the old torch.utils.ffi build.
ffi_params = {
    #'headers': headers,
    'sources': sources,
    'define_macros': defines,
    #'relative_to': __file__,
    #'with_cuda': with_cuda,
    'extra_objects': extra_objects,
    'include_dirs': [os.path.join(this_file, 'calc_prob/src')] + include_paths(True),
    'extra_compile_args': extra_compile_args,
}


if __name__ == '__main__':
    ext = CppExtension(
        'calc_prob._ext.calc_prob_lib',
        # package=False,
        **ffi_params)

    from setuptools import setup
    setup(name='calc_prob', ext_modules=[ext], cmdclass={'build_ext': BuildExtension})

Then you can first run setup.sh to build the .so files, and then run python build.py build_ext to build the extensions you need. After that, you might need to copy or soft link the built _ext from the build folder (it may sit under parent folders named after your OS and Python version) to calc_prob/calc_prob/_ext.
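The last step, locating the built _ext folder and linking it into place, could be sketched roughly as follows. This is an illustration rather than part of the repo: the helper name is made up, and the build/lib.*/<package>/_ext layout is assumed from typical setuptools output (for example build/lib.linux-x86_64-3.6/...), so check it against your own build folder.

```python
import os
from pathlib import Path


def link_built_ext(build_dir, target_ext_dir, package="calc_prob"):
    """Symlink build/lib.*/<package>/_ext to target_ext_dir and return the source path."""
    build_dir = Path(build_dir)
    target = Path(target_ext_dir)

    # The folder under build/ is named after the platform and Python version.
    candidates = sorted(build_dir.glob(f"lib.*/{package}/_ext"))
    if not candidates:
        raise FileNotFoundError(f"no built _ext folder found under {build_dir}")
    source = candidates[0].resolve()

    # Refuse to overwrite an existing _ext; remove or clean it first.
    if target.is_symlink() or target.exists():
        raise FileExistsError(f"{target} already exists")

    target.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, target, target_is_directory=True)
    return source
```

Run from the toolbox/calc_prob folder, a call such as link_built_ext("build", "calc_prob/_ext") would then make the compiled library importable from the location the Python wrappers expect. Copying the folder instead of linking it works the same way.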

ztzhang avatar Jun 26 '19 06:06 ztzhang

@weeoooweeooo would you mind letting us know if this works for you? Thanks!

ztzhang avatar Jun 26 '19 18:06 ztzhang

@ztzhang Thank you so much for helping me with this specific workaround. I have tried your suggestions, but I now get this error when calling functions from the newly built extension:

==> Training Epoch 1/1000 10000/10000 [==============================] - 191s - loss: 1574.3203 - depth: 618.3826 - silhou: 500.1726 - normal: 455.7651 - depth_minmax: 1982.9009
Eval 1/1000
Traceback (most recent call last):
  File "train.py", line 213, in <module>
    eval_at_start=opt.eval_at_start
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 285, in train_epoch
    _eval(epoch)
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 268, in _eval
    batch_log = self._vali_on_batch(epoch, i, data)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 69, in _vali_on_batch
    output = self.pack_output(pred, batch)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 94, in pack_output
    out['proj_depth'] = self.proj_depth(pred_abs_depth).cpu().numpy()
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 154, in proj_depth
    proj_depth = self.cam_bp(abs_depth)
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 174, in forward
    return CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
AttributeError: module 'toolbox.cam_bp.cam_bp._ext.cam_bp_lib' has no attribute 'back_projection_forward'

I have tried to troubleshoot a bit. Everything appears to go smoothly, except for one warning during compilation:

/srv/git/GenRe-ShapeHD/toolbox/cam_bp
running build_ext
building 'cam_bp._ext.cam_bp_lib' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/cam_bp
creating build/temp.linux-x86_64-3.6/cam_bp/src
gcc -pthread -B /home/gsq/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/TH -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/TH -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/THC -I/home/gsq/anaconda3/envs/shaperecon/include/python3.6m -c cam_bp/src/back_projection.cpp -o build/temp.linux-x86_64-3.6/cam_bp/src/back_projection.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cam_bp_lib -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/cam_bp
creating build/lib.linux-x86_64-3.6/cam_bp/_ext
g++ -pthread -shared -B /home/gsq/anaconda3/envs/shaperecon/compiler_compat -L/home/gsq/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/gsq/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/cam_bp/src/back_projection.o /srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection_kernel.cu.o -o build/lib.linux-x86_64-3.6/cam_bp/_ext/cam_bp_lib.cpython-36m-x86_64-linux-gnu.so

weeoooweeooo avatar Jun 27 '19 04:06 weeoooweeooo

Hi, after reading the docs more carefully, it seems the build system now relies on pybind11 to expose the C++ function calls; I'm guessing this is why the error happens. I don't think we need to rewrite everything, since the C API is still maintained; we only need to add pybind11 bindings to the C++ functions. Sorry, I may not have much capacity to fix this particular issue, but I would suggest adding pybind11 bindings to the .cpp files and seeing if that works.
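To see quickly whether a rebuilt extension actually exposes the functions the Python wrappers call, a small check like the one below may help. It is a sketch with a made-up helper name. Only back_projection_forward appears in the traceback above; the second name in the demo is a hypothetical placeholder, and the demo uses a stand-in module rather than the real cam_bp_lib.

```python
import types


def missing_exports(module, expected_names):
    """Return the names in expected_names that the module does not expose as callables."""
    return [name for name in expected_names
            if not callable(getattr(module, name, None))]


if __name__ == "__main__":
    # Stand-in for the compiled extension, so the demo runs without a build.
    fake_lib = types.ModuleType("cam_bp_lib")
    fake_lib.back_projection_forward = lambda *args: None

    # "back_projection_backward" is a hypothetical name used for illustration.
    expected = ["back_projection_forward", "back_projection_backward"]
    print("missing:", missing_exports(fake_lib, expected))
```

With the real extension, an empty list would suggest the bindings are in place, while any name reported as missing corresponds to the AttributeError seen at call time.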

ztzhang avatar Jun 27 '19 16:06 ztzhang

Hi @ztzhang, thank you for your guidance about pybind11 to expose the cpp functions. I have been trying to do so. Here, I have modified their example with your build.py script:

from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
import sys
import setuptools
import os
import torch
from torch.utils.cpp_extension import CppExtension, BuildExtension, include_paths

version = '0.0.1'
this_file = os.path.dirname(os.path.realpath(__file__))
print(this_file)


class get_pybind_include(object):
    """Helper class to determine the pybind11 include path

    The purpose of this class is to postpone importing pybind11
    until it is actually installed, so that the get_include()
    method can be invoked.
    """

    def __init__(self, user=False):
        self.user = user

    def __str__(self):
        import pybind11
        return pybind11.get_include(self.user)


extra_compile_args = list()  # ['python3 -m pybind11 --includes']
extra_objects = list()
assert(torch.cuda.is_available())
sources = ['cam_bp/src/back_projection.cpp']
headers = ['cam_bp/src/back_projection.h']
defines = [('WITH_CUDA', True)]
with_cuda = True
extra_objects = ['cam_bp/src/back_projection_kernel.cu.o']
extra_objects = [os.path.join(this_file, fname) for fname in extra_objects]

ffi_params = {
    # 'headers': headers,
    # 'sources': sources,
    'define_macros': defines,
    # 'relative_to': __file__,
    # 'with_cuda': with_cuda,
    'extra_objects': extra_objects,
    'extra_compile_args': extra_compile_args,
}

ext_modules = [
    CppExtension(
        'cam_bp_lib',
        ['cam_bp/src/back_projection.cpp'],
        include_dirs=[
            os.path.join(this_file, 'cam_bp/src'),
            # Path to pybind11 headers
            get_pybind_include(),
            get_pybind_include(user=True),
            '/usr/local/cuda-10.0/targets/x86_64-linux/include'],
        language='c++',
        **ffi_params
    ),
]


# As of Python 3.6, CCompiler has a has_flag method.
# cf http://bugs.python.org/issue26689
def has_flag(compiler, flagname):
    """Return a boolean indicating whether a flag name is supported on
    the specified compiler.
    """
    import tempfile
    with tempfile.NamedTemporaryFile('w', suffix='.cpp') as f:
        f.write('int main (int argc, char **argv) { return 0; }')
        try:
            compiler.compile([f.name], extra_postargs=[flagname])
        except setuptools.distutils.errors.CompileError:
            return False
    return True


def cpp_flag(compiler):
    """Return the -std=c++[11/14/17] compiler flag

    The newer version is preferred over c++11 (when it is available).
    """
    flags = ['-std=c++17', '-std=c++14', '-std=c++11']

    for flag in flags:
        if has_flag(compiler, flag): return flag

    raise RuntimeError('Unsupported compiler -- at least C++11 support '
                       'is needed!')


class BuildExt(build_ext):
    """A custom build extension for adding compiler-specific options."""
    c_opts = {
        'msvc': ['/EHsc'],
        'unix': [],
    }
    l_opts = {
        'msvc': [],
        'unix': [],
    }

    if sys.platform == 'darwin':
        darwin_opts = ['-stdlib=libc++', '-mmacosx-version-min=10.7']
        c_opts['unix'] += darwin_opts
        l_opts['unix'] += darwin_opts

    def build_extensions(self):
        ct = self.compiler.compiler_type
        opts = self.c_opts.get(ct, [])
        link_opts = self.l_opts.get(ct, [])
        if ct == 'unix':
            opts.append('-DVERSION_INFO="%s"' % self.distribution.get_version())
            opts.append(cpp_flag(self.compiler))
            if has_flag(self.compiler, '-fvisibility=hidden'):
                opts.append('-fvisibility=hidden')
        elif ct == 'msvc':
            opts.append('/DVERSION_INFO=\\"%s\\"' % self.distribution.get_version())
        for ext in self.extensions:
            ext.extra_compile_args = opts
            ext.extra_link_args = link_opts
        build_ext.build_extensions(self)


setup(
    name='cam_bp_lib',
    version=version,
    ext_modules=ext_modules,
    install_requires=['pybind11>=2.3'],
    setup_requires=['pybind11>=2.3'],
    cmdclass={'build_ext': BuildExtension},
    zip_safe=False,
)

Though I am able to expose simpler functions with this setup, I'm unable to get it working for your toolboxes so far unfortunately. Currently, I'm getting this error in trying to import the compiled toolbox:

import cam_bp_lib
Traceback (most recent call last):
  File "", line 1, in <module>
ImportError: /home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/cam_bp_lib.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZTIN3c1010TensorImplE

I suspect the main issue has to do with sharing the CUDA library .so to compile with the .cpp file. Do you have any insight about this maybe? Should I try compiling the CUDA code with cuda_extension? Or maybe share the library in this manner https://devtalk.nvidia.com/default/topic/759162/shared-library-separate-compilation-c-c-/ ?
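One thing that may be worth ruling out first: the mangled name _ZTIN3c1010TensorImplE demangles to "typeinfo for c10::TensorImpl", a symbol that lives in PyTorch's own shared libraries. A commonly reported cause of this kind of error is importing a compiled extension before torch has been imported in the same process. Whether that is what happens here is an assumption, not something this thread confirms. A minimal sketch of an import helper (the function name is made up) would be:

```python
import importlib


def import_torch_extension(name):
    """Import a compiled extension after torch, so PyTorch's libraries are loaded first."""
    try:
        import torch  # noqa: F401  (imported only for its side effect of loading the libraries)
    except ImportError:
        # torch is missing entirely; fall through so the caller sees the real error.
        pass
    return importlib.import_module(name)
```

For example, import_torch_extension("cam_bp_lib") would replace a bare import cam_bp_lib in a fresh interpreter. If the undefined symbol error persists after that, the cause more likely lies in how the extension is compiled or linked.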

weeoooweeooo avatar Jul 05 '19 07:07 weeoooweeooo

@weeoooweeooo I might have some time to look at this ~~during the weekend~~ next week, my guess is that we also need to add extern_c for the wrapper functions.

I'm not sure if it is still cost efficient to hack it tho; I'll try to overhaul some of those kernels to the current c++ api, and I think some of the ops are already included in the pytorch function sets.

ztzhang avatar Jul 05 '19 16:07 ztzhang

That would be very helpful @ztzhang. I'd appreciate any input you might have. Thanks! I'm afraid I've not had much exposure to c++/cuda, but am very interested in trying to use your model with some medical images. Please let me know what I can do.

weeoooweeooo avatar Jul 09 '19 01:07 weeoooweeooo

@weeoooweeooo I think your problem is different from OP. @royalhao3zZ I just pushed a fix with clean_toolbox_build.sh and build_toolbox.sh. Would you mind trying clean the previous build first and rebuild the toolbox again? Thanks!

As for @weeoooweeooo, I think I might have a quick fix for that in hand, please stay tuned.

I am having the same issue as @royalhao3zZ. I ran ./clean_toolbox_build.sh and then ./build_toolbox.sh again, but I'm still getting the same issue when trying to run scripts/test_genre.sh. If you could provide any insight into this error, or any potential fixes, I would really appreciate it! Thank you!

dannygelman1 avatar Oct 14 '19 21:10 dannygelman1

@dannygelman1 would you mind sharing your compile time messages as well as the error messages?

ztzhang avatar Oct 14 '19 22:10 ztzhang

Yes! Thank you for looking into this! This is everything that prints after I run scripts/test_genre.sh 0 (The zero is to indicate the index of the gpu I want to use. Since my machine only has one gpu, it is at index 0)

==> Parsing arguments
Namespace(adam_beta1=0.5, adam_beta2=0.9, batch_size=1, classes='chair', dataset=None, epoch=0, epoch_batches=None, eval_at_start=False, eval_batches=None, expr_id=0, full_logdir=None, gpu='0', inpaint_path=None, input_mask='./downloads/data/test/genre/*_silhouette.*', input_rgb='./downloads/data/test/genre/*_rgb.*', joint_train=False, load_offline=False, log_batch=False, log_time=False, logdir=None, lr=0.0001, manual_seed=None, net='genre_full_model', net1_path=None, net_file='./downloads/models/full_model.pt', optim='adam', output_dir='./output/test', overwrite=True, padding_margin=16, pred_depth_minmax=True, resume=0, save_net=1, save_net_opt=False, sgd_dampening=0, sgd_momentum=0.9, suffix='{net}', surface_weight=1.0, tensorboard=False, vis_batches_train=10, vis_batches_vali=10, vis_every_train=1, vis_every_vali=1, vis_param_f=None, vis_workers=4, wdecay=0.0, workers=0)
==> Setting device
[Warning] Designated GPU in use: id=0, util=11%, memory in use: 450 MiB
==> Setting up output directory
==> Setting up loggers
==> Setting up models
[Warning] Model loaded without optimizer states. 
Testing GenRe
# model parameters: 100,204,619
==> Setting up data loaders
[Verbose] Time spent in data IO initialization: 0.00s
[Verbose] # test points: 4
[Verbose] # test batches: 4
==> Testing
  0%|          | 0/4 [00:00<?, ?it/s]
error in projection foward: no kernel image is available for execution on the device

Traceback (most recent call last):
  File "test.py", line 94, in <module>
    model.test_on_batch(i, batch)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/models/genre_full_model.py", line 182, in test_on_batch
    pred = self.forward_with_trimesh(batch)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/models/genre_full_model.py", line 207, in forward_with_trimesh
    proj = self.net.depth_and_inpaint.proj_depth(pred_abs_depth)
  File "/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/modules/camera_backprojection_module.py", line 22, in forward
    df = CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
  File "/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 202, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.c:14

dannygelman1 avatar Oct 15 '19 00:10 dannygelman1

Would you mind cleaning the build and recompiling the CUDA kernels? And please post the corresponding output so that I can help track this down. Thanks.

ztzhang avatar Oct 15 '19 13:10 ztzhang

After I run ./clean_toolbox_build.sh I get the following

Directory calc_prob/__pycache__ removed
Directory calc_prob/_ext removed
Directory calc_prob/functions/__pycache__ removed
File cam_bp/src/back_projection_kernel.cu.o removed
__pycache__ not found
dist not found
build not found
pytorch_camera_back_projection.egg-info not found
.cache not found
Directory cam_bp/__pycache__ removed
Directory cam_bp/_ext removed
Directory cam_bp/functions/__pycache__ removed
Directory cam_bp/modules/__pycache__ removed

Since it is saying build not found, among other files, does that mean I am not creating all the necessary files?

dannygelman1 avatar Oct 15 '19 16:10 dannygelman1

Yes. Can you post the compile messages as well?

ztzhang avatar Oct 15 '19 18:10 ztzhang

Here are all the messages after I run ./build_toolbox.sh

Add -gencode to match all the GPU architectures you have.
Check 'https://en.wikipedia.org/wiki/CUDA#GPUs_supported' for list of architecture.
Check 'http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html' for GPU compilation based on architecture.
/home/guillermo/anaconda3/envs/shaperecon/bin/python
setup.sh: line 9: /home/guillermo/anaconda3/envs/shaperecon/bin:/home/guillermo/anaconda3/condabin:/home/guillermo/.local/bin:/home/guillermo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-9.2/bin: No such file or directory
nvcc -c -o calc_prob_kernel.cu.o calc_prob_kernel.cu -x cu -Xcompiler -fPIC -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/TH -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/THC -I /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src         -gencode arch=compute_30,code=sm_30         -gencode arch=compute_35,code=sm_35         -gencode arch=compute_52,code=sm_52         -gencode arch=compute_61,code=sm_61
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob
generating /tmp/tmpx30pui_m/_calc_prob_lib.c
setting the current directory to '/tmp/tmpx30pui_m'
running build_ext
building '_calc_prob_lib' extension
creating home
creating home/guillermo
creating home/guillermo/PycharmProjects
creating home/guillermo/PycharmProjects/Fluid_Research
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c _calc_prob_lib.c -o ./_calc_prob_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob.o -std=c99
gcc -pthread -shared -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -L/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_calc_prob_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob.o /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob_kernel.cu.o -o ./_calc_prob_lib.so
Add -gencode to match all the GPU architectures you have.
Check 'https://en.wikipedia.org/wiki/CUDA#GPUs_supported' for list of architecture.
Check 'http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html' for GPU compilation based on architecture.
/home/guillermo/anaconda3/envs/shaperecon/bin/python
setup.sh: line 17: /home/guillermo/anaconda3/envs/shaperecon/bin:/home/guillermo/anaconda3/condabin:/home/guillermo/.local/bin:/home/guillermo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-9.2/bin: No such file or directory
nvcc -c -o nnd_cuda.cu.o nnd_cuda.cu -x cu -Xcompiler -fPIC -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/TH -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/THC -I /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include        -gencode arch=compute_30,code=sm_30         -gencode arch=compute_35,code=sm_35         -gencode arch=compute_52,code=sm_52         -gencode arch=compute_61,code=sm_61
Including CUDA code.
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance
generating /tmp/tmp4vs_ng9i/_my_lib.c
setting the current directory to '/tmp/tmp4vs_ng9i'
running build_ext
building '_my_lib' extension
creating home
creating home/guillermo
creating home/guillermo/PycharmProjects
creating home/guillermo/PycharmProjects/Fluid_Research
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c _my_lib.c -o ./_my_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib_cuda.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib_cuda.o -std=c99
gcc -pthread -shared -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -L/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_my_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib_cuda.o /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/nnd_cuda.cu.o -o ./_my_lib.so
Add -gencode to match all the GPU architectures you have.
Check 'https://en.wikipedia.org/wiki/CUDA#GPUs_supported' for list of architecture.
Check 'http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html' for GPU compilation based on architecture.
/home/guillermo/anaconda3/envs/shaperecon/bin/python
setup.sh: line 9: /home/guillermo/anaconda3/envs/shaperecon/bin:/home/guillermo/anaconda3/condabin:/home/guillermo/.local/bin:/home/guillermo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-9.2/bin: No such file or directory
nvcc -c -o back_projection_kernel.cu.o back_projection_kernel.cu -x cu -Xcompiler -fPIC -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/TH -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/THC -I /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include        -gencode arch=compute_30,code=sm_30         -gencode arch=compute_35,code=sm_35         -gencode arch=compute_52,code=sm_52         -gencode arch=compute_61,code=sm_61
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp
generating /tmp/tmpymjotthd/_cam_bp_lib.c
setting the current directory to '/tmp/tmpymjotthd'
running build_ext
building '_cam_bp_lib' extension
creating home
creating home/guillermo
creating home/guillermo/PycharmProjects
creating home/guillermo/PycharmProjects/Fluid_Research
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c _cam_bp_lib.c -o ./_cam_bp_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.o -std=c99
gcc -pthread -shared -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -L/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_cam_bp_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.o /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection_kernel.cu.o -o ./_cam_bp_lib.so

dannygelman1 avatar Oct 15 '19 18:10 dannygelman1

Hello,

Thank you for looking into my issue! I just wanted to follow up on this and make sure I provided the messages you wanted to see. Are these the compile messages you wanted?

Also, I am an MIT undergraduate and am trying to use this repo as part of my project in the Media Lab. I pass by CSAIL often and was wondering whether, if you are free, we could meet in person to discuss the issue I am running into.

Thank you!

dannygelman1 avatar Oct 18 '19 16:10 dannygelman1

Sorry for the late reply, happy to chat! I can help with the issue if you can show me your setup as well!

ztzhang avatar Nov 19 '19 16:11 ztzhang

No worries! My supervisor @gbernal and I would be happy to chat with you! You are welcome to come by Fluid Interfaces in the Media Lab so we can show you our setup, or we can come by CSAIL if that's easier for you. What days/times are good for you?

dannygelman1 avatar Nov 19 '19 18:11 dannygelman1

Was there ever a resolution on this? I'm getting the same errors.

wagnew3 avatar Jan 15 '20 21:01 wagnew3

@weeoooweeooo I am getting the same errors. Did you get any solution to that?

colinqian avatar May 09 '20 07:05 colinqian

@colinqian I did not manage to get beyond these errors, despite trying the suggested workarounds. It seems the deprecations in pytorch 1.0 require some non-trivial changes to the code here.

weeoooweeooo avatar May 11 '20 05:05 weeoooweeooo

I can get GenRe running on machines with CUDA 9.2 and pytorch 0.4.1. The key pieces are making sure I add the gpu arch specification to the setup.sh scripts in toolbox/, and setting these environment variables (modify as necessary for your machine):

export CPATH=$CPATH:/usr/local/cuda-9.2/include
export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Installing pytorch 0.4.1 is itself no longer trivial; besides the correct CUDA version, it requires a specific gcc version. Once I had those, though, I found installing with conda to be not too bad.
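
For the GPU arch step, here is a minimal sketch of how one might work out which -gencode flag a given card needs. It assumes torch.cuda.get_device_capability is available in your pytorch build; the helper name gencode_flag is made up for illustration and is not part of this repo.

```python
def gencode_flag(capability):
    """Turn a (major, minor) compute capability into an nvcc -gencode flag.

    Hypothetical helper for illustration; not part of GenRe-ShapeHD.
    """
    major, minor = capability
    arch = "{}{}".format(major, minor)
    return "-gencode arch=compute_{0},code=sm_{0}".format(arch)


if __name__ == "__main__":
    # Assumption: pytorch is installed and can see at least one CUDA device.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Query device 0 and print the flag that matches its architecture.
        print(gencode_flag(torch.cuda.get_device_capability(0)))
    else:
        print("No CUDA device visible to pytorch")
```

If the printed flag is not already among the -gencode entries on the nvcc line in the setup.sh scripts (the build log earlier in this thread shows sm_30, sm_35, sm_52 and sm_61), that would be consistent with the "no kernel image is available" error, and adding the flag and rebuilding is the thing to try.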

wagnew3 avatar May 11 '20 05:05 wagnew3

@wagnew3 It works now. I got it running with CUDA 9.0 and pytorch 0.4.1. I upgraded gcc to the latest version and added some environment variables. Thank you.

colinqian avatar May 11 '20 11:05 colinqian

@colinqian Which version of GCC did you happen to update it to? I'm getting the same error, running with CUDA 9.0 and pytorch 0.4.1 as well.

hanseungwook avatar Aug 27 '20 16:08 hanseungwook