
Unable to run reproduce example, due to type error (unexpected indexing mode).

Open crud89 opened this issue 3 years ago • 10 comments

I am having trouble following the instructions from the readme file. Whenever I run the training for an example, I receive an error:

Traceback (most recent call last):
  File "C:\Users\Admin\source\repos\nvdiffrec\test\train.py", line 563, in <module>
    dataset_train    = DatasetMesh(ref_mesh, glctx, RADIUS, FLAGS, validate=False)
  File "C:\Users\Admin\source\repos\nvdiffrec\test\dataset\dataset_mesh.py", line 46, in __init__
    self.envlight = light.load_env(FLAGS.envmap, scale=FLAGS.env_scale)
  File "C:\Users\Admin\source\repos\nvdiffrec\test\render\light.py", line 141, in load_env
    return _load_env_hdr(fn, scale)
  File "C:\Users\Admin\source\repos\nvdiffrec\test\render\light.py", line 132, in _load_env_hdr
    cubemap = util.latlong_to_cubemap(latlong_img, [512, 512])
  File "C:\Users\Admin\source\repos\nvdiffrec\test\render\util.py", line 106, in latlong_to_cubemap
    gy, gx = torch.meshgrid(torch.linspace(-1.0 + 1.0 / res[0], 1.0 - 1.0 / res[0], res[0], device='cuda'),
TypeError: meshgrid() got an unexpected keyword argument 'indexing'

I am not sure if this is related, but I also received some warnings during installation, namely when attempting to install tinycudann:

Running setup.py install for tinycudann ...
    WARNING: Subprocess output does not appear to be encoded as cp1252

I then receive a similar encoding warning every time I run the training script:

C:\Users\Admin\AppData\Roaming\Python39\site-packages\torch\utils\cpp_extension.py:304:
    UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0x81 in position 62: invalid start byte

Just in case:

  • I have CUDA 11.5 installed. I first installed the matching toolkit manually using conda install cudatoolkit=11.5 -c conda-forge, after which conda install pytorch torchvision torchaudio cudatoolkit=11.5 -c pytorch also succeeded, but neither step solved the issue.
  • GPU is RTX 2080 Ti
  • Windows 10 x64

Weirdly enough, removing the indexing parameter from all calls to torch.meshgrid() appears to fix the issue. The parameter defaults to ij anyway, but relying on that only works for current releases of PyTorch: the documentation states that the default will transition to xy in the future. I have no idea why it works when the argument is removed entirely, though.
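Instead of deleting the argument, the call could go through a small wrapper that requests ij indexing when the installed function accepts it and falls back to the bare call otherwise. This is only a sketch: call_meshgrid_ij and the two stand-in functions are hypothetical names used for illustration and are not part of nvdiffrec or PyTorch. It assumes that a meshgrid without the indexing keyword already behaves as ij, as described above.

```python
def call_meshgrid_ij(meshgrid, *tensors):
    """Call `meshgrid` with indexing='ij' if it accepts that keyword,
    otherwise call it without the keyword (assumed to default to ij)."""
    try:
        return meshgrid(*tensors, indexing="ij")
    except TypeError as err:
        # Only swallow the "unexpected keyword argument 'indexing'" case.
        if "indexing" not in str(err):
            raise
        return meshgrid(*tensors)


# Hypothetical stand-ins that mimic the two signatures discussed here.
def old_style_meshgrid(*tensors):
    return ("ij-by-default", tensors)


def new_style_meshgrid(*tensors, indexing=None):
    return (indexing, tensors)


print(call_meshgrid_ij(old_style_meshgrid, [0, 1], [2, 3]))
print(call_meshgrid_ij(new_style_meshgrid, [0, 1], [2, 3]))
```

In the real code the first argument would be torch.meshgrid, e.g. gy, gx = call_meshgrid_ij(torch.meshgrid, a, b), so the explicit indexing='ij' is kept wherever the installed version supports it.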

I would be grateful for any hints. Thanks! 🙂

crud89 avatar Apr 06 '22 12:04 crud89

Thanks for the bug report. This is very strange. We explicitly added indexing='ij' to the torch.meshgrid() calls to remove the warnings (I assume the meshgrid warning reappeared for you when you removed the indexing argument?). We haven't seen this locally. Perhaps a file-encoding issue?

Two other things for building the PyTorch cuda extensions:

  • On Windows, we typically install cuda from https://developer.nvidia.com/cuda-toolkit, not through conda.
  • I assume Visual Studio 2019 is installed (given that it runs now)?

jmunkberg avatar Apr 06 '22 13:04 jmunkberg

Actually, I would change the PyTorch command line to conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch, because PyTorch only supports CUDA 11.3. It is fine to have a more modern CUDA toolkit installed. I run with 11.5 and 11.6 locally.

jmunkberg avatar Apr 06 '22 13:04 jmunkberg

Thanks for the answer!

Indeed, I have VS 2019 installed. I also installed the toolkit using the installer beforehand. I am not seeing any warnings on my side. I first downgraded PyTorch to 1.10 to see if that fixes the error (it didn't). I then re-created everything (i.e. deleted the old environment and created a new one) to see if removing the parameter also fixes the issue with version 1.11, and I am not receiving any warnings there either. So no matter the PyTorch version, there is no warning if I remove the indexing parameter, but there is an error if I leave it in. Still, I am not exactly sure what is going on, as I would have expected a warning myself.

The only warning is about the encoding, but I am not sure if this is relevant or even related.
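One thing that may be worth ruling out: the path in the encoding warning above points to AppData\Roaming\Python39\site-packages, which is a per-user directory rather than a conda environment folder. The sketch below reports which installation of a package Python would import and what version its metadata declares. describe_package is a hypothetical helper name, and the idea that a different torch installation is being picked up is an assumption, not something confirmed in this thread.

```python
import importlib.metadata
import importlib.util


def describe_package(name):
    """Return the declared version and file location of an importable
    top-level package, or None if Python cannot find it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name.
        version = None
    return {"version": version, "location": spec.origin}


# Run this with the same interpreter that runs train.py.
print(describe_package("torch"))
```

If the reported location is outside the active conda environment, or the version is not the one that was just installed, the environment is probably not the one being used at run time.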


crud89 avatar Apr 06 '22 13:04 crud89

Ok, this is strange. I'll try in a fresh conda env later to see if I can reproduce. I think I have the latest PyTorch 1.11, but they may have changed something very recently. Just to double-check, did you change the CUDA version used by torch from conda install pytorch torchvision torchaudio cudatoolkit=11.5 -c pytorch (not recommended) to

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch (our recommended setup, while still having the 11.5 toolkit installed)?

jmunkberg avatar Apr 06 '22 14:04 jmunkberg

I tried both, and both resulted in the same behavior. Currently I am on cudatoolkit=11.3 as you recommended.

crud89 avatar Apr 06 '22 14:04 crud89

I just created a new conda env, and installed PyTorch 1.11.0 in there, following the installation steps in the readme exactly. Everything runs fine on my end, and I don't see the warning UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0x81 in position 62: invalid start byte. Sorry I cannot help you more. Perhaps try on another machine to see if that is an issue specific to the current desktop?

jmunkberg avatar Apr 07 '22 07:04 jmunkberg

Thanks for testing it!

I will see if I can test it on another system. So far, I am okay with manually fixing this issue. If I find something, I will let you know.

Thanks again! 🙂

crud89 avatar Apr 07 '22 11:04 crud89

I solved this problem :)


The latest version of pytorch (above 1.9.0) does not support this parameter anymore. Only:

torch.meshgrid(*tensors)

Although this parameter is missing, the default is still indexing='ij'.

So we just need to delete the indexing argument from the line that raises the error.

jiayaozhang avatar Jun 17 '22 10:06 jiayaozhang

The latest version of pytorch (above 1.9.0) does not support this parameter anymore.

It does not support this parameter yet. This is because it has only been added in version 1.10.0.
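The "above 1.9.0" mix-up is easy to make because version strings do not sort numerically: compared as plain text, "1.10.0" comes before "1.9.0". A minimal sketch of a numeric check follows, assuming (as stated above) that the indexing argument was added in 1.10.0. version_tuple and has_indexing_argument are hypothetical helper names.

```python
import re


def version_tuple(version):
    """Extract (major, minor) as integers from a string like '1.10.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_indexing_argument(torch_version):
    """True if this version is 1.10 or newer (assumed cut-off, see above)."""
    return version_tuple(torch_version) >= (1, 10)


print("1.10.0" < "1.9.0")               # text comparison is misleading
print(has_indexing_argument("1.9.0"))
print(has_indexing_argument("1.10.0+cu113"))
```

In practice the string to pass in would be torch.__version__.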

So unfortunately this is not a solution: removing it works for now, but won't for future releases, where the default will change from ij to xy.
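To make the difference concrete, here is a pure-Python sketch of the two conventions for two 1-D inputs. meshgrid_2d is a hypothetical illustration written with nested lists; it is not the PyTorch implementation, it only mirrors the ij/xy layout used by meshgrid-style functions.

```python
def meshgrid_2d(xs, ys, indexing="ij"):
    """Build two coordinate grids from 1-D sequences xs and ys."""
    if indexing == "ij":
        # Shape (len(xs), len(ys)): xs varies along rows.
        gx = [[x for _ in ys] for x in xs]
        gy = [[y for y in ys] for _ in xs]
    elif indexing == "xy":
        # Shape (len(ys), len(xs)): xs varies along columns.
        gx = [[x for x in xs] for _ in ys]
        gy = [[y for _ in xs] for y in ys]
    else:
        raise ValueError("indexing must be 'ij' or 'xy'")
    return gx, gy


gx_ij, gy_ij = meshgrid_2d([1, 2, 3], [4, 5], indexing="ij")
gx_xy, gy_xy = meshgrid_2d([1, 2, 3], [4, 5], indexing="xy")
print(gx_ij)  # 3 rows, 2 columns
print(gx_xy)  # 2 rows, 3 columns
```

The xy grids are the transposes of the ij grids, so code that unpacks gy, gx and relies on the ij layout would silently get swapped axes if the default changed.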

crud89 avatar Jun 17 '22 11:06 crud89

Thanks @crud89,

Yes, we added the indexing parameter to remove the warning in PyTorch 1.10+.

jmunkberg avatar Jun 17 '22 12:06 jmunkberg