InsightFace-v2

Trained models python / pytorch version?

Open noamgat opened this issue 5 years ago • 1 comments

I tried to use train.py with the BEST_checkpoint_r18.tar as the starting checkpoint, and got the following error:


  File "/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/data1/noamgat/InsightFace_v2/models.py", line 355, in forward
    output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
RuntimeError: CUDA error: device-side assert triggered

What does this mean? It might be connected to these warnings:

/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.parallel.data_parallel.DataParallel' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.batchnorm.BatchNorm2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.activation.PReLU' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.pooling.MaxPool2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.pooling.AdaptiveAvgPool2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.linear.Linear' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.activation.Sigmoid' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.dropout.Dropout' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/noamgat/hdd/miniconda3/envs/insightface/lib/python3.6/site-packages/torch/serialization.py:493: SourceChangeWarning: source code of class 'torch.nn.modules.batchnorm.BatchNorm1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

For reference, this is the conda env I set up, to most closely recreate what I saw in the README, requirements.txt and code:

name: insightface
channels:
  - pytorch
  - defaults
  - conda-forge
  - menpo
dependencies:
  - python=3.6.8
  - pytorch=1.3.0
  - matplotlib
  - scipy
  - tqdm
  - opencv
  - pillow
  - torchvision
  - numpy
  - scikit-image
  - imgaug
  - pip
  - tensorboard
  - pandas
  - pip:
      - torchsummary
      - git+https://github.com/Tramac/torchscope.git


Perhaps the pretrained models were trained on a different version than stated in the README? Has anyone been able to get train.py working with the pretrained models as the starting checkpoint?

noamgat (Aug 14 '20 13:08)
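
A note for anyone hitting the same traceback: in PyTorch, `CUDA error: device-side assert triggered` often comes from an out-of-range index inside a CUDA kernel. Because kernels run asynchronously, the Python line shown in the traceback is not necessarily the one that failed; running with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually gives a more accurate location. One thing worth ruling out (an assumption, not something confirmed in this thread) is that some training labels fall outside `[0, num_classes)` for the margin layer restored from the checkpoint. Below is a minimal pure-Python sketch of that check; `out_of_range_labels` and the sample values are hypothetical and not part of the repository.

```python
def out_of_range_labels(labels, num_classes):
    """Return (position, label) pairs for labels outside [0, num_classes).

    Hypothetical helper: `labels` is any iterable of ints, for example
    the class ids collected from the training set before training starts.
    """
    return [(i, y) for i, y in enumerate(labels) if not 0 <= y < num_classes]


# Illustrative values only: 10 classes, two labels that would be invalid.
sample_labels = [0, 3, 7, 10, -1]
bad = out_of_range_labels(sample_labels, num_classes=10)
print(bad)  # [(3, 10), (4, -1)]
```

If the list is non-empty, the dataset's class ids do not match the number of classes the checkpoint was trained with, which would be one possible explanation for the assert.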

I have. In my case I used PyTorch 1.4.0 and 1.5.1 successfully, without any issues. I haven't tested with 1.3, though. 1.6.0 didn't work for me: after a couple of epochs I'd get NaNs. I guess this is because of the breaking changes introduced in 1.6.0.

Coderx7 (Oct 19 '20 10:10)
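
The version reports in this thread can be turned into a small start-up check. The sketch below only encodes what was reported here (1.4.0 and 1.5.1 worked, 1.6.0 gave NaNs, 1.3 was not tested by the replier); matching on major.minor rather than the exact patch release is an assumption, and `major_minor` and `thread_status` are hypothetical helper names.

```python
# Versions as reported in this thread; matching on (major, minor) is an assumption.
REPORTED_OK = {(1, 4), (1, 5)}  # 1.4.0 and 1.5.1 worked for the replier
REPORTED_BAD = {(1, 6)}         # 1.6.0 gave NaNs after a couple of epochs


def major_minor(version):
    """Parse a version string such as '1.5.1+cu101' into (1, 5)."""
    core = version.split("+")[0]  # drop a local build suffix if present
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def thread_status(version):
    """Classify a PyTorch version string against the reports in this thread."""
    mm = major_minor(version)
    if mm in REPORTED_OK:
        return "reported working"
    if mm in REPORTED_BAD:
        return "reported failing"
    return "untested in this thread"


print(thread_status("1.5.1"))
```

In a training script this could be called as `thread_status(torch.__version__)` before loading the checkpoint.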