
int64 support for some operations not supported

Open ryx2 opened this issue 4 years ago • 3 comments

I have installed all the pip packages in a venv, and when I run `pip list` everything matches up. I also installed PyTorch from source. When I attempt to run

python3 train.py -c /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml --log /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log -p /dev/null

INTERFACE:
config yaml: /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml
log dir /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log
model path /dev/null
eval only False
No batchnorm False

Commit hash (training version): b'5368eed'

Opening config file /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml
model folder doesnt exist! Start with random weights...
Copying files to /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log for further reference.
Images from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/train/img
Labels from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/train/lbl
Inference batch size: 3
Images from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/valid/img
Labels from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/valid/lbl
Original OS: 32
New OS: 16.0
[Decoder] os: 8 in: 32 skip: 32 out: 32
[Decoder] os: 4 in: 32 skip: 24 out: 24
[Decoder] os: 2 in: 24 skip: 16 out: 16
[Decoder] os: 1 in: 16 skip: 3 out: 16
Using normalized weights as bias for head.

Couldn't load backbone, using random weights. Error: [Errno 20] Not a directory: '/dev/null/backbone'
Couldn't load decoder, using random weights. Error: [Errno 20] Not a directory: '/dev/null/segmentation_decoder'
Couldn't load head, using random weights. Error: [Errno 20] Not a directory: '/dev/null/segmentation_head'
Total number of parameters: 2154794
Total number of parameters requires_grad: 2154794
Param encoder 1812800
Param decoder 341960
Param head 34
Training in device: cuda
/tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/bonnetal/lib/python3.5/site-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
[IOU EVAL] IGNORE: tensor([], dtype=torch.int64)
[IOU EVAL] INCLUDE: tensor([0, 1])
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    trainer.train()
  File "../../tasks/segmentation/modules/trainer.py", line 302, in train
    scheduler=self.scheduler)
  File "../../tasks/segmentation/modules/trainer.py", line 494, in train_epoch
    evaluator.addBatch(output.argmax(dim=1), target)
  File "../../tasks/segmentation/modules/ioueval.py", line 42, in addBatch
    tuple(idxs), self.ones, accumulate=True)
RuntimeError: "embedding_backward" not implemented for 'Long'
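For context: the failing call in `ioueval.py` is `Tensor.index_put_` with `accumulate=True` on int64 (Long) data, which this particular PyTorch build rejects. One way to accumulate a per-batch confusion matrix without that operation is `torch.bincount` over linearized `(target, prediction)` index pairs. This is a hypothetical stand-alone sketch, not bonnetal's actual `addBatch` implementation:

```python
import torch

def add_batch_confusion(conf, pred, target, n_classes):
    """Accumulate a confusion matrix without index_put_(accumulate=True).

    conf:   (n_classes, n_classes) int64 confusion matrix, updated in place
    pred:   predicted class ids (any shape, integer tensor)
    target: ground-truth class ids (same shape as pred)
    """
    # Flatten each (target, pred) pair into a single linear index, then
    # count occurrences with bincount, which does support Long tensors.
    idx = target.reshape(-1) * n_classes + pred.reshape(-1)
    conf += torch.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)
    return conf

conf = torch.zeros(2, 2, dtype=torch.int64)
pred = torch.tensor([0, 1, 1, 0])
target = torch.tensor([0, 1, 0, 0])
conf = add_batch_confusion(conf, pred, target, n_classes=2)
# Row = ground truth, column = prediction: conf[0, 1] counts background
# pixels predicted as person, and so on.
```

The same result could also be obtained by casting the accumulation buffer to a dtype the build supports (e.g. `float`) and casting back, but the bincount route avoids `index_put_` entirely.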

ryx2 avatar Nov 13 '19 15:11 ryx2

I should also include my yaml file:

# training parameters
train:
  loss: "xentropy"       # must be either xentropy or iou
  max_epochs: 300
  max_lr: 0.01           # sgd learning rate max
  min_lr: 0.001          # warmup initial learning rate
  up_epochs: 0.5         # warmup during first XX epochs (can be float)
  down_epochs:  30       # warmdown during second XX epochs  (can be float)
  max_momentum: 0.9      # sgd momentum max when lr is min
  min_momentum: 0.85     # sgd momentum min when lr is max
  final_decay: 0.995     # learning rate decay per epoch after initial cycle (from min lr)
  w_decay: 0.0005        # weight decay
  batch_size: 5          # batch size
  report_batch: 1        # every x batches, report loss
  report_epoch: 1        # every x epochs, report validation set
  save_summary: False    # Summary of weight histograms for tensorboard
  save_imgs: True        # False doesn't save anything, True saves some 
                         # sample images (one per batch of the last calculated batch)
                         # in log folder
  avg_N: 3               # average the N best models
  crop_prop:
    height: 480
    width: 480

# backbone parameters
backbone:
  name: "mobilenetv2"
  dropout: 0.02
  bn_d: 0.05
  OS: 16 # output stride
  train: True # train backbone?
  extra:
    width_mult: 1.0
    shallow_feats: True # get features before the last layer (mn2)

decoder:
  name: "aspp_progressive"
  dropout: 0.02
  bn_d: 0.05
  train: True # train decoder?
  extra:
    aspp_channels: 32
    last_channels: 16

# classification head parameters
head:
  name: "segmentation"
  dropout: 0.1

# dataset (to find parser)
dataset:
  name: "persons"
  location: "/tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/"
  workers: 3 # number of threads to get data
  img_means: #rgb
    - 0.46992042
    - 0.45250652
    - 0.42510188
  img_stds: #rgb
    - 0.29184756
    - 0.28221624
    - 0.29719201
  img_prop:
    width: 640
    height: 480
    depth: 3
  labels:
    0: 'background'
    1: 'person'
  labels_w:
    0: 1.0
    1: 1.0
  color_map: # bgr
    0: [0,0,0]
    1: [0,255,0]

The images and labels in that dataset folder are float32 and uint8, respectively.
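Since the evaluator ultimately indexes with the labels, it is worth confirming that loaded labels really are integer class ids within `[0, n_classes)` before training. A minimal sanity-check sketch, assuming labels can be read into numpy arrays (the helper name and the stand-in array are hypothetical, not part of bonnetal):

```python
import numpy as np

def check_label(label: np.ndarray, n_classes: int) -> None:
    """Hypothetical sanity check for one loaded label image."""
    # Class-id labels must be integer-typed; float labels would break
    # index-based IoU accumulation once cast to Long indices.
    assert np.issubdtype(label.dtype, np.integer), f"bad dtype: {label.dtype}"
    assert label.min() >= 0 and label.max() < n_classes, "class id out of range"

# Stand-in for an image from the lbl folder (uint8, ids 0 = background, 1 = person).
fake_label = np.array([[0, 1], [1, 0]], dtype=np.uint8)
check_label(fake_label, n_classes=2)
```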

ryx2 avatar Nov 13 '19 17:11 ryx2

Hi,

Were you able to resolve this issue? I am having the exact same issue in my own Docker image, but it worked in the bonnetal Docker image. For both I use the exact same dataset and config files.

duda1202 avatar Apr 22 '20 17:04 duda1202

@duda1202 I was able to get this to work; it's a versioning problem. I forget which version changes made it work since this was months ago, but I have pasted my pip freeze here:

```
Package                Version
---------------------- ------------
absl-py                0.8.1
appdirs                1.4.3
astor                  0.8.0
backcall               0.1.0
cycler                 0.10.0
decorator              4.4.1
gast                   0.3.2
genpy                  2016.1.3
grpcio                 1.25.0
h5py                   2.10.0
imageio                2.6.1
imgaug                 0.3.0
ipdb                   0.12.3
ipython                7.9.0
ipython-genutils       0.2.0
jedi                   0.15.1
Keras-Applications     1.0.8
Keras-Preprocessing    1.1.0
kiwisolver             1.1.0
Mako                   1.1.0
Markdown               3.1.1
MarkupSafe             1.1.1
matplotlib             3.0.3
mock                   3.0.5
networkx               2.4
numpy                  1.17.4
onnx                   1.5.0
opencv-python          3.4.0.12
opencv-python-headless 4.1.2.30
parso                  0.5.1
pexpect                4.7.0
pickleshare            0.7.5
Pillow                 6.0.0
pip                    19.3.1
pkg-resources          0.0.0
prompt-toolkit         2.0.10
protobuf               3.10.0
ptyprocess             0.6.0
pycuda                 2019.1.2
Pygments               2.5.2
pyparsing              2.4.5
python-dateutil        2.8.1
pytools                2019.1.1
PyWavelets             1.1.1
PyYAML                 5.1
scikit-image           0.15.0
scikit-learn           0.20.3
scipy                  0.19.1
setuptools             20.7.0
Shapely                1.6.4.post2
six                    1.13.0
tensorboard            1.13.1
tensorflow             1.13.1
tensorflow-estimator   1.13.0
termcolor              1.1.0
torch                  1.3.1
torchvision            0.4.2
traitlets              4.3.3
typing                 3.7.4.1
typing-extensions      3.7.4.1
wcwidth                0.1.7
Werkzeug               0.16.0
wheel                  0.33.6
```
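Since the fix turned out to be version-related, the key pins from the freeze above could go into a `requirements.txt` to make the environment reproducible. Which packages actually matter is an assumption; `torch` and `torchvision` are the most likely culprits for the int64 error:

```
torch==1.3.1
torchvision==0.4.2
onnx==1.5.0
PyYAML==5.1
```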

ryx2 avatar Apr 22 '20 22:04 ryx2