How to train the model?

Open yxxxqqq opened this issue 5 years ago • 8 comments

Train
  • Environment

    I want to train the model, but I ran into some problems when creating the environment.

    • 'requirements.txt'

      tensorflow>=1.14.0,<2.0.0
      tensorlayer==1.11.1
      
    • error

      ERROR: tensorflow 1.14.0 has requirement wrapt>=1.11.1, but you'll have wrapt 1.10.11 which is incompatible.

    • 'tensorflow 1.14.0' and 'tensorlayer 1.11.1' are not compatible. The code seems to use tensorlayer 2.x.x?

    • another error:

      AttributeError: module 'tensorlayer.layers' has no attribute 'BatchNorm2d'

  • So I don't know the correct environment dependencies.
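The first pip message is a plain minimum-version conflict: wrapt 1.10.11 is installed, but tensorflow 1.14.0 requires wrapt>=1.11.1. Below is a minimal sketch for checking that kind of constraint from Python. It assumes Python 3.8+ for `importlib.metadata`; on Python 3.7 the `importlib_metadata` backport provides the same `version` and `PackageNotFoundError`. The helper names are illustrative. The comparison only looks at numeric components, so it is not a full PEP 440 implementation.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def parse_version(text):
    """'1.10.11' -> (1, 10, 11); stops at the first non-numeric part, e.g. '2.0.0rc1' -> (2, 0)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if installed >= minimum, padding with zeros so '1.11' == '1.11.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirement(dist, minimum):
    """Describe whether the installed distribution satisfies '>= minimum'."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return f"{dist}: not installed (need >= {minimum})"
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    return f"{dist} {installed} (need >= {minimum}): {status}"


if __name__ == "__main__":
    # The constraint from the pip error: tensorflow 1.14.0 requires wrapt>=1.11.1.
    print(check_requirement("wrapt", "1.11.1"))
```

For a whole environment, `pip check` reports the same kind of unmet requirement for every installed package.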

yxxxqqq • Jun 12 '20

Hi! I'm sorry for the trouble. I forgot to update the requirements.txt in the root directory. The main page links to the documentation at https://hyperpose.readthedocs.io/en/latest/, which provides a detailed installation guide, a quick start and tutorials. You can follow the environment configuration on the "Training Library Installation" page: https://hyperpose.readthedocs.io/en/latest/markdown/install/training.html.

The configuration we have tested is:

  1. tensorflow==2.0.0
  2. the newest tensorlayer (install it with "pip install git+https://github.com/tensorlayer/tensorlayer.git")
  3. numpy==1.16.4
  4. cudatoolkit=10.0.130
  5. cudnn=7.6.0
  6. pycocotools
  7. opencv-python

I'll fix the requirements.txt soon. If you have any problem following the "Training installation guide" to configure the environment, please contact me! As for the training procedure, you can follow the tutorial or use train.py as a reference. Thanks!

Gyx-One • Jun 12 '20

Should tensorlayer be > 2.0.0?

zsdonghao • Jun 13 '20

https://github.com/tensorlayer/hyperpose/blob/master/requirements.txt @Gyx-One like that?

zsdonghao • Jun 13 '20

@Gyx-One Thank you very much. I have configured the environment correctly, but I get another error during training:

hyperpose/Hyperpose/Model/openpose/train.py:195 one_step  *
        pd_conf,pd_paf,stage_confs,stage_pafs=train_model.forward(image,is_train=True)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py:457 __call__
        result = self._call(*args, **kwds)
    hyperpose/Hyperpose/Model/openpose/model/openpose.py:49 forward  *
        vgg_features=self.backbone.forward(x)
    hyperpose/Hyperpose/Model/openpose/model/openpose.py:129 forward  *
        x=self.main_block.forward(x)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/models/vgg.py:108 forward  *
        out = self.layers(inputs)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/layers/core.py:245 __call__  *
        self._add_node(input_tensors, outputs)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/layers/core.py:271 _add_node  *
        in_nodes = [tensor._info[0] for tensor in inputs_list]

    AttributeError: 'Tensor' object has no attribute '_info'
  • So, what's the problem? Very sorry to bother you!

yxxxqqq • Jun 14 '20

https://github.com/tensorlayer/hyperpose/blob/master/requirements.txt @Gyx-One like that?

Yes! It still needs the libraries below:

  1. "pycocotools" is required to read the COCO dataset.
  2. "scipy" is required to read the .mat files of the MPII dataset.
  3. I tried, and it seems that pip can't be used to install cudatoolkit and cudnn. So users should either configure a matching CUDA version themselves, or use conda to install cudatoolkit and cudnn in a virtual environment.
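Here is a quick sketch for confirming that these libraries are importable before training starts. It assumes the import names `pycocotools`, `scipy` and `cv2`; `cv2` is the module that opencv-python installs. `find_spec` only checks that a module can be located and does not import it, so a package that is present but fails on import will not be caught here. The GPU check uses `tf.config.experimental.list_physical_devices`, which exists in TensorFlow 2.0. If it returns an empty list on a machine that has a GPU, TensorFlow cannot use that GPU. In that case, compare your setup against the cudatoolkit/cudnn versions mentioned in this thread.

```python
import importlib.util

# Import names assumed for the packages discussed in this thread
# (opencv-python is imported as cv2).
REQUIRED_MODULES = ("tensorflow", "tensorlayer", "numpy", "pycocotools", "scipy", "cv2")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def visible_gpus():
    """GPUs TensorFlow can see; empty if TensorFlow is missing or finds no usable GPU."""
    if importlib.util.find_spec("tensorflow") is None:
        return []
    import tensorflow as tf
    # tf.config.experimental.list_physical_devices is available from TF 2.0 on.
    return tf.config.experimental.list_physical_devices("GPU")


if __name__ == "__main__":
    print("missing:", missing_modules() or "none")
    print("GPUs:", visible_gpus())
```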

Gyx-One • Jun 18 '20

Sorry, I've been busy these days and only now have time to fix this issue. The `'Tensor' object has no attribute '_info'` error happens because tensorlayer needed some small modifications to become compatible with hyperpose, and several days ago I got stuck testing and merging those modifications into the tensorlayer master branch.

Today the merge into tensorlayer succeeded. To solve this problem, uninstall tensorlayer and reinstall it from the newest GitHub source:

  1. "conda activate hyperpose"
  2. "pip uninstall tensorlayer"
  3. "pip install git+https://github.com/tensorlayer/tensorlayer.git"

I'm very sorry for the inconvenience this issue caused you. If you have any problem, please contact me! Thanks!

Gyx-One • Jun 18 '20

@Gyx-One It works on my local host, and I have trained successfully. But when I train it in a Docker environment, I get errors like this:

ValueError: in converted code:

/hyperpose/Hyperpose/Model/openpose/train.py:195 one_step  *
    pd_conf,pd_paf,stage_confs,stage_pafs=train_model.forward(image,is_train=True)
hyperpose/Hyperpose/Model/openpose/model/openpose.py:49 forward  *
        vgg_features=self.backbone.forward(x)
/hyperpose/Hyperpose/Model/backbones.py:355 forward  *
    x=self.bn1.forward(x)

ValueError: Cannot reshape a tensor with 64 elements to shape [1,1,1] (1 elements) for 'batchnorm/Reshape' (op: 'Reshape') with input shapes: [1,64,1,1], [3] and with input tensors computed as partial shapes: input[1] = [1,1,1].

Very sorry to bother you again and again!
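The error message itself is about element counts. The op tries to reshape a [1,64,1,1] tensor (64 values) into [1,1,1] (one value), and a reshape only works when both shapes hold the same number of elements. [1,64,1,1] is also the shape a per-channel parameter for 64 channels takes when it is broadcast against channels-first (NCHW) input. The small NumPy sketch below illustrates both points. It does not diagnose the Docker failure itself, and `per_channel_shape` is a hypothetical helper, not part of tensorlayer.

```python
import numpy as np


def can_reshape(shape_from, shape_to):
    """A reshape is only legal when both shapes hold the same number of elements."""
    return int(np.prod(shape_from)) == int(np.prod(shape_to))


def per_channel_shape(channels, data_format):
    """Hypothetical helper: broadcast shape of a per-channel parameter (e.g. a BN scale)."""
    if data_format == "channels_first":   # NCHW
        return (1, channels, 1, 1)
    if data_format == "channels_last":    # NHWC
        return (1, 1, 1, channels)
    raise ValueError(f"unknown data_format: {data_format}")


# Shapes taken from the error message: [1, 64, 1, 1] -> [1, 1, 1].
params = np.zeros(per_channel_shape(64, "channels_first"))
print(can_reshape(params.shape, (1, 1, 1)))  # False: 64 elements vs 1
try:
    params.reshape((1, 1, 1))
except ValueError as err:
    print("NumPy refuses too:", err)
```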

yxxxqqq • Jun 24 '20

Thanks for pointing out this problem! I haven't tested training in a Docker environment before, but I'll find the reason soon! (I guess it may be a version problem with tensorflow or tensorlayer in the Docker environment. Checking whether the environments in Docker and on your local host are the same may help.)
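One way to do that comparison is to record the relevant package versions in each environment and diff them. The sketch below assumes Python 3.8+ for `importlib.metadata`. The package list and the JSON file names are illustrative. A tensorlayer installed from GitHub may report the same version string for different commits, so a matching tensorlayer version is weaker evidence than a matching PyPI release.

```python
import json
import platform
import sys
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# Illustrative list of distributions relevant to this thread.
PACKAGES = ("tensorflow", "tensorflow-gpu", "tensorlayer", "numpy",
            "opencv-python", "pycocotools", "scipy")


def snapshot(packages=PACKAGES):
    """Map each distribution name to its installed version (None if absent)."""
    env = {"python": platform.python_version()}
    for dist in packages:
        try:
            env[dist] = version(dist)
        except PackageNotFoundError:
            env[dist] = None
    return env


def diff_snapshots(host, container):
    """Return {name: (host_value, container_value)} for every entry that differs."""
    keys = sorted(set(host) | set(container))
    return {k: (host.get(k), container.get(k))
            for k in keys if host.get(k) != container.get(k)}


if __name__ == "__main__":
    # Run once on the host and once inside the container, e.g.
    # `python env_snapshot.py > host.json` (file names are hypothetical).
    json.dump(snapshot(), sys.stdout, indent=2)
```

Load both JSON files with `json.load` and pass them to `diff_snapshots`. Every package in the result differs between the two environments.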

Gyx-One • Jul 02 '20