HyperPose: how to train the model?

Train

Environment

I want to train the model, but I ran into some problems when creating the environment.
- `requirements.txt`:
```
tensorflow>=1.14.0,<2.0.0
tensorlayer==1.11.1
```
- Error: `ERROR: tensorflow 1.14.0 has requirement wrapt>=1.11.1, but you'll have wrapt 1.10.11 which is incompatible.`
- tensorflow 1.14.0 and tensorlayer 1.11.1 are not compatible. The code seems to use tensorlayer 2.x.x?
- Another error: `AttributeError: module 'tensorlayer.layers' has no attribute 'BatchNorm2d'`

So I don't know what the correct environment dependencies are.

Jun 12 '20 06:06 yxxxqqq

Hi! Sorry for the trouble. I forgot to update the requirements.txt in the root directory. There is a Documentation link https://hyperpose.readthedocs.io/en/latest/ on the main page, where we provide a detailed installation guide, quick start and tutorial. You can follow the environment configuration on the "Training Library Installation" page: https://hyperpose.readthedocs.io/en/latest/markdown/install/training.html.

The configuration we have tested is:

1. tensorflow==2.0.0
2. the newest tensorlayer (use `pip install git+https://github.com/tensorlayer/tensorlayer.git`)
3. numpy==1.16.4
4. cudatoolkit=10.0.130
5. cudnn=7.6.0
6. pycocotools
7. opencv-python
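
As a rough way to confirm that a local environment matches the tested configuration listed above, the pip-installed packages can be compared against these pins. The sketch below is a hypothetical helper, not part of HyperPose; `TESTED`, `compare` and `check_environment` are illustrative names. It assumes the distributions are named `tensorflow`, `numpy`, `tensorlayer`, `pycocotools` and `opencv-python` (a GPU build installed as `tensorflow-gpu` would need that key instead), and it skips cudatoolkit and cudnn, which are usually installed with conda and are not visible as pip distributions.

```python
# Hypothetical helper, not part of HyperPose: compare installed pip
# distributions against the tested configuration from this thread.
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: `pip install importlib-metadata` provides a backport
    from importlib_metadata import PackageNotFoundError, version

# Exact pins where the thread gives one; None means "any version, just installed".
# tensorlayer is installed from GitHub master, so only its presence is checked.
TESTED = {
    "tensorflow": "2.0.0",
    "numpy": "1.16.4",
    "tensorlayer": None,
    "pycocotools": None,
    "opencv-python": None,
}


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare(tested, installed):
    """List human-readable mismatches between pinned and installed versions."""
    problems = []
    for name, wanted in tested.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and have != wanted:
            problems.append(f"{name}: have {have}, tested {wanted}")
    return problems


def check_environment(tested=TESTED):
    """Look up every pinned distribution in the current interpreter and compare."""
    installed = {name: installed_version(name) for name in tested}
    return compare(tested, installed)
```

Running `print(check_environment())` inside the activated environment prints an empty list when every pinned package is present at the tested version.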

I'll fix the requirements.txt soon. If you have any problems following the "Training installation guide" to configure the environment, please contact me! As for the training procedure, you can follow the tutorial or use train.py as a reference. Thanks!

Jun 12 '20 15:06 Gyx-One

Should tensorlayer be > 2.0.0?

Jun 13 '20 02:06 zsdonghao

https://github.com/tensorlayer/hyperpose/blob/master/requirements.txt @Gyx-One like that?

Jun 13 '20 02:06 zsdonghao

@Gyx-One Thank you very much, I have configured the environment correctly, but I get another error when training:

    hyperpose/Hyperpose/Model/openpose/train.py:195 one_step  *
        pd_conf,pd_paf,stage_confs,stage_pafs=train_model.forward(image,is_train=True)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py:457 __call__
        result = self._call(*args, **kwds)
    hyperpose/Hyperpose/Model/openpose/model/openpose.py:49 forward  *
        vgg_features=self.backbone.forward(x)
    hyperpose/Hyperpose/Model/openpose/model/openpose.py:129 forward  *
        x=self.main_block.forward(x)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/models/vgg.py:108 forward  *
        out = self.layers(inputs)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/layers/core.py:245 __call__  *
        self._add_node(input_tensors, outputs)
    .virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/layers/core.py:271 _add_node  *
        in_nodes = [tensor._info[0] for tensor in inputs_list]

    AttributeError: 'Tensor' object has no attribute '_info'

So, what's the problem? Very sorry to bother you!

Jun 14 '20 07:06 yxxxqqq

> https://github.com/tensorlayer/hyperpose/blob/master/requirements.txt @Gyx-One like that?

Yes! The following libraries are still needed:

- "pycocotools" is required to read the COCO dataset.
- "scipy" is required to read the .mat format files of the MPII dataset.

I tried, and it seems that pip can't be used to install cudatoolkit and cudnn. So either users should configure the right CUDA version themselves, or they should use conda to install cudatoolkit and cudnn in the virtual environment.
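
A quick sanity check for the points above might look like the following sketch. It assumes the import names `pycocotools`, `scipy` and `cv2` (the module provided by opencv-python); `missing_modules` and `gpu_report` are hypothetical helpers, not part of HyperPose. `gpu_report` only calls `tf.test.is_built_with_cuda()` and `tf.config.experimental.list_physical_devices("GPU")`, so an empty GPU list on a machine that should have one most likely points at the CUDA/cuDNN setup rather than at HyperPose itself.

```python
import importlib.util

# Import names (not pip names) for the extra libraries mentioned above.
EXTRA_MODULES = ["pycocotools", "scipy", "cv2"]


def missing_modules(names=EXTRA_MODULES):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def gpu_report():
    """Summarize TensorFlow's view of CUDA and GPUs, or None if TF is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [device.name for device in gpus],
    }
```

Calling `print(missing_modules(), gpu_report())` in the training environment shows both which extra libraries are still missing and whether TensorFlow can see a GPU.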

Jun 18 '20 10:06 Gyx-One


Sorry, I've been busy these days and only just had time to fix this issue (`AttributeError: 'Tensor' object has no attribute '_info'`). The problem is that tensorlayer needed some small modifications to become compatible with hyperpose, and several days ago I got stuck testing and merging those modifications into tensorlayer master.

Today the merge into tensorlayer succeeded. To solve this problem, uninstall tensorlayer and reinstall it from the latest GitHub source:

1. `conda activate hyperpose`
2. `pip uninstall tensorlayer`
3. `pip install git+https://github.com/tensorlayer/tensorlayer.git`

I'm very sorry for the inconvenience this issue caused. If you have any problems, please contact me! Thanks!

Jun 18 '20 10:06 Gyx-One

@Gyx-One It works on my local host, and I have trained successfully. But when I train in a Docker environment, I get errors like this:

    ValueError: in converted code:

    /hyperpose/Hyperpose/Model/openpose/train.py:195 one_step  *
        pd_conf,pd_paf,stage_confs,stage_pafs=train_model.forward(image,is_train=True)
    hyperpose/Hyperpose/Model/openpose/model/openpose.py:49 forward  *
        vgg_features=self.backbone.forward(x)
    /hyperpose/Hyperpose/Model/backbones.py:355 forward  *
        x=self.bn1.forward(x)

    ValueError: Cannot reshape a tensor with 64 elements to shape [1,1,1] (1 elements) for 'batchnorm/Reshape' (op: 'Reshape') with input shapes: [1,64,1,1], [3] and with input tensors computed as partial shapes: input[1] = [1,1,1].

Very sorry to bother you again and again!

Jun 24 '20 02:06 yxxxqqq

Thanks for pointing out this problem! I haven't tested training in a Docker environment before; I'll find the reason soon! (I guess it may be a version problem with tensorflow or tensorlayer in the Docker environment. Checking whether the environments in Docker and on your local host are the same may help.)
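
One way to follow that suggestion is to run `pip freeze` both on the host and inside the container, save each output to a text file, and diff the two. The sketch below is a hypothetical helper, not part of HyperPose; it assumes plain `name==version` lines and `name @ url` lines (the form pip prints for packages installed from a Git URL, such as tensorlayer here), and it ignores comments and option lines such as `-e`.

```python
def parse_freeze(text):
    """Map normalized package name -> version (or URL) from `pip freeze` output."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and option lines such as "-e ..."
        if not line or line.startswith(("#", "-")):
            continue
        if "==" in line:
            name, spec = line.split("==", 1)
        elif " @ " in line:  # e.g. "tensorlayer @ git+https://..."
            name, spec = line.split(" @ ", 1)
        else:
            name, spec = line, ""
        pins[name.strip().lower().replace("_", "-")] = spec.strip()
    return pins


def diff_environments(local_text, docker_text):
    """Return {package: (local_version, docker_version)} for every difference."""
    local, docker = parse_freeze(local_text), parse_freeze(docker_text)
    report = {}
    for name in sorted(set(local) | set(docker)):
        if local.get(name) != docker.get(name):
            report[name] = (local.get(name), docker.get(name))
    return report
```

With the two outputs saved as, say, `local.txt` and `docker.txt` (hypothetical file names), `diff_environments(open("local.txt").read(), open("docker.txt").read())` returns only the packages that differ, which should make a tensorflow or tensorlayer version mismatch easy to spot.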

Jul 02 '20 07:07 Gyx-One