How to train the model?
Train
-
Environment
I want to train the model, but I met some problems when creating the environment.
-
'requirements.txt':
tensorflow>=1.14.0,<2.0.0
tensorlayer==1.11.1
-
Error:
ERROR: tensorflow 1.14.0 has requirement wrapt>=1.11.1, but you'll have wrapt 1.10.11 which is incompatible.
-
'tensorflow 1.14.0' and 'tensorlayer 1.11.1' are not compatible. The code seems to use tensorlayer 2.x.x?
-
Another error:
AttributeError: module 'tensorlayer.layers' has no attribute 'BatchNorm2d'
-
So I don't know the correct environment dependencies.
Hi! I'm sorry this is troubling you. I forgot to update the requirements.txt in the root directory. There is a documentation link https://hyperpose.readthedocs.io/en/latest/ on the main page, where we provide a detailed installation guide, quick start, and tutorial. You can follow the environment configuration on the "Training Library Installation" page: https://hyperpose.readthedocs.io/en/latest/markdown/install/training.html.
The configuration we have tested is:
1. tensorflow==2.0.0
2. the newest tensorlayer (use "pip install git+https://github.com/tensorlayer/tensorlayer.git")
3. numpy==1.16.4
4. cudatoolkit=10.0.130
5. cudnn=7.6.0
6. pycocotools
7. opencv-python
I'll fix the requirements.txt soon. If you have any problem following the "Training Library Installation" guide to configure the environment, please contact me! As for the training procedure, you can follow the tutorial or take train.py for reference. Thanks!
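If it helps, here is a minimal sketch (my own, not part of the repo) to confirm the installed versions match this tested configuration before running train.py:

```python
# Quick sanity check (not from the repo) that the environment matches
# the tested configuration listed above.
import numpy as np
import tensorflow as tf
import tensorlayer as tl

print("tensorflow :", tf.__version__)   # tested with 2.0.0
print("tensorlayer:", tl.__version__)   # installed from the GitHub master branch
print("numpy      :", np.__version__)   # tested with 1.16.4

assert tf.__version__.startswith("2."), "hyperpose training expects TensorFlow 2.x"
```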
tensorlayer should be > 2.0.0 ?
https://github.com/tensorlayer/hyperpose/blob/master/requirements.txt @Gyx-One like that?
@Gyx-One thank you very much, I have configured the environment correctly, but I get another error when training:
hyperpose/Hyperpose/Model/openpose/train.py:195 one_step *
pd_conf,pd_paf,stage_confs,stage_pafs=train_model.forward(image,is_train=True)
.virtualenv/tf2-py37/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py:457 __call__
result = self._call(*args, **kwds)
hyperpose/Hyperpose/Model/openpose/model/openpose.py:49 forward *
vgg_features=self.backbone.forward(x)
hyperpose/Hyperpose/Model/openpose/model/openpose.py:129 forward *
x=self.main_block.forward(x)
.virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/models/vgg.py:108 forward *
out = self.layers(inputs)
.virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/layers/core.py:245 __call__ *
self._add_node(input_tensors, outputs)
.virtualenv/tf2-py37/lib/python3.7/site-packages/tensorlayer/layers/core.py:271 _add_node *
in_nodes = [tensor._info[0] for tensor in inputs_list]
AttributeError: 'Tensor' object has no attribute '_info'
- So, what's the problem? Very sorry to bother you!
https://github.com/tensorlayer/hyperpose/blob/master/requirements.txt @Gyx-One like that?
Yes! You still need the libraries below:
- "pycocotools" is required to read the COCO dataset.
- "scipy" is required to read the .mat format annotation files of the MPII dataset.
- I tried, and it seems pip can't be used to install cudatoolkit and cudnn, so users should either configure the right CUDA version themselves or use conda to install cudatoolkit and cudnn in a virtual environment (see the sketch below).
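A hypothetical sanity check (my own sketch, not part of hyperpose) to confirm these extra dependencies import cleanly and that TensorFlow can actually see the GPU, since cudatoolkit/cudnn cannot come from pip:

```python
# Environment check (illustrative only, not from the repo).
import tensorflow as tf
from pycocotools.coco import COCO   # needed to read the COCO dataset
from scipy.io import loadmat        # needed for the MPII .mat annotation files
import cv2                          # opencv-python

# If cudatoolkit 10.0 / cudnn 7.6 are missing or mismatched, no GPU will be visible.
gpus = tf.config.experimental.list_physical_devices("GPU")
print("visible GPUs:", gpus)
if not gpus:
    print("No GPU visible: install cudatoolkit/cudnn (e.g. via conda) "
          "or check the system CUDA installation.")
```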
Sorry, I've been busy these days and only just had time to fix this issue. The problem is that tensorlayer needed some small modifications to become compatible with hyperpose, and I was stuck testing and merging those modifications into the tensorlayer master branch a few days ago.
Today the merge into tensorlayer succeeded. To solve this problem, you can just uninstall tensorlayer and reinstall the newest version from GitHub:
1. "conda activate hyperpose"
2. "pip uninstall tensorlayer"
3. "pip install git+https://github.com/tensorlayer/tensorlayer.git"
I'm very sorry about the inconvenience this issue brought you. If you have any problem, please contact me! Thanks!
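As a quick check (again just a sketch of mine, not from the repo), you can confirm the reinstalled tensorlayer is the GitHub build and exposes the layers the training code uses:

```python
# Verify the reinstalled tensorlayer (illustrative check, not part of hyperpose).
import tensorlayer as tl

print("tensorlayer version:", tl.__version__)
print("installed from     :", tl.__file__)

# The earlier "no attribute 'BatchNorm2d'" error came from an old tensorlayer 1.x;
# the 2.x API used by hyperpose should have it.
assert hasattr(tl.layers, "BatchNorm2d"), "tensorlayer is still too old, reinstall from GitHub"
```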
@Gyx-One It works on my local host, and I have trained successfully. But when I train it in a Docker environment, there are errors like this:
ValueError: in converted code:
/hyperpose/Hyperpose/Model/openpose/train.py:195 one_step *
pd_conf,pd_paf,stage_confs,stage_pafs=train_model.forward(image,is_train=True)
hyperpose/Hyperpose/Model/openpose/model/openpose.py:49 forward *
vgg_features=self.backbone.forward(x)
/hyperpose/Hyperpose/Model/backbones.py:355 forward *
x=self.bn1.forward(x)
ValueError: Cannot reshape a tensor with 64 elements to shape [1,1,1] (1 elements) for 'batchnorm/Reshape' (op: 'Reshape') with input shapes: [1,64,1,1], [3] and with input tensors computed as partial shapes: input[1] = [1,1,1].
Very sorry to bother you again and again!
Thanks for pointing out this problem! I haven't tested training in a Docker environment before, so I'll find the reason soon! (I guess it may be a version problem with tensorflow or tensorlayer inside the Docker environment; checking whether the environment in Docker and on your local host are the same may help.)
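For example, a small script (hypothetical, not part of hyperpose) that you can run both inside the container and on the host, then diff the output to spot mismatches:

```python
# Dump the environment for comparison between Docker and the local host
# (illustrative helper, not from the repo).
import platform
import sys

import numpy as np
import tensorflow as tf
import tensorlayer as tl

print("python     :", sys.version.split()[0], platform.platform())
print("tensorflow :", tf.__version__)
print("tensorlayer:", tl.__version__, "(", tl.__file__, ")")
print("numpy      :", np.__version__)
print("GPU devices:", tf.config.experimental.list_physical_devices("GPU"))
```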