[General Issue] Problem when training the PaddleSeg Matting Model

Open omarsemma opened this issue 2 years ago • 4 comments

Environment

  1. PaddleSeg version: release/2.5
  2. PaddlePaddle version: PaddlePaddle 2.3.1 (gpu)
  3. Operating system: Linux
  4. Python version: Python 3.7
  5. CUDA/cuDNN version: CUDA11.1/cuDNN 7.6
  6. Using Google Colab

Issue

I tried to train the PaddleSeg Matting model on a custom dataset. Training ran normally until the 1000th iteration, where I got the following error:

2022-07-06 21:11:16 [INFO]	[TRAIN] epoch=167, iter=1000/100000, loss=1.0582, lr=0.020000, batch_cost=2.3336, reader_cost=1.07924, ips=6.8565 samples/sec | ETA 64:10:22
2022-07-06 21:11:16 [INFO]	[TRAIN] [LOSS] all=1.0582 semantic=0.0198 detail=0.9192 fusion=0.1192 fusion_l1=0.0808 fusion_comp=0.0347 fusion_con_sem=0.0037
Traceback (most recent call last):
  File "train.py", line 174, in <module>
    main(args)
  File "train.py", line 169, in main
    eval_begin_iters=args.eval_begin_iters)
  File "/content/PaddleSeg/Matting/core/train.py", line 223, in train
    log_writer=log_writer, vis_dict=vis_dict, step=iter)
  File "/content/PaddleSeg/Matting/core/train.py", line 54, in visual_in_traning
    log_writer.add_image(tag=key, img=value, step=step)
  File "/usr/local/lib/python3.7/dist-packages/visualdl/writer/writer.py", line 217, in add_image
    dataformats=dataformats))
  File "/usr/local/lib/python3.7/dist-packages/visualdl/component/base_component.py", line 171, in image
    image_bytes = imgarray2bytes(image_array)
  File "/usr/local/lib/python3.7/dist-packages/visualdl/component/base_component.py", line 74, in imgarray2bytes
    img_bin = Image.fromarray(np.uint8(buf)).tobytes("raw")
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2728, in fromarray
    size = shape[1], shape[0]
IndexError: tuple index out of range

Can someone help me with this issue, please? Thanks 😃

omarsemma avatar Jul 07 '22 12:07 omarsemma

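You could first try running the whole workflow by following the tutorial. If the tutorial runs through, that suggests the problem is in your data.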

wuyefeilin avatar Jul 08 '22 02:07 wuyefeilin

Thanks for your reply,

I've just tried it on the PPM-100 dataset and I still get the same error on the 1000th iteration.

omarsemma avatar Jul 08 '22 09:07 omarsemma

What about the develop branch? Does it have the same problem?

wuyefeilin avatar Jul 12 '22 02:07 wuyefeilin

I got exactly the same error with the develop branch. I also tried training without the VisualDL (v2.3.0) argument, and it worked.

omarsemma avatar Jul 12 '22 16:07 omarsemma

I cannot reproduce your problem. Did you change the code?

wuyefeilin avatar Aug 11 '22 10:08 wuyefeilin

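Hello, I ran into the same problem. When training on the PPM-100 dataset provided in the tutorial, the error occurs at iter=1000. After investigating, I found that the problem arises during VisualDL (vdl) logging (Matting/ppmatting/core/train.py, line 255). When the call chain reaches the imgarray2bytes method, line 94 in the screenshot changes the image shape from (512, 512, 3) to (306912,), and the later access to shape[1] inside the fromarray method then fails. The call stack is shown in the screenshot below. I hope this gets your attention, thanks~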

[Image: screenshot showing the call stack]

unihornWwan avatar Feb 07 '23 02:02 unihornWwan
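
The failure described in the last comment can be illustrated without VisualDL or Pillow installed. The sketch below mirrors only the line `size = shape[1], shape[0]` from the traceback and shows why a flattened, one-dimensional shape raises the IndexError while an (H, W, C) shape does not. `pil_style_size` and `check_image_shape` are hypothetical helpers, not part of either library, and the guard is one possible pre-check rather than the fix adopted upstream.

```python
def pil_style_size(shape):
    """Mimic `size = shape[1], shape[0]` from the traceback (width, height)."""
    return shape[1], shape[0]


def check_image_shape(shape):
    """Raise a descriptive error unless shape looks like (H, W) or (H, W, C)."""
    if len(shape) not in (2, 3):
        raise ValueError(
            f"expected an (H, W) or (H, W, C) image, got shape {tuple(shape)}"
        )
    return tuple(shape)


if __name__ == "__main__":
    # A normal image shape has at least two dimensions, so indexing works.
    print(pil_style_size((512, 512, 3)))

    # The 1-D shape reported in the comment above has no index 1.
    flattened = (306912,)
    try:
        pil_style_size(flattened)
    except IndexError as exc:
        print("IndexError:", exc)

    # The guard fails earlier and names the offending shape.
    try:
        check_image_shape(flattened)
    except ValueError as exc:
        print("ValueError:", exc)
```

Calling a check like this just before the logging step would replace the opaque "tuple index out of range" with a message that states which shape arrived.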