FastMaskRCNN
TypeError: long() argument must be a string or a number, not 'JpegImageFile'
when I run `python download_and_convert_data.py`:

```
>> Converting image 23751/82783 shard 11
Converting image 23801/82783 shard 11
Converting image 23851/82783 shard 11
None
Annotations data/coco/train2014/COCO_train2014_000000167118.jpg
Traceback (most recent call last):
  File "download_and_convert_data.py", line 36, in <module>
    tf.app.run()
  File "/mnt/data1/daniel/tensorflow/_python_build/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "download_and_convert_data.py", line 30, in main
    download_and_convert_coco.run(FLAGS.dataset_dir)
  File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 338, in run
    'train2014')
  File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 299, in _add_to_tfrecord
    img = img.astype(np.uint8)
TypeError: long() argument must be a string or a number, not 'JpegImageFile'
```
Why did this happen?
Can you try changing line 292 in /libs/datasets/download_and_convert_coco.py
img = np.array(Image.open(img_name))
to
img = np.asarray(Image.open(img_name))
?
This might not be the solution; it is just a lucky guess.
@kevinkit, unfortunately, the guess does not help.
Thanks, @kevinkit, I've tried it, but it didn't work. Can anybody help us? @CharlesShang
The error comes from the fact that if you open a file with
Image.open(...)
it creates an Image object from the PIL library, which is called 'JpegImageFile'. The command
img.astype(np.uint8)
refers to a numpy array. So the error happens because, at the point this command is called, img is not a numpy array but a JpegImageFile.
I solved that problem.
@Designbook1, please tell us how?
@vaklyuenkov I just commented out img.astype(np.uint8),
and that works.
img.astype(np.uint8)
returns a copy of the array converted to the uint8 type, which speeds up training. If you comment out that line, it works, but training will be very slow: for example, with one GTX 1080 it takes 3 hours to train 21k iterations, and it seems to need about 1500k iterations, which means it would take more than 7 DAYS to train the model!
So we need another way to solve it. I hope anybody who has solved the problem can tell us, thanks very much!
@kevinkit your answer is right, but what can I do to solve it? Does it mean I need to get the image's actual pixel data?
@Designbook1, thank you.
@Designbook1 I would love to tell you, but I am not able to reproduce the problem. You can give the command np.array(Image.open(img_name))
a datatype so it converts automatically when opening the image:
img = np.array(Image.open(img_name), dtype=np.uint8)
However, I am not quite sure this will speed up the training process. In the original Mask R-CNN paper they trained for 2 days on an 8-GPU system, so 7 days seems like an "okay" training time for this architecture when only one GPU is available.
Furthermore, the datatype may get converted to a suitable representation (int16 / int8, if your GPU supports it) when moved onto the GPU.
regarding speed / training see: #31
@Designbook1 Sounds like the input JPEG image is corrupted (possibly just truncated). The imaging library can identify the file type as JPEG, but it can't read the data, so it can't convert the JpegImageFile into a numpy array.
Compare:
```
>>> im = Image.open("image.jpg")
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x1086AFF50>
>>> np.array(im, dtype=np.uint8)
array([[[105, 100,  94],
        [107, 102,  96],
        [109, 104,  98],
        ...
```
versus
```
>>> im = Image.open("truncated.jpg")
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x107073810>
>>> np.array(im, dtype=np.uint8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: long() argument must be a string or a number, not 'JpegImageFile'
```
It may be easiest to just re-download the images. If you're feeling particularly motivated, you could use (say) djpeg to find which images are broken.
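If you'd rather not re-download everything blindly, a rough way to spot truncated files without any imaging library is to check for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers — a truncated download usually loses the trailing EOI marker. This is only a heuristic (some encoders append padding after EOI, which would give false negatives the other way), and `looks_truncated`/`find_broken_jpegs` below are hypothetical helpers, not part of this repo:

```python
import os

def looks_truncated(path):
    """Heuristic: a well-formed JPEG starts with the SOI marker (FF D8)
    and ends with the EOI marker (FF D9). A truncated download usually
    loses the trailing EOI marker."""
    if os.path.getsize(path) < 4:
        return True
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
    return head != b"\xff\xd8" or tail != b"\xff\xd9"

def find_broken_jpegs(root):
    """Walk a directory (e.g. data/coco/train2014) and report suspect files."""
    bad = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            if name.lower().endswith(".jpg"):
                path = os.path.join(dirpath, name)
                if looks_truncated(path):
                    bad.append(path)
    return bad
```

Anything it flags is worth re-downloading; anything it misses may still be broken mid-stream, so a full decode (as djpeg does) remains the thorough check.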
A nice feature request would be a checksum to check if the downloaded data is ok.
@kevinkit Yeah, I guess, but it's pretty easy to run md5sum on the downloads. I'd prefer the project to focus on the tricky groundbreaking stuff :-)
For what it's worth, I report the following MD5 sums for the big zip files:
```
5750999c8c964077e3c81581170be65b  captions_train-val2014.zip
59582776b8dd745d649cd249ada5acf7  instances_train-val2014.zip
926b9df843c698817ee62e0e049e3753  person_keypoints_trainval2014.zip
0da8c0bd3d6becc4dcb32757491aca88  train2014.zip
a3d79f5ed8d289b7a7554ce06a5782b3  val2014.zip
```
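If you want to check your own downloads against these sums without leaving Python, here is a sketch; the `EXPECTED` table just repeats the values above, and `verify_downloads` is a hypothetical helper (point it at whatever directory holds your zips):

```python
import hashlib
import os

# The MD5 sums reported above for the big zip files.
EXPECTED = {
    "captions_train-val2014.zip": "5750999c8c964077e3c81581170be65b",
    "instances_train-val2014.zip": "59582776b8dd745d649cd249ada5acf7",
    "person_keypoints_trainval2014.zip": "926b9df843c698817ee62e0e049e3753",
    "train2014.zip": "0da8c0bd3d6becc4dcb32757491aca88",
    "val2014.zip": "a3d79f5ed8d289b7a7554ce06a5782b3",
}

def md5sum(path, chunk_size=1 << 20):
    """Stream the file through MD5 in 1 MiB chunks (the zips are large)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_downloads(directory):
    """Return the names of any zips that are missing or fail their checksum."""
    bad = []
    for name, want in sorted(EXPECTED.items()):
        path = os.path.join(directory, name)
        if not os.path.exists(path) or md5sum(path) != want:
            bad.append(name)
    return bad
```

An empty return value from `verify_downloads` means the zips match the sums listed above, and the corruption (if any) happened after download, e.g. during unzipping.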
@tweedmorris It seems your MD5 sums are the same as mine, so I think there is no problem with the files.
Some ideas for further investigation:
- Try replacing Image.open with cv2.imread; to do so you must install OpenCV (pip install opencv-python). If this fixes the error, there is an issue with PIL. Alternatively, you could also try loading the images with scipy.
- If this does not help, put a try/except block around the code and check which images give you the exception: whether it is all images, just some, or even just one image.
- Update all libs etc.
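The try/except suggestion above can be sketched as a small generic wrapper; `collect_failures` is a hypothetical helper, and in this repo the `loader` argument would be the conversion that currently crashes, e.g. `lambda p: np.array(Image.open(p), dtype=np.uint8)`:

```python
def collect_failures(paths, loader):
    """Run `loader` on every path and collect the ones that raise an
    exception, so you can see whether it is all images or just a few."""
    failed = []
    for path in paths:
        try:
            loader(path)
        except Exception as exc:
            # Record the path together with the exception type and message.
            failed.append((path, "%s: %s" % (type(exc).__name__, exc)))
    return failed
```

If only a handful of paths come back, those are the specific files to re-download or delete; if every path fails, the problem is in the library stack rather than the data.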
@kevinkit Good idea about cv2.imread – be aware though that OpenCV will load the image in BGR byte order rather than RGB, so you'll need to switch the byte order, and perhaps make a copy of the re-strided array in case any downstream functions expect contiguous data:

```
img = cv2.imread("image.jpg").astype(np.uint8)[:, :, ::-1]
img = np.ascontiguousarray(img)
```
Note that although OpenCV does provide a separate image I/O library, the imread/imsave functions in scipy are just fairly thin wrappers around the PIL/Pillow library, so you would expect the same error as before:

```
>>> from scipy.misc import imread
>>> imread("truncated.jpg")
array(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x11CB5FFD0>, dtype=object)
```
@Designbook1 Hope you've solved the problem, please do let us know.
Could it be that the EXIF data is corrupted? I had a similar issue with some images the other day. It turned out the EXIF data was corrupted. I was able to fix it by clearing out the EXIF data using piexif.
```
import piexif
piexif.remove("pathtofile")
```
@kevinkit @tweedmorris your answers have helped me to solve the problem. Thank you very much!
@Designbook1 I want to know how you solved the problem, thank you.
I also have this problem, and my failing file is different from @Designbook1's (mine is around image 70,000 and his is around 23,851). I don't believe the downloaded files have problems, but during the unzip process my disk filled up and I had to switch to another disk, which might have corrupted some of the images – so I just unzipped the data again and it looks good now.
I had the same problem too, but I deleted the data files and unzipped them again properly (one by one), and it works. Thank you, @xiaoyongzhu
Please tell me how to solve the problem, thank you very much. @Designbook1
@ty0803 Try deleting the unzipped files,
then unzip all the files one by one again. It worked for me.
@WKChung1028 OK! I'll try it. Thank you!
You can skip such an image with code like the following: check whether the opened image decodes to a 3-dimensional array, and skip it otherwise.

```
img = Image.open('img_path')
shape = np.array(img).shape
if len(shape) == 3:
    ...
else:
    continue
```

This code may work.