FastMaskRCNN
TypeError: long() argument must be a string or a number, not 'JpegImageFile'
when I run `python download_and_convert_data.py`:

```
>> Converting image 23751/82783 shard 11
Converting image 23801/82783 shard 11
Converting image 23851/82783 shard 11
None
Annotations data/coco/train2014/COCO_train2014_000000167118.jpg
Traceback (most recent call last):
  File "download_and_convert_data.py", line 36, in <module>
    tf.app.run()
  File "/mnt/data1/daniel/tensorflow/_python_build/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "download_and_convert_data.py", line 30, in main
    download_and_convert_coco.run(FLAGS.dataset_dir)
  File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 338, in run
    'train2014')
  File "/mnt/data1/daniel/codes/FastMaskRCNN/libs/datasets/download_and_convert_coco.py", line 299, in _add_to_tfrecord
    img = img.astype(np.uint8)
TypeError: long() argument must be a string or a number, not 'JpegImageFile'
```
Why did this happen?
Can you try changing line 292 in /libs/datasets/download_and_convert_coco.py
img = np.array(Image.open(img_name))
to
img = np.asarray(Image.open(img_name))
?
This might not be the solution; it is just a lucky guess.
@kevinkit, unfortunately, the guess does not help.
Thanks, @kevinkit, I've tried it, but it didn't work. Can anybody help us? @CharlesShang
The error comes from the fact that if you open a file with
Image.open(...)
it creates an Image object from the PIL library, which is called 'JpegImageFile'. The command
img.astype(np.uint8)
refers to a numpy array. So the error happens because, at the point this command is called, img is not a numpy array but a JpegImageFile.
I solved that problem.
@Designbook1, please tell us how?
@vaklyuenkov I just commented out img.astype(np.uint8),
and that works.
img.astype(np.uint8)
returns a copy of the array converted to the uint8 type, which speeds up training. If you comment out that line, it works, but training will be very slow: for example, with one GTX 1080 it takes 3 hours to train 21k iterations, and it seems to need about 1500k iterations, which means it would take more than 7 DAYS to train the model!
So we need another way to solve it. I hope anybody who has solved the problem can tell us, thanks very much!
@kevinkit your answer is right, but what can I do to solve it? Does it mean I need to get the image's actual pixel data?
@Designbook1, thank you.
@Designbook1 I would love to tell you, but I am not able to reproduce the problem. You can give the command np.array(Image.open(img_name))
a datatype so it converts automatically when opening the image:
img = np.array(Image.open(img_name), dtype=np.uint8)
However, I am not quite sure this will speed up the training process. In the original Mask R-CNN paper they trained for 2 days on an 8-GPU system, so 7 days seems like an "okay" training time for this architecture when only one GPU is available.
Furthermore, the datatype may get converted to a suitable representation (int16 / int8, if your GPU supports it) when moved onto the GPU.
regarding speed / training see: #31
@Designbook1 Sounds like the input JPEG image is corrupted (possibly just truncated). The imaging library can identify the file type as JPEG, but it can't read the data, so it can't convert the JpegImageFile into a numpy array.
Compare:
```
>>> im = Image.open("image.jpg")
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x1086AFF50>
>>> np.array(im, dtype=np.uint8)
array([[[105, 100,  94],
        [107, 102,  96],
        [109, 104,  98],
        ...
```
versus
```
>>> im = Image.open("truncated.jpg")
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x107073810>
>>> np.array(im, dtype=np.uint8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: long() argument must be a string or a number, not 'JpegImageFile'
```
It may be easiest to just re-download the images. If you're feeling particularly motivated, you could use (say) djpeg to find which images are broken.
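If you'd rather not re-download everything blindly, a rough way to spot truncated files without any imaging library is to check for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers — a truncated download usually loses the trailing EOI marker. This is only a heuristic (some encoders append padding after EOI, which would give false negatives the other way), and `looks_truncated`/`find_broken_jpegs` below are hypothetical helpers, not part of this repo:

```python
import os

def looks_truncated(path):
    """Heuristic: a well-formed JPEG starts with the SOI marker (FF D8)
    and ends with the EOI marker (FF D9). A truncated download usually
    loses the trailing EOI marker."""
    if os.path.getsize(path) < 4:
        return True
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
    return head != b"\xff\xd8" or tail != b"\xff\xd9"

def find_broken_jpegs(root):
    """Walk a directory (e.g. data/coco/train2014) and report suspect files."""
    bad = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            if name.lower().endswith(".jpg"):
                path = os.path.join(dirpath, name)
                if looks_truncated(path):
                    bad.append(path)
    return bad
```

Anything it flags is worth re-downloading; anything it misses may still be broken mid-stream, so a full decode (as djpeg does) remains the thorough check.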
A nice feature request would be a checksum to check if the downloaded data is ok.
@kevinkit Yeah, I guess, but it's pretty easy to run md5sum on the downloads. I'd prefer the project to focus on the tricky groundbreaking stuff :-)
For what it's worth, I report the following MD5 sums for the big zip files:
```
5750999c8c964077e3c81581170be65b  captions_train-val2014.zip
59582776b8dd745d649cd249ada5acf7  instances_train-val2014.zip
926b9df843c698817ee62e0e049e3753  person_keypoints_trainval2014.zip
0da8c0bd3d6becc4dcb32757491aca88  train2014.zip
a3d79f5ed8d289b7a7554ce06a5782b3  val2014.zip
```
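If you want to check your own downloads against these sums without leaving Python, here is a sketch; the `EXPECTED` table just repeats the values above, and `verify_downloads` is a hypothetical helper (point it at whatever directory holds your zips):

```python
import hashlib
import os

# The MD5 sums reported above for the big zip files.
EXPECTED = {
    "captions_train-val2014.zip": "5750999c8c964077e3c81581170be65b",
    "instances_train-val2014.zip": "59582776b8dd745d649cd249ada5acf7",
    "person_keypoints_trainval2014.zip": "926b9df843c698817ee62e0e049e3753",
    "train2014.zip": "0da8c0bd3d6becc4dcb32757491aca88",
    "val2014.zip": "a3d79f5ed8d289b7a7554ce06a5782b3",
}

def md5sum(path, chunk_size=1 << 20):
    """Stream the file through MD5 in 1 MiB chunks (the zips are large)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_downloads(directory):
    """Return the names of any zips that are missing or fail their checksum."""
    bad = []
    for name, want in sorted(EXPECTED.items()):
        path = os.path.join(directory, name)
        if not os.path.exists(path) or md5sum(path) != want:
            bad.append(name)
    return bad
```

An empty return value from `verify_downloads` means the zips match the sums listed above, and the corruption (if any) happened after download, e.g. during unzipping.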
@tweedmorris It seems your MD5 sums are the same as mine, so I think there is no problem with the files.
Some ideas for further investigation:
- Try replacing Image.open with cv2.imread; to do so you must install OpenCV (pip install opencv-python). If this fixes the error, there is an issue with PIL. Alternatively, you could also try loading the images with scipy.
- If this does not help, put a try/except block around the code and check which images give you the exception: whether it is all images, just some, or even just one image.
- Update all libs etc.
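The try/except suggestion above can be sketched as a small generic wrapper; `collect_failures` is a hypothetical helper, and in this repo the `loader` argument would be the conversion that currently crashes, e.g. `lambda p: np.array(Image.open(p), dtype=np.uint8)`:

```python
def collect_failures(paths, loader):
    """Run `loader` on every path and collect the ones that raise an
    exception, so you can see whether it is all images or just a few."""
    failed = []
    for path in paths:
        try:
            loader(path)
        except Exception as exc:
            # Record the path together with the exception type and message.
            failed.append((path, "%s: %s" % (type(exc).__name__, exc)))
    return failed
```

If only a handful of paths come back, those are the specific files to re-download or delete; if every path fails, the problem is in the library stack rather than the data.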
@kevinkit Good idea about cv2.imread – be aware though that OpenCV will load the image in BGR byte order rather than RGB, so you'll need to switch the byte order, and perhaps make a copy of the re-strided array in case any downstream functions expect contiguous data:

```
img = cv2.imread("image.jpg").astype(np.uint8)[:, :, ::-1]
img = np.ascontiguousarray(img)
```
Note that although OpenCV does provide a separate image I/O library, the imread/imsave functions in scipy are just fairly thin wrappers around the PIL/Pillow library, so you would expect the same error as before:

```
>>> from scipy.misc import imread
>>> imread("truncated.jpg")
array(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=637x800 at 0x11CB5FFD0>, dtype=object)
```
@Designbook1 Hope you've solved the problem, please do let us know.
Could it be that the EXIF data is corrupted? I had a similar issue with some images the other day. It turned out the EXIF data was corrupted. I was able to fix it by clearing out the EXIF data using piexif.
```
import piexif
piexif.remove("pathtofile")
```
@kevinkit @tweedmorris your answers have helped me to solve the problem. Thank you very much!
@Designbook1 I want to know how you solved the problem, thank you.
I also have this problem, and my failing file is different from @Designbook1's (mine is around image 70,000 and his is around 23,851). I don't believe the downloaded files have problems, but during the unzip process my disk filled up and I had to switch to another disk, which might have corrupted some of the images – so I just unzipped the data again and it looks good now.
I had the same problem too, but I deleted the data files and unzipped them again properly (one by one), and it works. Thank you, @xiaoyongzhu
Please tell me how to solve the problem, thank you very much. @Designbook1
@ty0803 Try deleting the unzipped files,
then unzip all the files one by one again. It worked for me.
@WKChung1028 OK! I'll try it. Thank you!
You can skip such an image with code like the following: check whether the opened image decodes to a 3-dimensional array, and skip it otherwise.

```
img = Image.open('img_path')
shape = np.array(img).shape
if len(shape) == 3:
    ...
else:
    continue
```

This code may work.