Missing OCR text for many images

Open danvk opened this issue 9 years ago • 3 comments

A few examples:

  • 702198b — text on brown backing paper
  • 706410b — brown backing with text
  • 709457b — grayscale with text but no OCR
  • 729236b — grayscale with text but no OCR
  • 711642b — Missing text from color image
  • 711564b — Missing text from color image
  • 716490b — Missing text from color image
  • 731966b — Missing text from gray image (why?)
  • 703429b — Missing text from color image

danvk, May 01 '15 14:05

Based on my survey, ~20% of images have text on the back that was not OCR'd.

danvk, May 01 '15 14:05

Of the nine images in that list, eight were missing from the NYPL's S3 bucket. The ninth, 731966b, was actually the front of the image.

@riordan There were 30,413 back-of-card images in the S3 bucket, but ~43,000 photos in the CSV file that Matt originally sent me. Is there any chance we could recover more of them?
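
For anyone who wants to list exactly which backs are missing, here is a rough sketch of the comparison. It is not the script actually used. It assumes the CSV has a column named `photo_id` (hypothetical), that back images are named `<id>b` plus an extension as in the IDs above, and that the bucket keys have already been dumped to a list.

```python
import csv
import io

def missing_backs(csv_text, bucket_keys, id_column="photo_id"):
    """Return photo IDs from the CSV with no '<id>b' image among bucket_keys.

    id_column is a hypothetical column name; adjust it to the real CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = {row[id_column].strip() for row in reader}

    present = set()
    for key in bucket_keys:
        # Drop any folder prefix and the file extension, e.g. "backs/731966b.jpg" -> "731966b"
        stem = key.rsplit("/", 1)[-1].split(".", 1)[0]
        if stem.endswith("b"):
            present.add(stem[:-1])  # strip the trailing "b" to get the photo ID

    return sorted(expected - present)
```

Called with the CSV contents and the key listing, it returns the IDs whose back image is absent, which would be the list to ask about recovering.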

danvk, May 01 '15 14:05

I'll try.

riordan, May 01 '15 17:05