
Correlating page no. and drawing no. when retrieving drawings from the API

Open mahesh-maan opened this issue 1 year ago • 0 comments

The issue

When you fetch a drawing from the API (e.g., /patents/US7654321B2/drawings/4), the API returns sheet no. 4 (the 4th page containing drawings) and not the sheet that actually contains drawing no. 4.

The cause

This happens because a single page often contains multiple drawings, so there is no one-to-one mapping between pages and drawings.

How is it supposed to work?

Ideally, the route /patents/<pn>/drawings/4 would return the sheet containing drawing no. 4 (or perhaps a cropped portion containing only drawing no. 4, in the correct orientation).
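To make the intended behavior concrete, here is a minimal sketch of the lookup the route would need, assuming a sheet-to-drawings mapping is already available from some source. The function name `sheet_for_drawing` and the example mapping are hypothetical and not part of the existing API.

```python
def sheet_for_drawing(sheet_to_figures, drawing_no):
    """Return the 1-based sheet number that contains drawing_no, or None.

    sheet_to_figures is an assumed structure: {sheet_no: [drawing numbers]}.
    """
    for sheet_no, figures in sorted(sheet_to_figures.items()):
        if drawing_no in figures:
            return sheet_no
    return None


# Hypothetical patent: sheet 1 holds drawings 1-2, sheet 2 holds 3-5, sheet 3 holds 6.
example_mapping = {1: [1, 2], 2: [3, 4, 5], 3: [6]}

# Drawing no. 4 lives on sheet 2, not on sheet 4 (which does not even exist here).
requested_sheet = sheet_for_drawing(example_mapping, 4)
```

With such a lookup in place, a request for a drawing that no sheet contains would yield `None`, which the route could presumably translate into a "not found" response.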

Possible solutions

  1. Check whether the USPTO generates this data (the mapping between pages and drawings) and makes it available (e.g., via an API or somewhere in the data dumps).
  2. OCR-ing the pages may yield this data, although doing so is likely to involve some kind of customization or preprocessing of the page image. Depending on how much compute it needs, it may or may not be feasible to precompute this for all existing patents. If the processing can be done in a couple of seconds, it could be done on the fly just before returning the drawing.
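As a rough illustration of the OCR option, the sketch below assumes each drawing sheet has already been OCR-ed into plain text by some engine (not shown) and that drawings are labelled in the usual "FIG. 3" / "Figure 3" style. The names `figures_in_text` and `build_mapping` are hypothetical, and real sheets may need rotation or other preprocessing before the labels become readable.

```python
import re

# Matches labels such as "FIG. 3", "Fig 4", "FIGURE 5", "FIGS. 1"; captures the number only,
# so "FIG. 3A" and "FIG. 3B" both count as drawing 3.
FIG_LABEL = re.compile(r"\bFIG(?:URE|S)?\.?\s*(\d+)", re.IGNORECASE)


def figures_in_text(ocr_text):
    """Return the sorted, de-duplicated drawing numbers found in one sheet's OCR text."""
    return sorted({int(num) for num in FIG_LABEL.findall(ocr_text)})


def build_mapping(ocr_texts):
    """Map 1-based sheet numbers to the drawing numbers detected on each sheet.

    ocr_texts is assumed to be a list of OCR outputs, one string per drawing sheet,
    in sheet order.
    """
    return {
        sheet_no: figures_in_text(text)
        for sheet_no, text in enumerate(ocr_texts, start=1)
    }
```

Because the regex step itself is cheap, the compute cost would be dominated by the OCR pass, which is the part that would decide between precomputing and processing on the fly.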

Note: if you know of a solution that should be considered, please mention it in the comments.

mahesh-maan, Jan 24 '24 06:01