Use a pdf viewer instead of OSD for proofing page?

Open epicfaace opened this issue 3 years ago • 5 comments

I've found that it's much easier for me to just open up the original PDF in a new window, instead of OSD, when doing proofing. This is because I can scroll to pan (instead of zooming), and I can also easily scroll to the next page (rather than having to click on a small button in the Ambuda OSD view, which also only lets me view one page at a time). Finally, OSD also has some jitteriness (when I pan, everything seems to have some inertia and keeps panning even after I let go of the mouse pointer), which PDF viewers do not have.

I would propose that we use something like pdf.js for the viewer on the proofing page instead. Ideally, we could also show the entire PDF at once, and the left / right arrow navigation buttons would only change which page's text box is shown.

epicfaace, Aug 27 '22 17:08

OSD has a lot of config knobs, so I'm confident we can implement scroll to pan and remove the pan inertia. Viewing multiple pages with OSD seems difficult, and pdf.js would be a better solution out of the box here.
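As a rough sketch of the config-knob approach, assuming OSD here means OpenSeadragon: `animationTime` and the `scrollToZoom` / `flickEnabled` entries of `gestureSettingsMouse` are OpenSeadragon viewer options, but the values below are untested guesses, and `panFriendlyOptions` and `wheelDeltaToViewportDelta` are hypothetical helper names.

```typescript
// Sketch only. The option names are OpenSeadragon viewer options; the values
// are untested guesses. The result would be merged into the options object
// passed to OpenSeadragon({...}).
interface PanFriendlyOptions {
  animationTime: number;
  gestureSettingsMouse: {
    scrollToZoom: boolean;
    flickEnabled: boolean;
  };
}

function panFriendlyOptions(): PanFriendlyOptions {
  return {
    // Shorter spring animations, so the image settles soon after input stops.
    animationTime: 0.1,
    gestureSettingsMouse: {
      scrollToZoom: false, // free the mouse wheel for panning
      flickEnabled: false, // no momentum after releasing a drag
    },
  };
}

interface Point {
  x: number;
  y: number;
}

// Hypothetical helper: converts a wheel delta in CSS pixels into a pan delta
// in viewport units, given the container width in pixels and the width of the
// currently visible region in viewport units.
function wheelDeltaToViewportDelta(
  deltaPx: Point,
  containerWidthPx: number,
  visibleWidth: number,
): Point {
  const unitsPerPixel = visibleWidth / containerWidthPx;
  return { x: deltaPx.x * unitsPerPixel, y: deltaPx.y * unitsPerPixel };
}
```

In a real viewer, the converted delta would presumably be handed to `viewport.panBy` from a `canvas-scroll` handler. Whether these knobs are enough in practice is an open question.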

Some of the considerations I have in mind are:

  • Setup complexity. It looks like pdf.js is easy to set up.
  • Experience for low-bandwidth users. On similar proofing projects, I've heard complaints about the time required to download a PDF and then upload it to start a new project. Serving an entire PDF would not be feasible for these users.
  • Extensibility. OSD works with a plain image that we could modify or transform with JS, such as to draw bounding boxes. From what I've found, pdf.js does not support this use case. From poking around, it's also not obvious how to scroll to a specific vertical offset as a percentage of the page height, e.g., "scroll 45% into page 35".
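For comparison, if the pages were shown as a plain vertical stack of images in a scrollable container (not pdf.js), "scroll 45% into page 35" reduces to arithmetic over the rendered page heights. A minimal sketch, where `scrollOffsetFor`, its parameters, and the fixed `gap` between pages are all assumptions made for illustration:

```typescript
// Hypothetical helper: pixel offset from the top of a vertical stack of page
// images to a point `fraction` of the way down page `pageNumber` (1-based).
// `pageHeights` holds the rendered height of each page in pixels, and `gap`
// is the assumed spacing between consecutive pages.
function scrollOffsetFor(
  pageHeights: number[],
  pageNumber: number,
  fraction: number,
  gap: number = 0,
): number {
  if (!Number.isInteger(pageNumber) || pageNumber < 1 || pageNumber > pageHeights.length) {
    throw new RangeError(`page ${pageNumber} is out of range`);
  }
  // Clamp so that out-of-range fractions land on the page's top or bottom edge.
  const clamped = Math.min(1, Math.max(0, fraction));

  // Sum the heights (plus gaps) of every page above the target page.
  let offset = 0;
  for (const height of pageHeights.slice(0, pageNumber - 1)) {
    offset += height + gap;
  }

  const targetHeight = pageHeights[pageNumber - 1] ?? 0;
  return offset + clamped * targetHeight;
}
```

The result could be assigned to the container's `scrollTop`. This says nothing about whether pdf.js exposes an equivalent.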

akprasad, Aug 27 '22 20:08

> I'm confident we can implement scroll to pan and remove the pan inertia.

Unfortunately, I don't think that would be easy; see https://github.com/ambuda-org/ambuda/issues/158

I do think that if we do use another PDF viewer, we should not serve the original PDF file. It might make the most sense to just request images of the current page, construct a PDF on the client side using jspdf, then request additional pages on demand?

epicfaace, Aug 27 '22 20:08

We only need an image viewer, not a PDF viewer, right? The bulk of the work of a PDF viewer is dealing with the PDF's internal data structures and instructions and rendering them into pixels, which has already been done in the backend (we've already rendered the PDF into images). So there does not seem to be any benefit to involving the PDF format once again on the frontend, AFAICT.

shreevatsa, Aug 31 '22 13:08

PDF viewers have a smoother experience with scrolling, and make it easier to view multiple pages at a time. It would be very smooth if one could just scroll to the next page, and then the text box automatically updates to have the transcription for that page.
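The "text box follows the scroll position" idea does not strictly need a PDF viewer. A minimal sketch, assuming the pages are stacked vertically in one scrollable container and that the "current" page is the one under the vertical midpoint of the visible area; `pageAtScrollPosition` and its parameters are hypothetical names:

```typescript
// Hypothetical helper: returns the 1-based number of the page under the
// vertical midpoint of the visible area, for pages stacked top to bottom.
// `pageHeights` are rendered heights in pixels; `gap` is the assumed spacing
// between pages. Returns 0 when there are no pages.
function pageAtScrollPosition(
  pageHeights: number[],
  scrollTop: number,
  viewportHeight: number,
  gap: number = 0,
): number {
  const midpoint = scrollTop + viewportHeight / 2;
  let top = 0;
  let pageNumber = 0;
  for (const height of pageHeights) {
    pageNumber += 1;
    const bottom = top + height + gap; // count the gap as part of the page above it
    if (midpoint < bottom) {
      return pageNumber;
    }
    top = bottom;
  }
  // Scrolled past the end: stay on the last page.
  return pageHeights.length;
}
```

A scroll handler could call this and swap in that page's transcription only when the returned number changes, so the text box does not reload on every scroll event.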

epicfaace, Aug 31 '22 19:08

Yeah I guess it's possible that, because PDF viewers (PDF.js is the only one, AFAIK?) are optimized for viewing "pages", while image viewers are optimized for viewing "images", in practice it may turn out that PDF viewers have better affordances / design for viewing pages… it's just that it feels wrong from a technical perspective; it would be somehow very sad if that were the case. :-) (If the only way to get a good image viewer for pages is to wrap the images on the frontend into a PDF!)

BTW though I don't know how common it is for someone using the proofreading page to want to quickly scroll through multiple pages, I did that just now (to get a rough idea of what's in a book) and apart from the inconveniences mentioned in the original report above (have to click on the small arrows next to "History" etc), it was also a bit slow (not unbearably so, just enough to be perceptible) — and it seems that most of the time is taken by the image loads themselves:

[screenshot]

So either hosting the images on a faster CDN(?) or some sort of pre-fetching and caching of "nearby" images may be worth doing, if viewing multiple pages is a frequent use-case.
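A minimal sketch of the pre-fetching and caching option: pick the pages nearest the current one, nearest first, and keep a bounded cache of what has been loaded. `pagesToPrefetch`, `PageCache`, the radius of 2, and the cache capacity are all assumptions made for illustration; the actual image loading is left out.

```typescript
// Hypothetical helper: page numbers (1-based) worth prefetching around
// `current`, nearest first, preferring the next page over the previous one.
function pagesToPrefetch(current: number, pageCount: number, radius: number = 2): number[] {
  const pages: number[] = [];
  for (let distance = 1; distance <= radius; distance++) {
    if (current + distance <= pageCount) pages.push(current + distance);
    if (current - distance >= 1) pages.push(current - distance);
  }
  return pages;
}

// Small least-recently-used cache keyed by page number. It relies on Map
// preserving insertion order: the first key is always the least recently used.
class PageCache<V> {
  private readonly entries = new Map<number, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  has(page: number): boolean {
    return this.entries.has(page);
  }

  get(page: number): V | undefined {
    if (!this.entries.has(page)) return undefined;
    const value = this.entries.get(page) as V;
    // Re-insert so this page becomes the most recently used.
    this.entries.delete(page);
    this.entries.set(page, value);
    return value;
  }

  set(page: number, value: V): void {
    this.entries.delete(page);
    this.entries.set(page, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

On each page change, the viewer could request every page from `pagesToPrefetch` that the cache does not already hold. The radius should probably stay small, given the concern about low-bandwidth users.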

Also I have plans to implement a line-by-line view for the proofreading based on the OCR results, and I too like the idea of "scroll to a certain page and see the corresponding text next to it" (once I made some webpages like that :-)).

shreevatsa, Sep 02 '22 21:09