Is Delay the only way of deciding a page is finished?

Open ebiggs opened this issue 7 years ago • 4 comments

Thanks for the amazing project, guys. I switched from wkhtmltopdf to athenapdf when I encountered some vexing rendering issues in wkhtmltopdf that weren't present in athenapdf. However, I discovered two things about athenapdf that I hope can be improved.

The first one is the delay setting. I noticed when I used the default delay of 200ms, about 80% of my batch would produce perfect PDFs while the remaining 20% were converted before the page was ready. I doubled the delay and still a handful of outliers did not produce a complete page, so to solve this I basically had to configure my delay to be as slow as the slowest outlier, which is rather unfortunate. wkhtmltopdf, on the other hand, seemed to have some insight into the browser's internals and just "knew" when the page was ready. I believe they even have an option where you can control the readiness using some JS.

My question is whether delay is the only approach athenapdf is going to take on this matter in the foreseeable future, or whether there are limitations in Chromium or the project that prevent the wkhtmltopdf functionality I miss from making its way into this project.


The second issue is a lesser concern, and I suspect it is an upstream issue with Chromium itself:

When I set my font-face using a .woff, it produces text in the PDFs that does not look great in Adobe software by default. Adobe has a feature called "enhance thin lines" that makes the font, when included as a .woff, look terrible until that feature is turned off. The only lead I have on this is that Adobe claims the "enhance thin lines" feature is not supposed to apply to fonts. It appears as if the glyphs are being included in the PDF in such a way that this promise from Adobe is not fulfilled.

So I unbundled the .woff and included the same font as a .ttf instead, and the "enhance thin lines" issue goes away, but instead there is a subtle kerning issue that wasn't present when the font was a .woff. Basically, everything looks mostly good, except occasionally a character will overlap another or there will appear to be half a space between two characters that shouldn't be there.

If this is not an upstream project issue I can produce some test materials so that these issues can be easily reproduced.

That being said, a fantastic project, really really appreciate it!

ebiggs avatar Jul 17 '17 23:07 ebiggs

Hi @ebiggs,

Thank you for your kind words. It means a lot.

The current version of Athena relies on an Electron event, did-finish-load (similar to onload), before rendering to PDF. The delay simply waits for X amount of time after this event. Consequently, the page may finish loading, but there are cases where the page may not actually be ready (e.g. when there are lots of external resources like JavaScript).

Version 3 works in a similar way. We wait for the page to be ready before rendering. But, you can also control the readiness using JavaScript plugins. See https://github.com/arachnys/athenapdf/commit/9eef4c5e3421cacfae09ad4fa9080f8ccb9d6a59, and https://github.com/arachnys/athenapdf/commit/0d02297004a242cb1e8bb68f3271754d2034865a for examples.

I believe another approach to knowing when a page is ready is to check when frame changes have stabilised. That is, trigger the rendering when the rate of frame changes on the page drops below a specified value. But that will not be added any time soon, as there may be potential complexities that are not fully understood / explored (e.g. a page that is frequently updated due to some CSS or JavaScript animation).
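To make the frame-stabilisation idea concrete, here is a minimal sketch in Python. It is not part of athenapdf: the function names, the 500 ms window, and the threshold of 2 changes are all hypothetical, and it works on a recorded list of frame-change timestamps (milliseconds after load) rather than on a live page.

```python
from bisect import bisect_right
from typing import Sequence


def changes_in_window(frame_times_ms: Sequence[float], t_ms: float, window_ms: float) -> int:
    """Count frame changes in the half-open window (t_ms - window_ms, t_ms].

    frame_times_ms must be sorted in ascending order.
    """
    return bisect_right(frame_times_ms, t_ms) - bisect_right(frame_times_ms, t_ms - window_ms)


def first_stable_time(
    frame_times_ms: Sequence[float],
    window_ms: float = 500.0,  # hypothetical window length
    max_changes: int = 2,      # hypothetical "specified value" for the rate
) -> float:
    """Return the earliest time at which at most max_changes frame changes
    fall inside the trailing window.

    Assumes timestamps are >= 0. The count can only drop when a frame leaves
    the window, so it is enough to check window_ms itself and each
    frame time + window_ms.
    """
    if max_changes < 0:
        raise ValueError("max_changes must be >= 0")
    times = sorted(frame_times_ms)
    candidates = sorted([window_ms] + [t + window_ms for t in times])
    for t in candidates:
        if changes_in_window(times, t, window_ms) <= max_changes:
            return t
    # Not reachable for max_changes >= 0: the window ending at the last
    # candidate contains no frame changes at all.
    raise RuntimeError("no stable time found")
```

On a recorded trace this always returns a value, because the trace ends. A live page with a continuous CSS or JavaScript animation would never see the count drop, which is exactly the complexity mentioned above.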

As for your second issue, I will have to investigate that. Some test materials will be greatly appreciated. Importantly, I would like to know if I can replicate this on the newest version since we have no plans to continue with the current version of Athena.

Thanks, and let me know if that answers your questions or if you have any additional ones.

MrSaints avatar Jul 18 '17 12:07 MrSaints

@MrSaints really interesting to think about the frame update frequency. Maybe in that scenario the delay could just be an upper bound on the time spent waiting for the frame rate to fall below a certain threshold?
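A rough Python sketch of that "delay as an upper bound" idea follows. Everything here is an assumption for illustration: `wait_until_quiet`, its parameters, and the `is_quiet` callback (which stands in for whatever readiness check is used) are hypothetical and not an athenapdf API. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def _monotonic_ms() -> float:
    """Current monotonic time in milliseconds."""
    return time.monotonic() * 1000.0


def _sleep_ms(ms: float) -> None:
    """Sleep for the given number of milliseconds."""
    time.sleep(ms / 1000.0)


def wait_until_quiet(
    is_quiet: Callable[[], bool],  # hypothetical readiness check
    max_wait_ms: float,            # the configured delay, used as a cap
    poll_ms: float = 50.0,         # hypothetical polling interval
    now: Callable[[], float] = _monotonic_ms,
    sleep: Callable[[float], None] = _sleep_ms,
) -> bool:
    """Poll is_quiet until it returns True or max_wait_ms has elapsed.

    Returns True if the page went quiet in time, False if the cap was hit
    (in which case the caller would render anyway).
    """
    deadline = now() + max_wait_ms
    while True:
        if is_quiet():
            return True
        remaining = deadline - now()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_ms, remaining))
```

With this shape, well-behaved pages render as soon as they go quiet, and only the outliers pay the full delay.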

dbuxton avatar Jul 18 '17 14:07 dbuxton

@dbuxton Yes, perhaps. Without testing, I am unsure how that will behave when updates are sporadic, with large and small bursts. In such a case, maybe it should consider the upper bound after the rate is fairly constant.

MrSaints avatar Jul 18 '17 20:07 MrSaints

Another heuristic that can be used to determine if the page is 'ready' is the requests made, i.e. dynamic content loaded after onload. If the requests stop, we have some additional degree of confidence.
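As a sketch of what that could look like (in Python, with made-up names; `RequestTracker` and the 500 ms idle period are assumptions, not anything athenapdf provides): count requests in flight, remember when the last one started or finished, and treat the page as idle once nothing is pending and a quiet period has passed.

```python
class RequestTracker:
    """Tracks network activity to decide when a page has gone idle.

    Timestamps are passed in by the caller (milliseconds on any monotonic
    clock), so the class has no dependency on a real browser or clock.
    """

    def __init__(self, idle_ms: float = 500.0) -> None:
        self.idle_ms = idle_ms        # hypothetical quiet period
        self.in_flight = 0            # requests started but not yet finished
        self.last_activity_ms = 0.0   # time of the most recent start/finish

    def request_started(self, now_ms: float) -> None:
        self.in_flight += 1
        self.last_activity_ms = now_ms

    def request_finished(self, now_ms: float) -> None:
        # Guard against a finish event with no matching start.
        self.in_flight = max(0, self.in_flight - 1)
        self.last_activity_ms = now_ms

    def is_idle(self, now_ms: float) -> bool:
        """True when nothing is pending and idle_ms has passed since the
        last request started or finished."""
        return self.in_flight == 0 and (now_ms - self.last_activity_ms) >= self.idle_ms
```

This only adds confidence, as noted above: a page that polls a server on a timer would keep resetting the quiet period, so some upper limit on the wait would still be needed.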

Shou avatar Jun 03 '18 13:06 Shou