covid-data-pipeline
Screen capture isn't capturing full page
Can we capture the full vertical height of the data pages? Seems we're cutting off the bottom of a lot of pages in our screen capture.
I'm pretty sure this is an issue for a crawler. @webmasterkai, should this issue be assigned to another repo?
I'll look at this. The capture logic has been too simple for our use cases: I've been manually setting a capture browser window size that works for most pages and whitelisting some states that needed more, but this isn't sustainable. Unfortunately I can't backfill this, but I can fix it going forward.
@jdmaresco any other states off the top of your head that you've noticed this for?
I only looked at a couple – Colorado also had the issue: https://covidtracking.com/screenshots/CO/CO-20200324-140132.png
–jd maresco linkedin ( https://www.linkedin.com/in/jdmaresco/ ) | twitter ( https://twitter.com/jdmaresco )
This is now fixed for (most) screenshots starting early morning 3/25. There are still some states with poorly-behaving websites which may get truncated, and I can fix manually as we see them.
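For anyone hitting the same truncation, here is a rough sketch of one way to pick a capture size from the page's own reported height rather than a fixed window. It is not the fix that was deployed here; the function name, the default sizes, the safety cap, and the per-state override table are all hypothetical values chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# All constants below are assumptions for illustration, not values from this repo.
DEFAULT_WIDTH = 1400     # viewport width in pixels
FALLBACK_HEIGHT = 3000   # used when the page reports no usable height
MAX_HEIGHT = 20000       # safety cap so a runaway page cannot produce a huge image

# Hypothetical per-state minimum heights for sites that under-report their height.
STATE_MIN_HEIGHT: Dict[str, int] = {"CO": 5000}


def capture_size(
    state: str,
    reported_height: Optional[int],
    width: int = DEFAULT_WIDTH,
) -> Tuple[int, int]:
    """Return (width, height) for the capture window of one state page."""
    # Start from the height the page reports, or the fallback if it is missing/invalid.
    if reported_height is None or reported_height <= 0:
        height = FALLBACK_HEIGHT
    else:
        height = reported_height

    # Apply the manual override for known poorly-behaving sites.
    height = max(height, STATE_MIN_HEIGHT.get(state, 0))

    # Never exceed the safety cap.
    return width, min(height, MAX_HEIGHT)
```

If the capture were driven by Selenium (an assumption; the thread does not say which tool is used), `reported_height` could come from `driver.execute_script("return document.documentElement.scrollHeight")` and the result could be passed to `driver.set_window_size(width, height)` before taking the screenshot. Pages that load content lazily may still report too small a height, which is where the override table comes in.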
@julia326 FWIW, CA's screenshots seem to be truncated currently. The pictures from https://covidtracking.com/data/state/california/ are cut off, i.e. there are no visible numbers. It got worse over the last few days, presumably because there's more content at the top of the page.