
Investigate cases where the crawler wasn't able to take screenshots

Open marco-c opened this issue 7 years ago • 10 comments

There are a few cases where the crawler was not able to take screenshots. We should figure out why and try to fix any issue that we notice.

The files under data/ are in the format WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png. WEBCOMPAT-ID are the IDs from webcompat.com. ELEMENT-ID are the element IDs where the crawler clicked before taking the screenshot. BROWSER is the name of the browser.

We should investigate these cases:

  1. XXXX_firefox.png is present but XXXX_chrome.png is not present.
  2. XXXX_ELEMENT_firefox.png is present but XXXX_ELEMENT_chrome.png is not present.

marco-c avatar Jan 31 '18 12:01 marco-c

To do this, the first step would be to create a script to list all the inconsistencies.
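Such a script could be a short directory walk over data/. A minimal sketch, assuming the filename format described above and that the two browsers are exactly `firefox` and `chrome` (`DATA_DIR` is a placeholder):

```python
import os
from collections import defaultdict

DATA_DIR = 'data'  # assumed location of the screenshot files

def list_inconsistencies(data_dir=DATA_DIR):
    """Return screenshot keys that exist for one browser but not the other."""
    browsers_by_key = defaultdict(set)
    for name in os.listdir(data_dir):
        if not name.endswith('.png'):
            continue
        # The browser name is the last underscore-separated component,
        # e.g. '1491_3_firefox.png' -> key '1491_3', browser 'firefox'.
        key, _, browser = name[:-len('.png')].rpartition('_')
        browsers_by_key[key].add(browser)
    return sorted(key for key, browsers in browsers_by_key.items()
                  if browsers != {'firefox', 'chrome'})

if os.path.isdir(DATA_DIR):
    for key in list_inconsistencies():
        print(key)
```

The `rpartition('_')` keeps element IDs that themselves contain underscores attached to the key, so both inconsistency cases above are covered by the same check.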

marco-c avatar Jan 31 '18 12:01 marco-c

I am working on this. Just wanted to let people know so that we don't end up doing duplicate work.

skulltech avatar Jan 31 '18 17:01 skulltech

Which format would be best for exporting the inconsistencies?

  • A CSV with every row being an inconsistency
  • A JSON file containing a list of inconsistencies, each represented as a dict with the details.

skulltech avatar Jan 31 '18 17:01 skulltech

Either CSV or line-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON, that is, one JSON object per line).

A normal JSON file is a bit problematic because you can't easily see diffs between two versions of the file (e.g. if you just add one entry, the diff will show you the entire file).
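A minimal sketch of that format with the standard library (the record field names here are hypothetical, not the ones the script ended up using):

```python
import json

# Hypothetical inconsistency records; the exact fields are up to the script.
inconsistencies = [
    {'webcompat_id': '1491', 'element_id': '3', 'missing_in': 'chrome'},
    {'webcompat_id': '1502', 'element_id': None, 'missing_in': 'firefox'},
]

# One JSON object per line: appending a record only touches one line,
# so diffs between two versions of the file stay small.
with open('inconsistencies.jsonl', 'w') as f:
    for record in inconsistencies:
        f.write(json.dumps(record) + '\n')

# Reading it back is a one-liner per line:
with open('inconsistencies.jsonl') as f:
    records = [json.loads(line) for line in f]
```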

marco-c avatar Jan 31 '18 18:01 marco-c

@marco-c what do you think we could do next regarding this?

Shashi456 avatar Mar 28 '18 15:03 Shashi456

Manually look at the inconsistencies and see what prevented us from taking a screenshot. E.g. force the crawler to load only a website with an inconsistency and see if the crawler throws an exception in one of the browsers.

marco-c avatar Mar 29 '18 10:03 marco-c

Can you point me in a direction as to how I could work with the crawler in this case and force it to load a site?

Shashi456 avatar Mar 29 '18 13:03 Shashi456

The crawler is in collect.py; you need to change it to load a URL you want instead of loading a URL from one of the webcompat bugs.

marco-c avatar Mar 31 '18 16:03 marco-c

@marco-c where do I get the URLs of the websites for which we have inconsistent screenshots? We haven't stored these website URLs anywhere.

Shashi456 avatar Apr 11 '18 07:04 Shashi456

We have stored the webcompat ID, so you can retrieve the URLs either with Python by using utils.get_bugs() and finding the bug you want, or by loading the bug on the webcompat.com website (e.g. https://webcompat.com/issues/1491).
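The second route can be scripted too: a small helper (hypothetical, based on the filename format described above) that maps a screenshot filename back to its webcompat.com issue URL:

```python
def webcompat_issue_url(filename):
    """Map a screenshot filename to its webcompat.com issue URL,
    e.g. '1491_3_firefox.png' -> 'https://webcompat.com/issues/1491'."""
    webcompat_id = filename.split('_', 1)[0]
    return 'https://webcompat.com/issues/' + webcompat_id

print(webcompat_issue_url('1491_3_firefox.png'))
# -> https://webcompat.com/issues/1491
```

Loading that page shows the affected site's URL in the issue body.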

marco-c avatar Apr 11 '18 09:04 marco-c