
Collect screenshots at different times during the day or on different days

Open marco-c opened this issue 7 years ago • 5 comments

Many websites (e.g. those with a carousel or with news) change their content quite often, while the overall structure remains the same. If we collect many screenshots of the same website over the course of a day or over multiple days, we might be able to better teach the network to ignore differences in content and focus on differences in structure.

See also https://groups.google.com/forum/#!topic/mozilla.compatibility/oU9eVcHSPng.

marco-c avatar Feb 08 '18 12:02 marco-c

Randomly selected bugs can be scheduled using Timer from the threading module.

from threading import Timer

SECONDS_IN_DAY = 24 * 60 * 60
# Run the test for the bug again after days_to_wait days.
t = Timer(SECONDS_IN_DAY * days_to_wait, lambda: run_test(...))
t.start()

This would also need a hash to be included along with the date of the screenshot. Maybe we can just use day-month-year as the hash? That would avoid multiple screenshots in a single day. When the crawler's waiting Timer thread fires after a few days, it will run the test for the bug. This assumes a continuously running crawler, though.

Alternatively, randomly selected bugs can be written to a list file in a scheduled_bugs folder, named bugs_day-month-year.json. On start, the crawler reads the file for the current day and sets a timer for 1 day. When that timer fires, the crawler reads the current day's file again. This gives us the best of both approaches.
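A minimal sketch of the file-based option, assuming each schedule file holds a JSON list of bug ids. The function names and the zero-padded day-month-year layout are hypothetical and not part of the existing crawler.

```python
import json
import os
from datetime import date


def schedule_file_path(day, folder="scheduled_bugs"):
    """Return the path of the schedule file for the given day."""
    name = f"bugs_{day.day:02d}-{day.month:02d}-{day.year}.json"
    return os.path.join(folder, name)


def read_scheduled_bugs(day, folder="scheduled_bugs"):
    """Return the bug ids scheduled for `day`, or an empty list if no file exists."""
    path = schedule_file_path(day, folder)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)
```

The crawler could call read_scheduled_bugs(date.today()) on start and again each time its one-day timer fires.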

Which option do you prefer?

Trion129 avatar Feb 25 '18 17:02 Trion129

For now, we are running the crawler manually. So the only thing we should do is make the crawler store the date in the file name (and fix all the places where the name is used to account for the change).

N.B.: This has to wait for #88 to be merged.

marco-c avatar Feb 27 '18 17:02 marco-c

@marco-c As you suggested, we should store the time in the file name. So the new name will be something like bugid_seqno_H_width_V_height_date_7_08_2018_03_02_browser.png. If you feel it is right I will make the change. We can also keep time info in another file mapped to its name if you feel name becomes too long. Also if you have the original screenshots taken by the crawler (without modifications or copied with new birth times) we can gather their birth info also. I will write a script for the same if you have the original ones.
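A rough sketch of how such a name could be built. Reading the trailing fields as day, month, year, hour and minute is an assumption based on the example above, and screenshot_name is a hypothetical helper, not an existing crawler function.

```python
from datetime import datetime


def screenshot_name(bug_id, seq_no, width, height, taken_at, browser):
    """Build a screenshot file name that embeds the capture time.

    Assumed field order after "date": day, month, year, hour, minute.
    """
    stamp = (
        f"{taken_at.day}_{taken_at.month:02d}_{taken_at.year}"
        f"_{taken_at.hour:02d}_{taken_at.minute:02d}"
    )
    return f"{bug_id}_{seq_no}_H_{width}_V_{height}_date_{stamp}_{browser}.png"
```

Every place that parses the old name would need to account for the extra fields.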

sagarvijaygupta avatar Aug 07 '18 09:08 sagarvijaygupta

> If you feel it is right I will make the change.

Yes, looks good to me.

> We can also keep time info in another file mapped to its name if you feel name becomes too long.

For now let's put it in the name. Then maybe in the future we will have a small json file alongside each screenshot with its metadata.

> Also if you have the original screenshots taken by the crawler (without modifications or copied with new birth times) we can gather their birth info also. I will write a script for the same if you have the original ones.

For the original screenshots, we could simply use the current time, it doesn't matter as they already have different names.
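The small JSON file per screenshot mentioned above might look roughly like this. It is only a sketch: the taken_at key and the write_metadata helper are hypothetical.

```python
import json
import os
from datetime import datetime


def write_metadata(screenshot_path, taken_at, **extra):
    """Write a JSON file next to the screenshot and return its path."""
    meta_path = os.path.splitext(screenshot_path)[0] + ".json"
    metadata = {"taken_at": taken_at.isoformat(), **extra}
    with open(meta_path, "w") as f:
        json.dump(metadata, f)
    return meta_path
```

For the original screenshots, taken_at could simply be datetime.now() at the time the file is written.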

marco-c avatar Aug 21 '18 15:08 marco-c

> For the original screenshots, we could simply use the current time, it doesn't matter as they already have different names.

I wanted to say that if we have their birth time info we can use it, as it would be meaningful information. Later on we could derive features like morning, evening, or night, which might be helpful. I will add the current time for now. We can change it if required.
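As an illustration of such features, a capture time could be mapped to a coarse bucket. The hour boundaries below, and the extra afternoon bucket, are arbitrary assumptions.

```python
from datetime import datetime


def time_of_day(taken_at):
    """Map a capture time to a coarse time-of-day label (assumed boundaries)."""
    hour = taken_at.hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"
```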

sagarvijaygupta avatar Aug 21 '18 15:08 sagarvijaygupta