autowebcompat

Label dataset

Open marco-c opened this issue 7 years ago • 21 comments

The labeling can be performed using the label.py script.

This script shows you a pair of screenshots. Press 'y' to label them as compatible; 'd' to label them as compatible with content differences (e.g. on a news site, two screenshots can be compatible even though they show different news stories, simply because the news shown depends on when the screenshot was taken and not on the browser); 'n' to label them as not compatible; 'RETURN' to skip them (in case you are not sure yet); and 'ESCAPE' to terminate the current labeling session and store the current results.

More details about the three-labeling system are present in the documentation at https://github.com/marco-c/autowebcompat#labeling.
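As a rough illustration of the key scheme described above, the mapping from key presses to labels could look like the following. This is only a sketch and not the actual label.py code: the function name `handle_key`, the key strings, and the dictionary layout are assumptions made for the example.

```python
# Hypothetical sketch of the key handling; not the actual label.py code.
KEY_TO_LABEL = {
    "y": "y",  # compatible
    "d": "d",  # compatible, but with content differences
    "n": "n",  # not compatible
}


def handle_key(labels, image_name, key):
    """Update `labels` for one key press and return whether to keep labeling.

    "Return" skips the image without storing a label; "Escape" ends the
    session. Any other unknown key is ignored.
    """
    if key == "Escape":
        return False
    if key == "Return":
        return True
    if key in KEY_TO_LABEL:
        labels[image_name] = KEY_TO_LABEL[key]
    return True
```

Skipped images simply stay out of the dictionary, so they can be shown again in a later session.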

marco-c avatar Jan 29 '18 20:01 marco-c

@marco-c A CNN mostly learns patterns in the image (edges, corners, and their correlations). From example 2, it is evident that it will be difficult for a NN to cope with such adversarial cases and classify both screenshots as compatible.

To better separate Y+D from N, or even Y from D+N, I think we could focus on finding ROIs (attention-based) and feeding those patches to the NN. This could be our fallback if nothing works well after the training step you suggested.
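To make the idea of feeding patches to the NN more concrete, here is a minimal sketch of patch selection. Note that it swaps in a plain pixel-difference heuristic, not an attention mechanism, and it assumes both screenshots are same-sized grayscale images given as nested lists; the function name `top_difference_patches` and its parameters are hypothetical.

```python
def top_difference_patches(img_a, img_b, patch=2, k=3):
    """Return up to k (row, col) top-left corners of the most different patches.

    Simple pixel-difference heuristic (an assumption for illustration, not
    an attention model). Images are equal-sized lists of rows of ints.
    """
    height, width = len(img_a), len(img_a[0])
    if len(img_b) != height or len(img_b[0]) != width:
        raise ValueError("screenshots must have the same size")

    scores = []
    # Walk over a non-overlapping grid of patch x patch tiles.
    for r in range(0, height - patch + 1, patch):
        for c in range(0, width - patch + 1, patch):
            total = sum(
                abs(img_a[y][x] - img_b[y][x])
                for y in range(r, r + patch)
                for x in range(c, c + patch)
            )
            if total > 0:  # keep only tiles that actually differ
                scores.append((total, r, c))

    scores.sort(reverse=True)  # largest difference first
    return [(r, c) for _, r, c in scores[:k]]
```

The returned corners could then be used to crop the same regions from both screenshots before passing them to a classifier.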

iamvc7 avatar Feb 07 '18 02:02 iamvc7

At the beginning, I would start with screenshots based on equal page sources (same content), so only Y vs D+N. Furthermore, I would try to normalise the device settings to bring the page as rendered in Firefox closer to the page as rendered in Chrome. We could also remove the system look-and-feel elements by injecting a small script before the screenshot is taken.
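A minimal sketch of such a script injection, assuming a Selenium-style driver object that exposes `execute_script`. It only hides scrollbars, as one example of a system look-and-feel element; the constant and function names are hypothetical, and this is not taken from the project's crawler.

```python
# CSS that hides scrollbars: the first rule targets WebKit/Blink-based
# browsers such as Chrome, the second uses the standard `scrollbar-width`
# property, which Firefox supports.
HIDE_SCROLLBARS_CSS = (
    "::-webkit-scrollbar { display: none; } "
    "html { scrollbar-width: none; }"
)

# JavaScript that appends a <style> element; the CSS text is passed in
# as the first script argument.
HIDE_SCROLLBARS_JS = """
var style = document.createElement('style');
style.textContent = arguments[0];
document.head.appendChild(style);
"""


def hide_scrollbars(driver):
    """Inject the stylesheet via the driver's `execute_script` method."""
    driver.execute_script(HIDE_SCROLLBARS_JS, HIDE_SCROLLBARS_CSS)
```

The function would be called after the page has loaded and before the screenshot is taken, for both browsers.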

nok avatar Feb 12 '18 20:02 nok

@marco-c I'd like to label parts of our dataset. How do you suggest I go about doing that? As far as I've seen, there is no script that merges labels from the label_persons directory into the actual labels directory.

Shashi456 avatar May 01 '18 03:05 Shashi456

@Shashi456 I think you are talking about generate_labels.py.

sagarvijaygupta avatar May 01 '18 09:05 sagarvijaygupta

@sagarvijaygupta Oh, I thought it wasn't updated for the new files :P. Regardless, shouldn't we spend some time labeling the dataset? We may need it this summer.

Shashi456 avatar May 01 '18 10:05 Shashi456

> @marco-c I'd like to label parts of our dataset. How do you suggest I go about doing that? As far as I've seen, there is no script that merges labels from the label_persons directory into the actual labels directory.

The script hasn't been updated yet to deal with bounding boxes, but you can already start labeling and pushing your labels file to the repo. Then, once we have the script done, we will actually combine the labeling done by you and the labeling done by other persons.
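One way the combining step could work, purely as an illustration: a majority vote over per-person label files. This is not the project's generate_labels.py; the function name `merge_labels`, the `{image: label}` dictionary layout, and the tie-handling rule are assumptions, and bounding boxes are not handled at all.

```python
from collections import Counter


def merge_labels(per_person_labels):
    """Combine {image: label} dicts from several people by majority vote.

    Images whose top vote is tied are left out so they can be reviewed.
    """
    votes = {}
    for labels in per_person_labels:
        for image, label in labels.items():
            votes.setdefault(image, Counter())[label] += 1

    merged = {}
    for image, counter in votes.items():
        ranked = counter.most_common(2)
        # Accept the label only if it is the sole one or a strict winner.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            merged[image] = ranked[0][0]
    return merged
```

Leaving tied images out keeps disagreements visible instead of silently picking one labeler's answer.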

marco-c avatar May 01 '18 21:05 marco-c

I am running label.py on my Mac, and I am finding that it is slow or unresponsive on non-y images. For instance, it takes a long time from when I try to draw a bounding box to when it shows up, and for the 'T', resizing arrow, and movement arrow to show up. Clicking on any of them causes everything to disappear until I release my mouse, plus a couple of seconds.

Is this a problem that anyone else has come up against?

sdv4 avatar Jun 12 '18 23:06 sdv4

It could be a Mac issue, I think nobody has tested it on a Mac yet. Could you try in a Linux VM?

marco-c avatar Jun 13 '18 04:06 marco-c

@marco-c I am not having that problem on the Linux VM, so I can label a lot faster now. A couple of questions:

  • Applying labels: suppose two images seem to differ only in the position the page has been scrolled to (e.g. image 1 looks like image 2, except that image 2 has been scrolled down and thus exposes more of the page content). Would these be considered compatible, not compatible, or compatible but different?

  • Getting my labels into the main repo: Should I open a PR for a new branch off of my forked master that is the same as the upstream master, except that it includes my new labels?

sdv4 avatar Jun 27 '18 20:06 sdv4

Also, how would you label a pair of images when they show the same page except that one is in English and the other in Italian?

sdv4 avatar Jun 27 '18 20:06 sdv4

@sdv4 You can refer to #220 until it is merged. Those screenshots were labeled by @marco-c. For the last one, you should mark them as incompatible and draw a bounding box on the Italian side.

sagarvijaygupta avatar Jun 28 '18 04:06 sagarvijaygupta

> Getting my labels into the main repo: Should I open a PR for a new branch off of my forked master that is the same as the upstream master, except that it includes my new labels?

Yes! You can open a PR that says "Add some labels from Shane Sims".

marco-c avatar Jun 30 '18 09:06 marco-c

Are the other two questions answered by #220?

marco-c avatar Jun 30 '18 09:06 marco-c

@marco-c For the scroll one, we have marked them as incompatible in the screenshots, and for the Italian one, we marked bounding boxes on the Italian side as an incompatibility in #220.

sagarvijaygupta avatar Jun 30 '18 09:06 sagarvijaygupta

> For the scroll one, we have marked them as incompatible in the screenshots

IIRC I've marked them as compatible, didn't I?

marco-c avatar Jun 30 '18 09:06 marco-c

No, maybe not. They should be incompatible (e.g. if clicking on a button causes a scroll in one browser, it should cause a scroll in the other browser too).

marco-c avatar Jun 30 '18 10:06 marco-c

https://github.com/marco-c/autowebcompat/blob/b18eae0999b6389b1cc84153f43b077eddccd9d8/collect.py#L162

And if this script works differently in the two browsers, should that also be an incompatibility?

sagarvijaygupta avatar Jun 30 '18 10:06 sagarvijaygupta

> And if this script works differently in the two browsers, should that also be an incompatibility?

It shouldn't, but it's hard to tell whether it was this script that failed or something else. Maybe we should just assume this always works.

marco-c avatar Jun 30 '18 10:06 marco-c

Okay!

sagarvijaygupta avatar Jun 30 '18 10:06 sagarvijaygupta

@marco-c @sagarvijaygupta While I was labeling the dataset, one of the major themes that popped up was that Chrome had a scrollbar. Almost all image pairs with a scrollbar are very similar, but the scrollbar adds a shift that makes the overlay look incompatible.

Should we update the crawler options for Chrome to remove the scrollbar, or add a corresponding suggestion for the user to the labeling guide?

Shashi456 avatar Jul 05 '18 19:07 Shashi456

@Shashi456 it is already removed from the crawler.

sagarvijaygupta avatar Jul 05 '18 19:07 sagarvijaygupta