autowebcompat

Implement a DOM-based technique as a baseline

Open marco-c opened this issue 6 years ago • 6 comments

We should compare our CNN-based technique with a baseline technique that doesn't use machine learning.
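To make the idea concrete, here is a minimal sketch of what a DOM-based baseline could look like. It is an assumption, not the project's implementation: the `Element` record, the `path` key, and the `tolerance` and `threshold` values are all hypothetical, and the real format of the DOM information produced by the crawler may differ. The sketch matches elements across two renderings of the same page and reports the fraction that are missing or whose geometry differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical record for one DOM element in one browser."""
    path: str      # assumed stable identifier, e.g. an XPath-like string
    x: float
    y: float
    width: float
    height: float


def dom_difference(a, b, tolerance=5.0):
    """Fraction of elements that are missing in one rendering or whose
    position/size differs by more than `tolerance` pixels."""
    by_path_a = {e.path: e for e in a}
    by_path_b = {e.path: e for e in b}
    all_paths = by_path_a.keys() | by_path_b.keys()
    if not all_paths:
        return 0.0

    mismatched = 0
    for path in all_paths:
        ea, eb = by_path_a.get(path), by_path_b.get(path)
        if ea is None or eb is None:
            # Element exists in only one of the two renderings.
            mismatched += 1
            continue
        deltas = (
            abs(ea.x - eb.x),
            abs(ea.y - eb.y),
            abs(ea.width - eb.width),
            abs(ea.height - eb.height),
        )
        if max(deltas) > tolerance:
            mismatched += 1
    return mismatched / len(all_paths)


def is_incompatible(a, b, threshold=0.1, tolerance=5.0):
    """Flag a page pair when too many elements differ (threshold is a guess)."""
    return dom_difference(a, b, tolerance) > threshold


# Usage with made-up data: the second element is shifted down by 30 pixels.
firefox = [
    Element("/html/body/div[1]", 0, 0, 100, 50),
    Element("/html/body/div[2]", 0, 50, 100, 50),
]
chrome = [
    Element("/html/body/div[1]", 0, 0, 100, 50),
    Element("/html/body/div[2]", 0, 80, 100, 50),
]
print(dom_difference(firefox, chrome))
print(is_incompatible(firefox, chrome))
```

A rule like this needs no training, so its accuracy on the labeled pairs would give a reference point for the CNN results. The tolerance and threshold would have to be tuned on real data.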

marco-c avatar Jun 09 '18 18:06 marco-c

We should also collect screenshots together with their DOM information. At present we don't have any.

sagarvijaygupta avatar Jun 09 '18 18:06 sagarvijaygupta

Collecting DOM information is already implemented, but we haven't collected any data with it yet.

marco-c avatar Jun 09 '18 18:06 marco-c

@marco-c Do we simply need to run collect.py to start collecting data with DOM info?

Shashi456 avatar Jun 12 '18 11:06 Shashi456

Yes, I think so. We implemented it recently. You should run it for a couple of websites and check that it is actually generating correct data.

marco-c avatar Jun 13 '18 04:06 marco-c

@marco-c Although the DOM info seems to be collected properly, the process is very slow. Also, could you tell me how I can add the files to the repo, since they are tracked with Git LFS?

Shashi456 avatar Jun 22 '18 17:06 Shashi456

Yes, the crawler is quite slow because of all the time we have to wait to be sure that everything has loaded. You can add the files normally; Git LFS is completely transparent.

marco-c avatar Jun 22 '18 23:06 marco-c