autowebcompat
Implement a DOM-based technique as a baseline
We should compare our CNN-based technique with a baseline technique that doesn't use machine learning.
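As a rough illustration of what such a baseline might look like, here is a minimal sketch. It assumes (hypothetically; this is not necessarily the format collect.py writes) that the DOM information for each browser can be reduced to a dict mapping an element path to its bounding box. The names `compare_dom`, `is_compatible`, `tolerance` and `max_diff_ratio` are made up for this example.

```python
from typing import Dict, List, Tuple

# Assumed representation: (x, y, width, height) in pixels for each element.
Box = Tuple[int, int, int, int]


def compare_dom(dom_a: Dict[str, Box], dom_b: Dict[str, Box],
                tolerance: int = 5) -> List[str]:
    """Return the element paths that differ between two renderings.

    An element differs if it is missing from one rendering, or if any of
    its box coordinates differ by more than `tolerance` pixels.
    """
    differing = []
    for path in sorted(set(dom_a) | set(dom_b)):
        box_a, box_b = dom_a.get(path), dom_b.get(path)
        if box_a is None or box_b is None:
            differing.append(path)
        elif any(abs(a - b) > tolerance for a, b in zip(box_a, box_b)):
            differing.append(path)
    return differing


def is_compatible(dom_a: Dict[str, Box], dom_b: Dict[str, Box],
                  tolerance: int = 5, max_diff_ratio: float = 0.1) -> bool:
    """Treat two renderings as compatible if few enough elements differ."""
    total = len(set(dom_a) | set(dom_b))
    if total == 0:
        return True
    return len(compare_dom(dom_a, dom_b, tolerance)) / total <= max_diff_ratio


# Hypothetical data for the same page in two browsers.
firefox = {"html/body/div[1]": (0, 0, 100, 50), "html/body/div[2]": (0, 50, 100, 50)}
chrome = {"html/body/div[1]": (0, 2, 100, 50), "html/body/div[2]": (0, 90, 100, 50)}
print(compare_dom(firefox, chrome))
print(is_compatible(firefox, chrome))
```

The thresholds are arbitrary placeholders; they would need tuning against the labeled data before the result could serve as a meaningful baseline.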
We should also collect screenshots with DOM information. At present we don't have any.
We have implemented collecting DOM information too, but we haven't collected any.
@marco-c Do we simply need to run collect.py to start collecting data with DOM info?
Yes, I think so. We implemented it recently. You should run it for a couple of websites and check that it is actually generating correct data.
@marco-c The DOM info seems to be collected properly, but collection is very slow. Also, could you tell me how to add the files to the repo, since they are tracked with Git LFS?
Yes, the crawler is quite slow because of all the time we have to wait to be sure everything has loaded.
You can add them normally; Git LFS is completely transparent.