HTML/DOM Collection

Open benjamingr opened this issue 5 years ago • 3 comments

Talking to @gioragutt regarding HTML collection and writing things down:

  • Add a per-step hook (like screenshots/console logs) that calls Page.captureSnapshot with the mhtml format and saves the result (don't use fast-mhtml yet). Add an option for it to the config file and make sure it's off by default.
  • Open a PR.
  • On top of that, add fast-mhtml and a step (in the viewer) that uses it to parse the MHTML and display the output of Page.captureSnapshot.

Takeaways:

  • Probably better to use Page.captureSnapshot and not DOMSnapshot, given that MHTML is a standard format and there is little info on parsing the output of DOMSnapshot.
  • Probably better to save the MHTML as a separate file like screenshots rather than inline it in the results file.
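
A minimal sketch of the first step and the takeaways above, under stated assumptions: the names `collectStepHtml`, `HtmlCollectionConfig`, and `WriteFile` are hypothetical and not taken from the project, and the CDP session is typed structurally so the sketch does not depend on a specific browser library. It is gated by a config flag that is off by default and writes one `.mhtml` file per step instead of inlining the snapshot in the results.

```typescript
// Minimal structural type: only the one CDP call this sketch needs.
interface CdpSessionLike {
  send(
    method: "Page.captureSnapshot",
    params: { format: "mhtml" }
  ): Promise<{ data: string }>;
}

// Hypothetical config shape; the real option name is not given in the thread.
interface HtmlCollectionConfig {
  enabled: boolean;
}

// Off by default, as the issue asks.
const defaultHtmlCollectionConfig: HtmlCollectionConfig = { enabled: false };

// Injected so the sketch does not assume a particular file API.
type WriteFile = (filePath: string, contents: string) => Promise<void>;

// Per-step hook body: captures the page as MHTML and saves it as its own file.
// Returns the file path, or undefined when collection is disabled.
async function collectStepHtml(
  session: CdpSessionLike,
  stepIndex: number,
  outputDir: string,
  writeFile: WriteFile,
  config: HtmlCollectionConfig = defaultHtmlCollectionConfig
): Promise<string | undefined> {
  if (!config.enabled) {
    return undefined;
  }
  const { data } = await session.send("Page.captureSnapshot", {
    format: "mhtml",
  });
  const filePath = `${outputDir}/step-${stepIndex}.mhtml`;
  await writeFile(filePath, data);
  return filePath;
}
```

In Node, something like `fs.promises.writeFile` could presumably be passed as `writeFile`; the returned path is what a results file would reference, the same way it references screenshots.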

benjamingr commented Oct 04 '20 16:10

Implementation-related question about CDPSession: can I acquire it early (as early as attach) and keep a reference to it in the hook? Or should I get it every time I invoke the hook (probably not)?

In the util.ts code you showed me, I didn't see you closing the session, and I don't get the feeling from CDPSession['attach'] that there is anything you have to call to free up resources or whatever.

gioragutt commented Oct 04 '20 16:10

You can probably access the existing CDP session from the page, though caching it (a WeakMap keyed by Page) is fine too.
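
A rough sketch of the WeakMap caching idea, assuming the session is created by some async factory (for example puppeteer's `page.target().createCDPSession()` or playwright's `context.newCDPSession(page)`). The helper name `createSessionCache` is hypothetical. Because the map is keyed weakly by the page object, an entry can be garbage-collected together with its page, so the hook does not need explicit cleanup for the cache itself.

```typescript
// Returns a getter that creates at most one session per page object.
// P is the page type, S is the session type; both are left generic so the
// sketch does not assume a particular browser automation library.
function createSessionCache<P extends object, S>(
  createSession: (page: P) => Promise<S>
): (page: P) => Promise<S> {
  // Stores the promise (not the resolved value) so that concurrent callers
  // for the same page share a single in-flight creation.
  const cache = new WeakMap<P, Promise<S>>();

  return function getSession(page: P): Promise<S> {
    let session = cache.get(page);
    if (session === undefined) {
      session = createSession(page);
      cache.set(page, session);
      // If creation fails, forget the entry so a later call can retry.
      session.catch(() => {
        cache.delete(page);
      });
    }
    return session;
  };
}
```

The per-step hook would then call `getSession(page)` on every invocation and get the same session back each time.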

benjamingr commented Oct 04 '20 16:10

This is merged, but we'll leave this issue open until we have docs in place.

Bnaya commented Oct 13 '20 12:10