HTML/DOM Collection
Talking to @gioragutt regarding HTML collection and writing things down:
- Add a per-step hook (like screenshots/console logs) that calls `Page.captureSnapshot` with the `mhtml` format and saves the result (don't use `fast-mhtml` yet). Make sure to add it to the config file and make sure it's off by default. Open a PR.
- On top of that, add `fast-mhtml` and a step (in the viewer) that uses it to parse the MHTML and display the output of `Page.captureSnapshot`.
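
The first item could be sketched roughly as below. This is only an illustration under assumptions: `CdpSessionLike`, `SnapshotConfig`, the `captureMhtml` flag and `captureMhtmlStep` are hypothetical names, not taken from the project. The only real protocol detail used is the CDP method `Page.captureSnapshot`, which accepts `format: "mhtml"` and returns the serialized page in a `data` field.

```typescript
// Minimal structural type for a CDP session (assumed; the real session type
// comes from whichever automation library the project uses).
interface CdpSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical config shape: the snapshot hook is opt-in.
interface SnapshotConfig {
  captureMhtml?: boolean;
}

// Hypothetical per-step hook. Returns the MHTML text, or undefined when the
// feature is disabled (the default).
async function captureMhtmlStep(
  session: CdpSessionLike,
  config: SnapshotConfig = {},
): Promise<string | undefined> {
  if (config.captureMhtml !== true) {
    return undefined; // off by default: no CDP call is made
  }
  const result = (await session.send("Page.captureSnapshot", {
    format: "mhtml",
  })) as { data: string };
  return result.data;
}
```

Keeping the session behind a small interface makes the hook easy to exercise with a fake session in unit tests, without launching a browser.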
Takeaways:
- Probably better to use `Page.captureSnapshot` and not `DOMSnapshot`, given the standard format and the lack of info on parsing the output of `DOMSnapshot`.
- Probably better to save the MHTML as a separate file, like screenshots, rather than inline it in the results file.
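
A minimal sketch of the "separate file" takeaway, assuming a Node.js environment. The `StepResult` shape, the `step-N.mhtml` naming scheme and `saveStepSnapshot` are all hypothetical and only illustrate the idea that the results file stores a path, not the snapshot body.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shape of one step's entry in the results file.
interface StepResult {
  stepIndex: number;
  mhtmlPath?: string; // file name of the snapshot, not the snapshot itself
}

// Write the MHTML into outputDir and return an entry that points at the file.
function saveStepSnapshot(
  outputDir: string,
  stepIndex: number,
  mhtml: string,
): StepResult {
  fs.mkdirSync(outputDir, { recursive: true });
  const fileName = `step-${stepIndex}.mhtml`;
  fs.writeFileSync(path.join(outputDir, fileName), mhtml, "utf8");
  return { stepIndex, mhtmlPath: fileName };
}
```

This keeps the results file small, and the viewer can load each snapshot lazily, the same way it would load a screenshot.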
 
Impl-related question about `CDPSession`: can I acquire it early (as early as attach) and keep a reference to it in the hook? Or should I get it every time I invoke the hook (probably not)?
In the util.ts code you showed me, I didn't see you closing the session, and I don't get the feeling from `CDPSession['attach']` that there's anything you have to call to free up resources or whatever.
You can probably access the existing CDP session from the page, though caching it yourself (a WeakMap keyed by Page) is fine.
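
The WeakMap caching idea could look something like this. It is a sketch only: `createSessionCache` is a hypothetical helper, and the `createSession` factory stands in for whatever call creates a CDP session in the automation library in use (for example `page.createCDPSession()` in recent Puppeteer or `page.context().newCDPSession(page)` in Playwright; the thread doesn't say which one applies here).

```typescript
// Cache one session promise per page object. WeakMap keys are held weakly,
// so a cache entry does not keep a discarded page object alive.
function createSessionCache<P extends object, S>(
  createSession: (page: P) => Promise<S>,
): (page: P) => Promise<S> {
  const cache = new WeakMap<P, Promise<S>>();
  return (page: P): Promise<S> => {
    let session = cache.get(page);
    if (session === undefined) {
      session = createSession(page);
      cache.set(page, session);
      // If creation fails, forget the entry so a later call can retry.
      session.catch(() => {
        cache.delete(page);
      });
    }
    return session;
  };
}
```

The hook would then call the returned function on every step and get the same session back for the same page, which matches the "acquire once, keep a reference" approach from the question.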
This is merged, but we'll leave this issue open until we have docs in place.