single-file-cli
CLI doesn't capture Web Components
When using the CLI to save a web page that contains Web Components (Salesforce's Lightning Web Components in this case), it produces an invalid HTML file.
The issue is specific to the CLI, because the extension produces a correct HTML file.
To Reproduce
- Create a new sandbox (it takes 3 minutes to spin up a machine) and then click on Launch
- Clone SingleFile CLI repo, install packages and run:
npm exec single-file "https://example.com" -- --browser-wait-delay=20000 --browser-headless=false
- Copy the link from step 1) and navigate to it (you have 20 seconds to do it; the time can be configured in the command using the --browser-wait-delay option).
- Wait until SingleFile does the snapshot
- Result: webcomponents.html.zip
Extension result: extension-webcomponents.html.zip
Extension ✅ :
CLI ❌ :
Generally speaking, SingleFile CLI should support web components. For example, it can save https://bugs.chromium.org/p/chromium/issues/detail?id=1040752, which has hundreds of them, almost properly. The only issue is that a <table> tag is missing.
Do I need an account on SalesForce to do your test? I cannot reproduce the procedure you described because I get a login page when pasting the URL on step 3.
Thanks for looking into that. I think this might be an issue specific to their web components then.
No, you don't need an account. Just open the link from step 1), create a new sandbox, wait a minute, and then click on the "Launch" button.
I think I identified the cause of the issue. The Aura components overwrite properties like innerHTML. When debugging the code in the extension, I noticed that their innerHTML values are not empty, but they are empty when I inspect the elements in the Dev Tools. The code of the extension is able to read the native value of innerHTML because it has access to a "protected" DOM (one that cannot be overwritten by scripts on the page). The CLI tool (like the Dev Tools) does not have such a "protected" DOM and reads the overwritten value instead of the native value of innerHTML, i.e. an empty string.
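The effect described above can be reproduced outside the browser. The following is only a minimal sketch under stated assumptions: `FakeElement` is a hypothetical stand-in for a DOM element (it is not part of SingleFile, Aura, or any browser API), and the override is a simplified guess at what a framework might do. It shows how a script can shadow `innerHTML` so that ordinary reads return an empty string while the original getter still returns the real markup.

```javascript
// Hypothetical stand-in for a DOM element. Real elements inherit innerHTML
// as an accessor from Element.prototype in a similar way.
class FakeElement {
  #markup;
  constructor(markup) {
    this.#markup = markup;
  }
  get innerHTML() {
    return this.#markup;
  }
}

const host = new FakeElement("<span>rendered content</span>");

// Keep a reference to the original getter before any "page script" runs.
const nativeGetter = Object.getOwnPropertyDescriptor(
  FakeElement.prototype,
  "innerHTML"
).get;

// Assumed behaviour of a page framework: shadow the accessor on the instance.
Object.defineProperty(host, "innerHTML", {
  get: () => "",
  configurable: true,
});

// A plain property read goes through the override.
const seenByPageWorld = host.innerHTML;

// Calling the saved getter bypasses the override.
const seenNatively = nativeGetter.call(host);

console.log(JSON.stringify(seenByPageWorld));
console.log(seenNatively);
```

In a real page, saving the getter in advance is not a dependable fix, since page scripts may patch the prototype before the snapshot code runs. An isolated world sidesteps this because it has its own JavaScript wrappers around the same DOM.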
Is it possible to instruct puppeteer to read the native value of innerHTML? Or is the extension more powerful in this case, with no workaround for puppeteer?
Actually the correct term is "isolated world". Unfortunately, I confirm this feature does not exist in puppeteer today, see https://github.com/puppeteer/puppeteer/issues/2671. I guess a workaround could consist of running the browser in non-headless mode with SingleFile installed as extension, but that would require some work in order to communicate with SingleFile (or a fork of it).
Thanks @gildas-lormeau. I found Page.createIsolatedWorld in CDP. I wonder if I could use that with puppeteer to fix the issue. From what I understand I would need to create this isolated world for a page and each frame within it.
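The idea in the comment above (one isolated world for the page and for each frame within it) could be sketched roughly as follows. This is an untested illustration, not the approach SingleFile CLI takes: the helper names `collectFrameIds` and `evaluateInIsolatedWorlds` and the world name `snapshot-world` are made up for this example. It assumes `client` is an object with an async `send(method, params)` method, such as a Puppeteer CDP session, and it relies only on the CDP methods `Page.getFrameTree`, `Page.createIsolatedWorld` and `Runtime.evaluate`.

```javascript
// Collect the id of a frame and of all its descendants from the tree
// returned by the CDP method Page.getFrameTree.
function collectFrameIds(node, ids = []) {
  ids.push(node.frame.id);
  for (const child of node.childFrames ?? []) {
    collectFrameIds(child, ids);
  }
  return ids;
}

// Evaluate `expression` in a fresh isolated world of every frame.
// `client` is assumed to expose send(method, params), like a CDP session.
async function evaluateInIsolatedWorlds(
  client,
  expression,
  worldName = "snapshot-world" // hypothetical name, any string works
) {
  const { frameTree } = await client.send("Page.getFrameTree");
  const results = [];
  for (const frameId of collectFrameIds(frameTree)) {
    // Each call returns the id of a new execution context for that frame.
    const { executionContextId } = await client.send(
      "Page.createIsolatedWorld",
      { frameId, worldName }
    );
    const { result, exceptionDetails } = await client.send(
      "Runtime.evaluate",
      { expression, contextId: executionContextId, returnByValue: true }
    );
    // Report undefined for frames where the expression threw.
    results.push({
      frameId,
      value: exceptionDetails ? undefined : result.value,
    });
  }
  return results;
}
```

With Puppeteer, the session would presumably come from `page.createCDPSession()` (or `page.target().createCDPSession()` in older releases), and the expression could be something like `document.documentElement.outerHTML`. Whether this is enough to read the native `innerHTML` of the Aura components has not been verified here.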
@tomaszferens Maybe, I did some tests but I was not able to make it work. If you want to do some tests easily in SingleFile CLI, you can apply the changes in the file https://github.com/gildas-lormeau/single-file-cli/blob/master/back-ends/puppeteer.js.
Version 2.x now uses isolated worlds.