Failed on some folded page

Open yangxiaomin08 opened this issue 8 years ago • 15 comments

Hi,

As we know, many pages and articles are folded behind a button such as 'show more' that reveals the full text. After the button is clicked, the hidden content is shown. On some websites, however, the hidden content sits in a different part of the DOM, for example at a different level from the content already marked as 'content'. In that case, the hidden content is not recognized as 'content'.

Take https://m.sohu.com/n/477121843/?wscrid=1137_4 as an example: after clicking the 'show more' button and distilling the page manually, the originally hidden content is not distilled.

yangxiaomin08 avatar Dec 28 '16 03:12 yangxiaomin08

After analyzing the source, my guess is that the failure occurs because the hidden content is wrapped in a div with the id "rest_content", while the normal content that has been marked as content starts with a "p" tag. SimilarSiblingContentExpansion fails to recognize the hidden content ("rest_content") as content.

This is only a guess; I haven't found an easy way to debug and verify it yet. If you have any ideas on how to debug it, I would appreciate you sharing them.

Furthermore, even if my guess is correct, I have no idea how to 'fix' it without mistakenly marking other non-content as content. Any advice is welcome.
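
The guess above can be pictured with a toy model. This is only a sketch under stated assumptions: the node objects, the `expandBySameTag` function, and its same-tag rule are hypothetical and are not taken from SimilarSiblingContentExpansion or from the real sohu.com markup. It only shows how a wrapper div could be skipped when its siblings are "p" elements.

```javascript
// Toy model of the structure described above (not the real page markup).
// Each node has a tag, an optional id, and a flag saying whether the
// classifier has already marked it as content.
const siblings = [
  { tag: "p", id: null, isContent: true },
  { tag: "p", id: null, isContent: true },
  { tag: "div", id: "rest_content", isContent: false }, // wraps the hidden text
];

// Hypothetical rule: promote a sibling to content only if it has the same
// tag as a sibling that is already marked as content.
function expandBySameTag(nodes) {
  const contentTags = new Set(nodes.filter((n) => n.isContent).map((n) => n.tag));
  return nodes.map((n) => ({
    ...n,
    isContent: n.isContent || contentTags.has(n.tag),
  }));
}

const expanded = expandBySameTag(siblings);
// The div#rest_content wrapper stays unmarked under this rule.
console.log(expanded.map((n) => `${n.tag}${n.id ? "#" + n.id : ""}: ${n.isContent}`));
```

Loosening such a rule to accept any tag would pick up the wrapper, but it would also pick up unrelated siblings, which is the false-positive concern raised in this comment.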

yangxiaomin08 avatar Dec 28 '16 08:12 yangxiaomin08

Hi,

Would you mind filing a bug at crbug.com, following our README? We usually track bugs and feature requests there. The issue tracker here on GitHub can serve as a more free-form Q&A.

Thanks, Wei-Yin

wychen avatar Dec 28 '16 09:12 wychen

Filed a bug at https://bugs.chromium.org/p/chromium/issues/detail?id=677359&q=component%3AUI%3EBrowser%3EReaderMode

yangxiaomin08 avatar Dec 29 '16 01:12 yangxiaomin08

Hi Wei-Yin,

Following the previous discussion, I reverted some code and successfully debugged the Java code. I still have some questions; would you please give me a hand? Thank you.

Is there an example or test case that loads a URL, distills the page, and then displays the content in Chrome? As far as I know, I can do this in Chrome (for example from chrome://dom-distiller, or from the menu if I start Chrome with the --enable-dom-distiller switch), but I'm not sure whether I can debug the Java code that way. In other words, can the Java debugging feature only be used in the local war/xxxx directory?

yangxiaomin08 avatar Dec 29 '16 07:12 yangxiaomin08

It's doable.

After reverting to the state where source map works, do the following:

  • Modify build.xml, and change gwt.args to be the same as gwt.test.args, and add --sourcemaps option in the extractjs target, like extractjs.jstests.
  • Edit java/DomDistiller.gwt.xml and add the source map option, like in javatests/DomDistillerJsTest.gwt.xml.
  • Run "ant package"
  • Edit the last line of out/extension/domdistiller.js to be "//@ sourceMappingURL=../debug/....". That is, add "../" in front of the path.
  • Use the Chrome extension to distill a page. Note that you need to use the button on the toolbar, not the "Profile Extraction" button on the Dom Distiller page.
  • In the Profiles tab, click any function, and you should be taken back to the Java code.
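
The manual edit of the last line could be scripted along these lines. This is a minimal sketch, not part of the dom-distiller build: the `prefixSourceMapUrl` helper and the `debug/example.map` path in the usage lines are hypothetical, and reading and writing out/extension/domdistiller.js is left to the caller.

```javascript
// Prefix the path in a trailing sourceMappingURL comment with "../".
// Operates on the file contents as a string.
function prefixSourceMapUrl(source) {
  const lines = source.split("\n");
  // Find the last non-empty line, since the file may end with a newline.
  let i = lines.length - 1;
  while (i >= 0 && lines[i].trim() === "") i--;
  if (i < 0) return source;
  // Accept both the "//@" and "//#" comment forms.
  const match = /^(\/\/[@#] sourceMappingURL=)(.*)$/.exec(lines[i]);
  // Leave the input untouched if there is no such comment or it is already prefixed.
  if (!match || match[2].startsWith("../")) return source;
  lines[i] = match[1] + "../" + match[2];
  return lines.join("\n");
}

// Usage with a made-up path (hypothetical, for illustration only).
const example = "var a=1;\n//@ sourceMappingURL=debug/example.map\n";
console.log(prefixSourceMapUrl(example));
```

The function is idempotent, so running it twice after "ant package" does not add a second "../".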

Let me know if this works for you.

wychen avatar Dec 29 '16 08:12 wychen

Hi Wei-Yin,

Thanks. I have made the changes step by step.

I couldn't find instructions on how to test with the Chrome extension, such as how to install the modified package as an extension. Would you please give me some tips? Thanks again.

yangxiaomin08 avatar Dec 30 '16 07:12 yangxiaomin08

Have you read this yet? https://github.com/chromium/dom-distiller#developer-extension

wychen avatar Dec 30 '16 07:12 wychen

Thanks, I missed it.

I have tried it, and the profile extension is there. The problem is that when I click the profile button, the page becomes an empty white page. I haven't modified any Java code yet; all I have done is follow your instructions for enabling source maps and the instructions above.

My Chrome version is 55.0.2883.87 (64-bit) on Ubuntu.

yangxiaomin08 avatar Dec 30 '16 08:12 yangxiaomin08

I noticed there is a warning in the latest version of Chrome: I need to change //@ to //#, like this:

//# sourceMappingURL=../debug/domdistiller/src/domdistiller.sourcemap

But I still got the white page.

Messages in the console:

MarkupParser.java:147 DomDistiller debug level: 0
/home/yangxm/codes/dom-distiller/out/extension/extract.js:9 Object
    1: ""
    2: Object
    3: Object
    5: Object
    6: Object
    7: Object
    8: Object
    9: "auto"
    10: Array[0]
    __proto__: Object
/home/yangxm/codes/dom-distiller/out/extension/preview.js:2 Uncaught TypeError: Cannot set property 'innerHTML' of null at /home/yangxm/codes/dom-distiller/out/extension/preview.js:2

yangxiaomin08 avatar Dec 30 '16 08:12 yangxiaomin08

I have also tried extension mode in a Chrome that I built myself, based on roughly M53. The console output is still the same.

I'm not sure which step went wrong. My understanding is that your last instructions only cover enabling the source map feature in the Chrome extension. (I still have the local modification that reverts https://bugs.chromium.org/p/chromium/issues/detail?id=617360.) I have reverted and tried again, and still got the same console output and white page.

yangxiaomin08 avatar Dec 30 '16 09:12 yangxiaomin08

More information: I followed the guide "Run in Chrome for desktop".

  1. Copy to chrome/src/...
  2. Touch dom_distiller_resources.grdp.
  3. Build Chrome.
  4. Load the page and distill it from the menu.

The content can be viewed.

Is this because the extension doesn't decode the DOM Distiller return value, which is in protocol buffer format?

yangxiaomin08 avatar Dec 30 '16 09:12 yangxiaomin08

Some more background. After installing the extension, there should be one additional icon on the toolbar. In the devtools, there should be one additional tab, named "Dom Distiller", where there is one button, named "Profile Extraction". If you click "Profile Extraction", the current page would be distilled, and the JS Console should contain some debug info. If you click the icon, distillation would be done as above, and in addition, the distilled content would replace the original page.

From the console output, it looks like the extracted content is empty (or at least the title is an empty string). Strangely, at preview.js line 2, it seems document.body is null in your case. When you click the icon, does the tab contain an article?
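
One way to narrow this down is to guard the assignment that throws. The sketch below is an assumption, since the contents of preview.js are not shown in this thread: `renderPreview` is a hypothetical helper, and the fake document objects in the usage lines only stand in for a real `document`.

```javascript
// Hypothetical guard around the kind of assignment that throws
// "Cannot set property 'innerHTML' of null" when the body is missing.
function renderPreview(doc, html) {
  if (!doc || !doc.body) {
    // Report failure instead of throwing, so the caller can log or retry.
    return false;
  }
  doc.body.innerHTML = html;
  return true;
}

// Usage with stand-in objects instead of a real browser document.
const fakeDocWithBody = { body: { innerHTML: "" } };
const fakeDocWithoutBody = { body: null };
console.log(renderPreview(fakeDocWithBody, "<p>article</p>"));
console.log(renderPreview(fakeDocWithoutBody, "<p>article</p>"));
```

A guard like this would not fix the underlying problem, but a `false` result would confirm that the body is missing at the moment the preview is written.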

wychen avatar Dec 30 '16 09:12 wychen

I did see "Dom Distiller Dev 1.0" after loading the unpacked extension, along with an icon named "Profile Extraction".

I clicked the "Profile Extraction" button and got an empty page. The page has an article. It is https://m.sohu.com/n/477367845/?wscrid=95360_1.

yangxiaomin08 avatar Dec 30 '16 09:12 yangxiaomin08

If I don't locally revert the code for https://bugs.chromium.org/p/chromium/issues/detail?id=617360 (that is, I keep the clean version of the git repo), it works in extension mode.

It looks like extension mode is broken by that local revert made to support source maps. Would you please help verify it? Thanks.

yangxiaomin08 avatar Dec 30 '16 09:12 yangxiaomin08

If I were you, I'd make sure the reversion is correct first, since it's not a clean revert. Then work on enabling source map in the extension.

wychen avatar Jan 03 '17 07:01 wychen