
Incomplete output on broken HTML like https://distrowatch.com/table.php?distribution=void

Open eMPee584 opened this issue 1 year ago • 5 comments

Nice tool, but I guess there's a lot of broken HTML on the web :sweat_smile: I tried

monolith 'https://distrowatch.com/table.php?distribution=void' > distrowatch.com--void-linux.htm

but the output is missing quite a lot compared to fetching the page with curl. Maybe this is part of the problem: https://validator.w3.org/nu/?showoutline=yes&doc=https%3A%2F%2Fdistrowatch.com%2Ftable.php%3Fdistribution%3Dvoid

eMPee584 avatar Mar 29 '24 12:03 eMPee584

Hello Marcel,

I was able to get a nearly 1:1 copy of the live page with that command. Could you please point out what seems to be missing in the saved file?

snshn avatar Mar 29 '24 16:03 snshn

It was missing the huge table that follows an H2 tag. Oh wow, what the heck, now it works here too :thinking: Unfortunately I had overwritten the file I got earlier, the one I wanted to diff against the curl output. That was created with

monolith --no-css --no-images --no-js 'https://distrowatch.com/table.php?distribution=void'

Result: distrowatch.com--void-linux.txt

Hm, actually that is still missing it. Gotta run now, got a train to catch.

eMPee584 avatar Mar 29 '24 16:03 eMPee584
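
For anyone trying to pin down what is missing, a rough way to compare a monolith save against a plain curl download is to count the table elements in each file. This is only a sketch: the function name count_tag and the file names are made up for illustration, and the match is a crude text search rather than real HTML parsing.

```shell
# count_tag TAG FILE
# Print how many opening TAG elements appear in FILE (case-insensitive).
# Crude text match: only finds "<tag>" or "<tag " and ignores closing tags.
count_tag() {
    grep -o -i "<$1[ >]" "$2" | wc -l | tr -d ' '
}

# Example usage (file names are hypothetical):
#   curl -s 'https://distrowatch.com/table.php?distribution=void' > curl.htm
#   monolith 'https://distrowatch.com/table.php?distribution=void' > monolith.htm
#   count_tag table curl.htm
#   count_tag table monolith.htm
```

If the two counts differ, the saved file has dropped (or gained) tables relative to the raw response, which narrows down where to look in a diff.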

Interesting. Could you please try saving it again and wait for another train? It doesn't look like I'm able to reproduce it on my end.

snshn avatar Mar 29 '24 16:03 snshn

Looks like the page needs either JS or CSS to render those tables. Alternatively, you can provide this flag: -n. It'll unwrap NOSCRIPT tags and make the page look the way it does in browsers that don't have JS enabled.

snshn avatar Mar 29 '24 16:03 snshn

OK, coming back to this: a more interesting observation is that the output fluctuates between runs. Try this command repeatedly:

monolith 'https://distrowatch.com/table.php?distribution=void' > distrowatch.com--void-linux.$(date +%F.%H%Mh%S).htm

The -n option actually makes no difference to that.

eMPee584 avatar Apr 28 '24 10:04 eMPee584
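
To put a number on the fluctuation instead of eyeballing timestamped files, something like the following could help. It is a minimal sketch, not part of monolith: count_distinct_outputs is a hypothetical helper that runs any command several times, checksums each run's stdout, and prints how many distinct results it saw.

```shell
# count_distinct_outputs RUNS COMMAND [ARGS...]
# Run COMMAND RUNS times and print the number of distinct outputs observed.
count_distinct_outputs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # cksum on stdin prints "<checksum> <byte count>" for this run
        "$@" 2>/dev/null | cksum
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}

# Example usage (hits the network, so it is left commented out):
#   count_distinct_outputs 5 monolith -M 'https://distrowatch.com/table.php?distribution=void'
```

A result of 1 means every run produced byte-identical output; anything higher confirms the fluctuation. The -M (--no-metadata) flag in the example is there because monolith normally writes a timestamped metadata comment into the saved file, which on its own would make every run differ; check monolith --help if your version lacks it. Note that the server may also return slightly different HTML per request (ads, counters), so comparing against repeated curl runs with the same helper is a useful control.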