Incomplete output on broken HTML like https://distrowatch.com/table.php?distribution=void
Nice tool, but there's a lot of broken HTML on the web, I guess :sweat_smile:
I tried monolith 'https://distrowatch.com/table.php?distribution=void' > distrowatch.com--void-linux.htm, but the output is missing quite a lot compared to fetching the HTML file with curl.
Maybe this is part of the problem: https://validator.w3.org/nu/?showoutline=yes&doc=https%3A%2F%2Fdistrowatch.com%2Ftable.php%3Fdistribution%3Dvoid
Hello Marcel,
With that command I was able to get an almost 1:1 copy of the page as it appears on the web. Could you please point out what seems to be missing in the saved file?
It was missing the huge table that follows an H2 tag.
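A quick way to check for the missing table is to count table tags in a copy fetched with curl and in monolith's output. The shell sketch below does that. count_tables and the file names in the usage comment are hypothetical, and it is a case-insensitive text match rather than an HTML parser, so treat the numbers as a rough signal only.

```shell
# Sketch: print "<count>\t<file>" for each file, where <count> is the
# number of case-insensitive "<table" occurrences. Plain text match,
# not an HTML parser, so markup inside comments or scripts counts too.
count_tables() {
    for f in "$@"; do
        # grep -o prints one line per match; wc -l counts them (0 if none)
        n=$(grep -io '<table' "$f" | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$f"
    done
}

# Example usage (needs network access, curl and monolith):
#   url='https://distrowatch.com/table.php?distribution=void'
#   curl -s "$url" > void.curl.htm
#   monolith "$url" > void.monolith.htm
#   count_tables void.curl.htm void.monolith.htm
```

If the monolith copy reports fewer tables than the curl copy, the big table after the H2 is a likely candidate for what went missing.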
Oh wow, what the heck. Now it works here too :thinking:
Unfortunately, I had overwritten the file I saved earlier to diff against the curl output. This one was created with
monolith --no-css --no-images --no-js 'https://distrowatch.com/table.php?distribution=void':
distrowatch.com--void-linux.txt
Hmm, actually that one is still missing it. Gotta run now, I have a train to catch.
Interesting. Could you please try saving it again and wait for another train? It doesn't look like I'm able to reproduce it on my end.
It looks like the page needs either JS or CSS to render those tables. Alternatively, you can pass the -n flag: it unwraps NOSCRIPT tags, so the page looks the way it does in browsers with JS disabled.
OK, coming back to this: a more interesting observation is that the output fluctuates between runs. Try running this command repeatedly:
monolith 'https://distrowatch.com/table.php?distribution=void' > distrowatch.com--void-linux.$(date +%F.%H%Mh%S).htm
The -n option actually makes no difference there.
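To make the fluctuation easier to see than by eyeballing timestamped files, a small shell sketch can save a handful of snapshots and count how many distinct outputs came back. take_snapshots and distinct_outputs are hypothetical helpers, not part of monolith. The comparison keys on the cksum CRC plus byte size, so it only tells you whether runs differ, not where.

```shell
# Sketch: take_snapshots and distinct_outputs are hypothetical helpers.

# Save $1 snapshots of URL $2 into the current directory, passing any
# further arguments (e.g. -n) to monolith before the URL. Files are
# numbered (snapshot.1.htm, ...) and overwritten on each call.
take_snapshots() {
    count=$1
    url=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        monolith "$@" "$url" > "snapshot.$i.htm"
        i=$((i + 1))
    done
}

# Print how many distinct contents exist among the given files, keyed on
# cksum's CRC and byte size. 1 means every run produced identical output.
distinct_outputs() {
    cksum "$@" | awk '{ print $1, $2 }' | sort -u | wc -l | tr -d ' '
}
```

For example, take_snapshots 5 'https://distrowatch.com/table.php?distribution=void' followed by distinct_outputs snapshot.*.htm; anything above 1 means the runs differ. Repeating it with -n appended (take_snapshots 5 'https://distrowatch.com/table.php?distribution=void' -n) gives a like-for-like comparison for that flag.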