DeviantArt: Rip Literature
Follow up from #496 (original request: https://github.com/4pr0n/ripme/issues/496#issuecomment-299345268)
EDIT: Pending a largely complete rewrite after re-reading the source code and losing all the changes ...
- The literature side makes quite extensive use of `div` tags with `class="..."` to mark things.
- On each page, the story text itself is stored inside `<div class="text">...</div>`. The text is HTML-escaped, and the newlines are likewise expressed as HTML markup. That `div` doesn't appear anywhere else. However, I'm seeing JavaScript at the end of the story, before the closing tag (code indentation by Firefox's integrated web tools):
<script type="text/javascript">
  if (!window.__meta_cache) {
    window.__meta_cache = [];
  }
  window.__meta_cache['XXXXX']=[]
</script>
I replaced the actual content with XXXXX just in case. It doesn't matter anyway, since that code isn't needed there.
- https://gist.github.com/rautamiekka/0b2e2aeb53a4f77fa20c5890c7b910b8
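To make the extraction described above concrete, here is a rough sketch of pulling the story out of `<div class="text">` while dropping the trailing `<script>` block. ripme itself is Java; this uses Python's standard `html.parser` purely for illustration. The class name `StoryTextExtractor`, the sample markup, and the treatment of `<br>` as a newline are all assumptions on my part, not something confirmed against the live pages.

```python
from html.parser import HTMLParser


class StoryTextExtractor(HTMLParser):
    """Hypothetical helper: collect text inside <div class="text">, skipping <script>."""

    def __init__(self):
        # convert_charrefs=True unescapes entities such as &amp; in text data.
        super().__init__(convert_charrefs=True)
        self.div_depth = 0      # > 0 while inside the story div
        self.in_script = False  # True while inside a <script> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth > 0:
                self.div_depth += 1       # nested div inside the story
            elif "text" in classes:
                self.div_depth = 1        # entering the story div
        elif self.div_depth > 0:
            if tag == "script":
                self.in_script = True
            elif tag == "br":
                self.chunks.append("\n")  # assumption: <br> marks a newline

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        # Keep only text that belongs to the story and is not JavaScript.
        if self.div_depth > 0 and not self.in_script:
            self.chunks.append(data)

    def story(self):
        return "".join(self.chunks).strip()


# Made-up sample page fragment, shaped like the structure described above.
sample = (
    '<div class="other">menu</div>'
    '<div class="text">Tom &amp; Jerry<br>Second line'
    '<script type="text/javascript">window.__meta_cache = [];</script>'
    '</div><div>footer</div>'
)
parser = StoryTextExtractor()
parser.feed(sample)
parser.close()
print(parser.story())
```

The depth counter is there so that any nested `div` inside the story doesn't end the capture early. In the Java code base the same idea would presumably be expressed with whatever HTML parser ripme already uses.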
More info to come ...