Johannes Holzfuß
I actually started on this a while ago, but then thought it would be silly for a single person to attempt and stopped. Now that I see this...
> @DataWraith Just had a look at the gozim demo, looks really cool. In the short-term, this does seem like the best option (apologies for my terse reply earlier @rht...
Short progress update: I'm now feeding files to `ipfs add` in batches of 25, which seems to have solved the memory issue for now. I hope that feeding in the...
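For reference, this is roughly what the batching looks like: a minimal sketch that just shells out to the `ipfs add` CLI; the file list, batch handling, and logging are placeholders, not my actual code.

```go
package main

import (
	"log"
	"os/exec"
)

// addInBatches feeds paths to `ipfs add` a few at a time instead of all at
// once, so no single invocation has to hold too much in memory.
func addInBatches(paths []string, batchSize int) error {
	for start := 0; start < len(paths); start += batchSize {
		end := start + batchSize
		if end > len(paths) {
			end = len(paths)
		}
		// -q prints only the hashes, which keeps the output manageable.
		args := append([]string{"add", "-q"}, paths[start:end]...)
		out, err := exec.Command("ipfs", args...).CombinedOutput()
		if err != nil {
			return err
		}
		log.Printf("batch %d-%d:\n%s", start, end, out)
	}
	return nil
}

func main() {
	// Placeholder file list; in practice this comes from walking the dump directory.
	files := []string{"A/Anarchism.html", "A/Autism.html"}
	if err := addInBatches(files, 25); err != nil {
		log.Fatal(err)
	}
}
```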
I had no luck getting `ipfs add` to ingest the HTML files; pre-adding the files in batches didn't do anything. `ipfs` (without the daemon running) consumed enough RAM to fill...
Hi. I've decided to delete the trial files obtained using `wget` and go all out and try to actually dump the entire most recent English Wikipedia snapshot (with images) with my program....
Heh. Eventually I'd like to write a program that converts a MediaWiki dump to HTML (probably by running it through pandoc), but right now I'm fairly busy, sorry. I could...
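In case someone wants to pick that up before I get to it: the conversion step I have in mind is basically just piping each article's wikitext through pandoc's `mediawiki` reader. A rough sketch, with the per-article loop left out:

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// wikitextToHTML converts one article's MediaWiki markup to HTML by piping it
// through pandoc's mediawiki reader.
func wikitextToHTML(wikitext string) (string, error) {
	cmd := exec.Command("pandoc", "-f", "mediawiki", "-t", "html")
	cmd.Stdin = bytes.NewBufferString(wikitext)
	var out, stderr bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("pandoc: %v: %s", err, stderr.String())
	}
	return out.String(), nil
}

func main() {
	html, err := wikitextToHTML("== Heading ==\nSome '''bold''' text.")
	if err != nil {
		panic(err)
	}
	fmt.Println(html)
}
```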
I took another look at this, and wanted to share what I found, in case it is useful to the next person. Extracting the article markup from the XML dump...
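For the extraction itself, a streaming XML decoder works without loading the whole dump into memory. Here's a sketch; the element names follow the MediaWiki export schema (`<page>`, `<title>`, `<revision>/<text>`), and the file path is a placeholder.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

// page mirrors just the parts of the MediaWiki export schema needed here.
type page struct {
	Title    string `xml:"title"`
	Revision struct {
		Text string `xml:"text"`
	} `xml:"revision"`
}

func main() {
	// Placeholder path; the real dump is bz2-compressed, so you'd wrap this
	// in a bzip2 reader (or decompress beforehand).
	f, err := os.Open("enwiki-latest-pages-articles.xml")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	dec := xml.NewDecoder(f)
	for {
		tok, err := dec.Token()
		if err != nil {
			break // io.EOF once the dump is exhausted
		}
		// Decode one <page> element at a time, so the whole dump never
		// has to fit in memory.
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "page" {
			var p page
			if err := dec.DecodeElement(&p, &se); err != nil {
				panic(err)
			}
			fmt.Printf("%s: %d bytes of wikitext\n", p.Title, len(p.Revision.Text))
		}
	}
}
```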
*sigh* This is much harder than it looked at first. I realize I'm flip-flopping on this a lot; should've kept my mouth shut from the beginning. Anyway. This...
Parsoid is intended to convert MediaWiki markup to HTML and back losslessly (they do 'round-trip testing'). I haven't noticed any mistakes with...
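To illustrate, here's a sketch of converting a single chunk of wikitext through the public REST transform endpoint. I'm assuming the `/transform/wikitext/to/html` route and the `wikitext` form field here; for a full dump you'd want to run Parsoid locally rather than hammer wikipedia.org.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// parsoidHTML posts a chunk of wikitext to the transform endpoint and returns
// the HTML that Parsoid produces.
func parsoidHTML(wikitext string) (string, error) {
	form := url.Values{"wikitext": {wikitext}}
	resp, err := http.Post(
		"https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html",
		"application/x-www-form-urlencoded",
		strings.NewReader(form.Encode()),
	)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	html, err := parsoidHTML("'''Hello''', [[World]]!")
	if err != nil {
		panic(err)
	}
	fmt.Println(html)
}
```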
Any update on this? I seem to be getting this behavior both in powershell and cmd.exe. I'm using [pb](http://github.com/cheggaaa/pb), and the progress bars get longer and longer with every invocation.
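For reference, a minimal program that shows the behavior for me; nothing fancy, just the basic `StartNew`/`Increment` usage.

```go
package main

import (
	"time"

	"github.com/cheggaaa/pb"
)

func main() {
	// Minimal loop; on Windows consoles (powershell / cmd.exe) the bar keeps
	// growing across redraws instead of updating in place for me.
	count := 100
	bar := pb.StartNew(count)
	for i := 0; i < count; i++ {
		bar.Increment()
		time.Sleep(10 * time.Millisecond)
	}
	bar.Finish()
}
```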