Christopher Foo
Implementation note: `pipeline.py` (15ae3ca6a6831f2b1ae366a58d5620474f5b3d2c) already adds this cookie for the top-level URL.
An umbrella issue for the actual crawling implementation is at chfoo/wpull#74. This issue should cover adding a future Wpull option to the arguments.
Looking at it now, I guess it should be ordering by `oai_updatedate`. Keeping track of the last successful retrieval date is a good idea. Edit: I tried running it on...
So it seems there aren't any noticeable problems with removing the extra `parseFloat`, and the LoadDAE example works. However, I discovered numerical issues with COLLADA files exported from Blender using...
Thank you, I tested it on Neko, C++, and HTML5 and I can confirm it works now.
To add to that: Warcat currently uses Python's built-in HTTP library, which does not handle the edge cases that web browsers do.
Good catch, it's not supposed to be missing that one. (As an FYI, this project was written based on the draft WARC 1.0 spec. I haven't updated the project since...
Sure, I think that sounds great!
Thanks, I'll be happy to accept pull requests. Please take your time to finish porting the code. Correctness is much more important than version compatibility.
See also:
- https://bitbucket.org/hanzo/warc-tools/issue/11
- https://bitbucket.org/rajbot/warc-tools/issue/1
- https://github.com/internetarchive/CDX-Writer/issues/3