Nick Ruest

Results 44 comments of Nick Ruest

`-rateLimit=90000` is working for me.

@spekulatius can you rebase this, and provide some rationale for moving things around to deprecated so it's on record?

@anjackson hopefully I'll get back to reviewing the documentation soon. I shared [this work-in-progress](https://www.dropbox.com/s/6qvg8qqwfxsdivf/heritrix-documentation-notes.txt) on an oh-sos call a couple of weeks ago. I think you were on holiday.

> The major 'front pages' should be reviewed to check they make basic sense and link to the right places. Let me see if I can come up with a...

...which in turn uses https://github.com/edsu/unshrtn. We could incorporate that. Or, create a method in warcbase that does the same thing, or maybe there is already a Java library that does...

@lintool can you clarify what you mean by "a file that has the mapping from short urls to the full URLs"?

...or, is this what you're looking for? https://github.com/edsu/unshrtn/blob/master/unshrtn.coffee

Oh, https://github.com/edsu/twarc/blob/master/utils/unshorten.py#L37-L53 puts it back in the dataset with a new entry.

You don't do it on the preservation/master version of the dataset, you always `cat` it out to a new file. By default it is `stdout`. It only reads the preservation/master...

Would the output be:

```
short, count, long, count
http://t.co/pbFMYFZpQC, 12, http://foo.bar.com/, 123
```
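To make the question concrete, here is a rough Python sketch of how a table in that shape could be built. It assumes the input is already a list of (short URL, resolved long URL) pairs, one per occurrence in the dataset. The names `mapping_rows` and `to_csv` are made up for illustration and are not part of warcbase, twarc, or unshrtn.

```python
import csv
import io
from collections import Counter


def mapping_rows(pairs):
    """Build (short, short_count, long, long_count) rows.

    `pairs` is an iterable of (short_url, long_url) tuples, one per
    occurrence. This input shape is an assumption for the sketch.
    """
    pairs = list(pairs)
    # How often each short URL occurs, and how often each long URL
    # occurs across all of the short URLs that resolve to it.
    short_counts = Counter(short for short, _ in pairs)
    long_counts = Counter(long_url for _, long_url in pairs)

    # Keep the first long URL seen for each short URL, in input order.
    first_seen = {}
    for short, long_url in pairs:
        first_seen.setdefault(short, long_url)

    return [
        (short, short_counts[short], long_url, long_counts[long_url])
        for short, long_url in first_seen.items()
    ]


def to_csv(rows):
    """Serialize the rows with a `short, count, long, count` header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["short", "count", "long", "count"])
    writer.writerows(rows)
    return buf.getvalue()
```

In this reading, the second count can be larger than the first because several short URLs may resolve to the same long URL.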