Jimmy Lin
Okay, this is pretty nuts. CDH jars depend on an older version of guava, so upgrading to guava 18 breaks Pig test cases. Srsly.
Thanks for your positive feedback. I think it'll work if you do this:
```
val r = RecordLoader.loadArchives("src/test/resources/arc/example.arc.gz", sc)
  .keepValidPages()
  .map(r => ExtractDomain(r.getUrl))
  .take(1)
```
I.e., extract the fields that...
This is related to the notion of "warrants" in "The Craft of Research".
https://daemon.github.io/log.d/write/explore/metaresearch.html
Thanks for noting. Now that the semester is winding down, I'll have time to take a look at this. A PR would be even better...
No, please don't do that. I'll look into pushing artifacts onto Maven Central. In the meantime, you can always publish Maven artifacts locally with `mvn install`.
I see - well, if it's a blocker then by all means publish your own version to Maven Central. I'm a bit swamped these days (as well as @ianmilligan1) and...
Hi @dportabella, can you please reach out to @ianmilligan1 and myself over email? Let's move this discussion to a separate channel...
The current fix is to catch the exception and move on. https://github.com/lintool/warcbase/commit/a00e413edff46d4655fe621b65d1af89ffda33c4 It might be worth looking in more detail at what's going on at a later point.
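For illustration only, here is a rough sketch of the general "catch the exception and move on" pattern in plain Scala. It is not the code from the linked commit: `parseRecord` and `parseAll` are hypothetical names, and the integer parser merely stands in for whatever record-reading step was throwing.

```scala
import scala.util.Try

// Hypothetical stand-in for a record parser that throws on malformed input.
def parseRecord(raw: String): Int = raw.trim.toInt

// Parse every record; report and skip any that throw instead of aborting the run.
def parseAll(raws: Seq[String]): Seq[Int] =
  raws.flatMap { raw =>
    val attempt = Try(parseRecord(raw))
    // Report the failure so skipped records can be investigated later.
    attempt.failed.foreach { e =>
      System.err.println(s"Skipping bad record '$raw': ${e.getMessage}")
    }
    attempt.toOption
  }

val parsed = parseAll(Seq("1", "oops", "3"))
```

Logging the skipped record, rather than dropping it silently, keeps enough of a trail to dig into the root cause later.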
This is not a use case we've considered thus far. Wouldn't be too hard to implement - `loadArchive` ultimately calls a Hadoop `InputFormat` to read ARCs and WARCs. We would...