Andy Jackson
Looking deeper inside, [we find](https://github.com/iipc/openwayback/blob/6d64391226db88fbf4fc7ef44a9b2bbfbb3166da/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/HTTPRecordAnnotater.java#L137):

``` java
// Now the sticky part: If it looks like an HTML document, look for
// robot meta tags:
if(isHTML(mimeType)) {
    String fileContext =...
```
@ikreymer's [openwayback-sample-overlay](https://github.com/iipc/openwayback-sample-overlay) goes some way to addressing this, I think.
Java has a system for extensible URL protocol support. We could consider using it if necessary, or check whether these hooks have already been written by others.
- http://stackoverflow.com/questions/26363573/registering-and-using-a-custom-java-net-url-protocol...
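For reference, here is a minimal sketch of that extension mechanism, not anything taken from OpenWayback. The `demo:` scheme, the `CustomProtocolSketch` class and the `DemoHandler` name are all hypothetical, made up for illustration. It passes a `URLStreamHandler` directly to the `URL` constructor (deprecated in recent JDKs, but still available), which avoids registering a JVM-wide `URLStreamHandlerFactory`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

public class CustomProtocolSketch {

    /** Handler for the hypothetical "demo:" scheme: it echoes the URL path back as the body. */
    public static class DemoHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            return new URLConnection(u) {
                @Override
                public void connect() {
                    // Nothing to open for this in-memory example.
                    connected = true;
                }

                @Override
                public InputStream getInputStream() {
                    byte[] body = getURL().getPath().getBytes(StandardCharsets.UTF_8);
                    return new ByteArrayInputStream(body);
                }
            };
        }
    }

    /** Opens the given URL spec with the demo handler and returns the body as a string. */
    @SuppressWarnings("deprecation")
    public static String read(String spec) throws IOException {
        // Supplying the handler here keeps the example free of global state.
        URL url = new URL(null, spec, new DemoHandler());
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read("demo:/hello-wayback"));
    }
}
```

A real handler would open the underlying resource inside `connect()` instead of echoing the path. Whether a per-URL handler or a globally registered factory suits better would depend on how the URLs get constructed in the calling code.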
IIRC from the Paris meeting, the issue was about how to manage and roll through large indexes with daily changes. But perhaps we should close this if no-one is clamouring...
Note that any refactoring should be done on a fork first. Apart from anything else, SCAPE deliverables may be referencing individual resources in this data set, and so changing the...
Note feedback thus far:
- https://twitter.com/beet_keeper/status/509843753901629440
- https://twitter.com/bitsgalore/status/510004805335797760
Yeah, I mean, it's log4j 1 not 2, but that's not great either. Perhaps the whole tools section should just be deleted? Is anyone using any of it? I'm unlikely...
I note these links appear to redirect to http://www2.girona.cat/ca (i.e. with a www2. instead of a www.) -- is it possible that didn't fall into the scope of the crawl?
Related to #29 in terms of UI-level integration? Or would this integration happen within OpenWayback?
Related to #29?