Jim Smart

24 comments by Jim Smart

Another option is to pass/return `io.Reader` across the interface instead of `[]byte`.

Yes. That's what I was thinking, similar to package `storage`. And have a `SetCache(c cache.Cache) error` method on `Collector`. I don't need `Close()` myself... `Get()`, `Put()` and `Remove()` should all...
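A rough sketch of the shape being described: a separate cache interface with `Get()`, `Put()` and `Remove()`, plugged into the collector through `SetCache(c Cache) error`. The `Collector` here is a stand-in struct, not `colly.Collector`, and `mapCache` is a hypothetical in-memory implementation used only for the demo.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Cache is a hypothetical pluggable cache interface.
type Cache interface {
	Get(key string) ([]byte, bool, error)
	Put(key string, value []byte) error
	Remove(key string) error
}

// Collector stands in for colly.Collector; only the cache field is shown.
type Collector struct {
	cache Cache
}

// SetCache installs c as the collector's cache, returning an error
// (rather than panicking later) when c is nil.
func (col *Collector) SetCache(c Cache) error {
	if c == nil {
		return errors.New("SetCache: nil cache")
	}
	col.cache = c
	return nil
}

// mapCache is a trivial, mutex-guarded in-memory Cache.
type mapCache struct {
	mu sync.Mutex
	m  map[string][]byte
}

func (c *mapCache) Get(key string) ([]byte, bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[key]
	return v, ok, nil
}

func (c *mapCache) Put(key string, value []byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = value
	return nil
}

func (c *mapCache) Remove(key string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.m, key)
	return nil
}

func main() {
	col := &Collector{}
	if err := col.SetCache(&mapCache{m: map[string][]byte{}}); err != nil {
		panic(err)
	}
	_ = col.cache.Put("https://example.com/", []byte("body"))
	_, ok, _ := col.cache.Get("https://example.com/")
	fmt.Println(ok) // true
}
```

Returning an error from `SetCache` leaves room for implementations that need to open a file or database connection when they are installed.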

> `Storage` is responsible for storing both visited URLs and cookies, so a simple `Get()`, `Put()` isn't enough in this case.

Of course not! :) I wasn't suggesting it would...

I have several hundred thousand web pages I am processing to extract data. Currently, if the task fails part way through, I just resume later, and rely on the cached...

Basically, if I'm going to invest the time and resources to download that much data, I'd most certainly like to keep it around, for later reruns.

Scrapy, the leading Python web scraper, has pluggable caches. Maybe take a look?

FWIW: some log output from collyzstandard tests showing compression stats...

```
2018/02/12 02:40:51 compressed 12930/5251, ratio 2.46, in 2.757465ms https://google.com/
2018/02/12 02:40:51 decompressed in 91.872µs https://google.com/
2018/02/12 02:40:53 compressed 351734/71301,...
```

Sure, I'll get something together ASAP. I'm just on the tail end of a bit of a coding marathon, so it won't be today. I don't think my code will integrate...

In my local repo I also have an implementation of 99% of the `storage` API... for SQLite.

Um, maybe. But I want to minimise that, obviously. In an ideal world there'd be no breaking changes. That's partly why the code needs more work. I'm new to the...