Lee Hinman
Hi @shriphani, sure, I will look at adding a way to throttle sending requests. In the meantime, you can work around this by adding throttling (like a Thread/sleep or something equivalent)...
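A minimal sketch of the Thread/sleep workaround mentioned above, assuming the throttling is done by wrapping whatever function does the work for each fetched page. The names `throttled`, `my-handler`, and `slow-handler` are hypothetical and not part of Itsy; the `:url` key in the handler's argument map is also an assumption here.

```clojure
;; Hypothetical helper, not part of Itsy: returns a function that
;; sleeps for delay-ms milliseconds before delegating to f.
(defn throttled
  [f delay-ms]
  (fn [& args]
    (Thread/sleep (long delay-ms))
    (apply f args)))

;; Example handler; the shape of its argument map is an assumption.
(defn my-handler
  [{:keys [url]}]
  (println "fetched" url))

;; Waits roughly one second before handling each page.
(def slow-handler (throttled my-handler 1000))
```

Note that each calling thread sleeps independently, so with several workers this only bounds the per-worker rate, not the overall request rate.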
Itsy already looks at robots.txt, see: https://github.com/dakrone/itsy/blob/master/src/itsy/core.clj#L133
I think supporting the crawl-delay parameter would be best, so I'll add it to the todo.
Cool, I'll look into it, thanks for reporting it!
Hmm.. I'm unable to reproduce it: ``` user=> (with-open [rdr (clojure.java.io/reader "/tmp/bigfile")] #_=> (let [sentences (sentence-seq rdr get-sentences)] #_=> ;; process your lazy seq of sentences however you desire #_=>...
@turbopape I could see that being a pretty good representation, but I didn't want to include that out of the box since using `load-string` is kind of dangerous. Might...
@turbopape certainly, I'm definitely down for adding more representations. I figure we can keep the map representation and have things that will output it in different formats depending on the...
Sure, the host limiter allows you to limit the URLs that Itsy fetches based on a hostname. By specifying the `:host-limit` option as `true`, Itsy limits the URLs corresponding to...
@jacopofar that would be great if it could be added, perhaps on metadata? (or wherever it fits)
> I think it'd work well to attribute the growth in doc size to that first pipeline name, since it's this pipeline which chooses what other pipelines to run later....