Zeno

State-of-the-art web crawler 🔱

Results 82 Zeno issues

Closes #61. @CorentinB, please note that I'm not proficient in Go, so feedback is welcome; edit at will.

I built Zeno from source ([687b5d5](https://github.com/internetarchive/Zeno/commit/687b5d5982be433206b03022d0a03dc0a1227501)) and ran `Zeno get url`, only to be told I did not have enough space. It would be great if (1) this value was customizable...

```
panic: open jobs/warcs/SPNOUTLINKS-20221021045127671-00030-crawl900.us.archive.org.warc.gz.open: no such file or directory

goroutine 149 [running]:
github.com/CorentinB/warc.isFileSizeExceeded({0xc166f684e0?, 0xc0001b4520?}, 0x408f400000000000)
	/var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/utils.go:196 +0x10e
github.com/CorentinB/warc.recordWriter(0xc00057e0f0, 0x0?, 0x0?)
	/var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/warc.go:120 +0x499
created by github.com/CorentinB/warc.(*RotatorSettings).NewWARCRotator
	/var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/warc.go:50 +0x75
```

This is quite a half-baked idea, but we'd be looking to implement some sort of hit counter for items in the hash table, allowing us to clean it up when...

enhancement

enhancement
internal-only

Allow operators to define per-domain headers in a yml file, giving greater control over User-Agent and similar headers that may need to be configured per...

enhancement

```
2024/08/15 07:49:36 http: panic serving 127.0.0.1:45290: runtime error: invalid memory address or nil pointer dereference
goroutine 212832422 [running]:
net/http.(*conn).serve.func1()
	/var/www/.go/src/net/http/server.go:1903 +0xbe
panic({0x1371ac0?, 0x21a7f70?})
```

bug
P1