Corentin Barreau

31 issues by Corentin Barreau

Caused by pressing CTRL+C during a crawl: the crawl was in the finishing state when this happened.
```
panic: send on closed channel
panic: send on closed channel
goroutine 778 [running]:
github.com/internetarchive/Zeno/internal/pkg/crawl.(*Crawl).Capture.func1(0xc0016dd2c0)
	/X/Zeno/internal/pkg/crawl/capture.go:233 +0x50...
```

bug
P1
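In Go, a send on a closed channel always panics, so one common way to avoid it during shutdown is to have a single owner close the channel only after every sender has returned. The sketch below is not Zeno's code: `startWorkers`, `drain` and the worker loop are hypothetical, and it assumes cancellation (for example on CTRL+C) is signalled through a `context.Context`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// startWorkers launches n hypothetical capture workers that keep sending
// results on out until ctx is cancelled (e.g. on CTRL+C). The channel is
// closed by one goroutine only after every sender has returned, so a late
// send can never hit a closed channel.
func startWorkers(ctx context.Context, n int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // stop sending once shutdown starts
				case out <- id:
				}
			}
		}(i)
	}
	go func() {
		wg.Wait()  // wait for all senders...
		close(out) // ...then close exactly once
	}()
	return out
}

// drain reads values until it has seen limit of them, then cancels and keeps
// consuming so the workers can exit; the loop ends only when out is closed.
func drain(limit int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	count := 0
	for range startWorkers(ctx, 4) {
		count++
		if count == limit {
			cancel()
		}
	}
	return count
}

func main() {
	fmt.Println(drain(10) >= 10) // true: the loop ended because out was closed safely
}
```

Workers never close the channel themselves in this pattern; `signal.NotifyContext` from `os/signal` is one way to obtain a context that is cancelled on CTRL+C.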

It would be interesting to optionally run OCR on images to extract URLs from watermarks and similar embedded text.

enhancement

Use case: running many Zeno instances on the same machine.

bug
good first issue

github.com/clbanning/mxj/v2 is used for XML processing; I think it could be replaced with code that relies only on the standard library.

enhancement

The idea is to "replicate" the excellent Heritrix3 web UI. We want to provide a way to start, stop, pause, and unpause the crawl, but also to inject seeds, search...

enhancement

I'm seeing a lot of DEBUG logs printed to stdout:
```
time=2024-09-21T09:25:34.467+02:00 level=DEBUG msg="unable to extract URLs from JSON in script tag" error="invalid character 'l' after top-level value" url=https://old.reddit.com/r/PublicFreakout/comments/1fla2ks/another_video_of_israeli_soldiers_throwing/
time=2024-09-21T09:25:34.468+02:00 level=DEBUG...
```

bug
good first issue
P2

If you run the get list command with a seeds list that contains an empty line, Zeno won't start crawling.

bug
good first issue
P2

enhancement
good first issue
P4