Zeno doesn't start with get list when the seeds list contains an empty line
If you use get list with a seeds list that contains an empty line, Zeno won't start crawling.
https://github.com/internetarchive/Zeno/blob/cfa298060090e8d6c78662c3e01479df6713e720/internal/pkg/queue/item.go#L76
https://pkg.go.dev/net/url#Parse
Trying to parse a hostname and path without a scheme is invalid but may not necessarily return an error, due to parsing ambiguities.
https://go.dev/play/p/gY82hrbRTh2
We probably need a stricter URL parser.
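As a rough illustration of what stricter seed handling could look like, here is a minimal sketch. It is not Zeno's actual code: `parseSeeds` is a hypothetical helper, and the policy (skip blank lines, reject anything without both a scheme and a host) is an assumption about the desired behavior. It only relies on `url.Parse` from the standard library, which accepts an empty string and scheme-less input without returning an error.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// parseSeeds is a hypothetical helper (not part of Zeno).
// It reads one seed per line, ignores blank lines, and keeps only
// URLs that have both a scheme and a host. Rejected lines are
// returned separately so the caller can log them.
func parseSeeds(input string) (seeds []*url.URL, skipped []string) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			// An empty line is not a seed; url.Parse("") would
			// succeed and yield an empty URL, so skip it here.
			continue
		}
		u, err := url.Parse(line)
		if err != nil || u.Scheme == "" || u.Host == "" {
			// url.Parse does not necessarily fail on input such as
			// "example.org/path", so check the fields explicitly.
			skipped = append(skipped, line)
			continue
		}
		seeds = append(seeds, u)
	}
	return seeds, skipped
}

func main() {
	list := "https://example.com/\n\nexample.org/path\n"
	seeds, skipped := parseSeeds(list)
	fmt.Printf("kept %d seed(s), skipped %d line(s)\n", len(seeds), len(skipped))
}
```

Whether scheme-less lines should be rejected or given a default scheme is a separate design decision; the sketch simply reports them instead of silently queueing an unusable item.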
Agreed. On the same topic: I found Zeno to be absolutely unusable (I guess it's the queue) when queuing a seeds list of 8.5M lines. You might want to try it out if you're interested.
Hey @CorentinB, is this issue still open?
I'm exploring the codebase to get familiar with the code and looking for issues to contribute to. Could you suggest something unassigned and open that would be a good starting point for me?
Thanks!
Hi, current development is on dev/v2, feel free to check if the issue still happens with it :)
Hi @CorentinB, I’m exploring this issue to contribute. I noticed there’s no public dev/v2 branch in the repo. Is that branch private, or has development already been merged back into main? Just want to make sure I’m testing/fixing the issue on the correct branch.