Zeno doesn't start with get list when the seeds list contains an empty line

Open: CorentinB opened this issue 1 year ago • 4 comments

If you use get list with a seeds list that contains an empty line, Zeno won't start crawling.

CorentinB · Sep 09 '24 12:09

https://github.com/internetarchive/Zeno/blob/cfa298060090e8d6c78662c3e01479df6713e720/internal/pkg/queue/item.go#L76

https://pkg.go.dev/net/url#Parse

Trying to parse a hostname and path without a scheme is invalid but may not necessarily return an error, due to parsing ambiguities.

https://go.dev/play/p/gY82hrbRTh2

We probably need a better URL parser.
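Short of a new parser, a stricter wrapper around url.Parse would already reject the inputs above. A minimal sketch, assuming seeds must be absolute http or https URLs with a host; parseSeed is a hypothetical name, and whether bare hostnames should instead get a default scheme prepended is a separate design decision.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseSeed is a hypothetical validator (not Zeno's API): it accepts only
// absolute http(s) URLs that have a host, instead of trusting url.Parse's
// nil error.
func parseSeed(raw string) (*url.URL, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return nil, errors.New("empty seed")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	// url.Parse lowercases the scheme, so a plain comparison is enough.
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("seed %q: unsupported or missing scheme %q", raw, u.Scheme)
	}
	if u.Host == "" {
		return nil, fmt.Errorf("seed %q: missing host", raw)
	}
	return u, nil
}

func main() {
	for _, s := range []string{"", "example.com/path", "example.com:8080/path", "https://example.com/path"} {
		if _, err := parseSeed(s); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", s)
		}
	}
}
```

Rejecting bad lines with an explicit error lets the caller log and skip them, rather than letting one malformed seed stop the crawl.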

yzqzss · Sep 13 '24 16:09

Agreed. On the same topic: I found Zeno to be absolutely unusable (I suspect the queue) when queuing an 8.5M-line seeds list. You might want to try it if you're interested.

CorentinB · Sep 13 '24 16:09

Hey @CorentinB, is this issue still open?

I'm exploring the codebase to get familiar with the code and looking for issues to contribute to. Could you suggest something open and unassigned that would be a good starting point for me?

Thanks!

yash-raj10 · Mar 20 '25 06:03

Hi, current development is on dev/v2; feel free to check whether the issue still happens there :)

CorentinB · Mar 20 '25 06:03

Hi @CorentinB, I’m looking into this issue to contribute. I noticed there’s no public dev/v2 branch in the repo. Is that branch private, or has development already been merged back into main? I just want to make sure I’m testing and fixing the issue on the correct branch.

ParthAggarwal16 · Dec 06 '25 06:12