crawly icon indicating copy to clipboard operation
crawly copied to clipboard

robots.txt matching is pretty buggy

Open serpent213 opened this issue 1 year ago • 9 comments

@oltarasenko I just noticed you are maintaining your own fork of Gollum. I found its matching algorithm having some severe bugs and also feels hard to reason about.

Just wanted to draw your attention to two pull requests, which introduce additional tests.

Might rewrite the matcher, but first things first.

serpent213 avatar Apr 03 '23 11:04 serpent213

@serpent213 yeah, I forked the Gollum a few years ago due to the same reason. I had the plan to fix the parsing there but did not have enough time to do that. Unfortunately :(.

oltarasenko avatar Apr 04 '23 06:04 oltarasenko

Hi,

I would like to pick this task and start improving the parser if that is okay. @serpent213 could you resubmit your pull request to the forked repo from oltarasenko? (https://github.com/elixir-crawly/gollum)

After it has been merged we can start to work on improving the parser in that fork since it seems like the original project is abandoned by now.

fstp avatar Aug 07 '23 17:08 fstp

@fstp Please let me know if I can help you in any way.

oltarasenko avatar Aug 07 '23 17:08 oltarasenko

@fstp See https://github.com/elixir-crawly/gollum/pull/1.

serpent213 avatar Aug 07 '23 18:08 serpent213

Thank you for the quick response! :) @oltarasenko if you want you could merge the pull request from @serpent213 and then I can create my pull request based on that. Or is it better that I push directly to the pull request?

fstp avatar Aug 07 '23 19:08 fstp

One of the commits introduces a failing test, IIRC. The Google-port has a lot of stuff tagged as skip to pass.

serpent213 avatar Aug 07 '23 21:08 serpent213

Ah okay I see, can I clone your pull request somehow and push the code directly there? Would be nice to run test driven development against the failing testcases

fstp avatar Aug 07 '23 22:08 fstp

Maybe you can start your own branch and cherry-pick the commits?

Edit: Added you to my fork, @fstp, not sure that is of any help. Note that my fork is based on https://github.com/ravern/gollum, but it's rebased to https://github.com/elixir-crawly/gollum/.

serpent213 avatar Aug 07 '23 22:08 serpent213

Thanks! I will try to work directly on your fork, seems most convenient since we can just merge it all together then :)

fstp avatar Aug 08 '23 08:08 fstp

We have updated the Gollum a bit. Probably bugs related to it's implementation can go to it's repo.

oltarasenko avatar Apr 09 '24 10:04 oltarasenko