Robin Wood

147 comments by Robin Wood

It was only checking the page name, not the parameters. I've just pushed an update; see if that fixes it. https://github.com/digininja/CeWL/commit/ff7e4854da355f116448219ad8d0e9fe68592caa
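To illustrate the difference (this is a hypothetical sketch, not the code from that commit): if the "already visited" key is built from the path alone, pages that differ only in their parameters collapse into one. The helper name `visit_key` and its `include_query` option are made up for the example.

```ruby
require "uri"
require "set"

# Hypothetical helper: the key used to decide whether a URL was already seen.
# With include_query: false only the page name (path) is used, so
# page.php?id=1 and page.php?id=2 look identical.
def visit_key(url, include_query: true)
  uri = URI.parse(url)
  key = uri.path.to_s.empty? ? "/" : uri.path
  key += "?#{uri.query}" if include_query && uri.query
  key
end

urls = [
  "http://example.com/page.php?id=1",
  "http://example.com/page.php?id=2",
  "http://example.com/page.php?id=1"
]

puts Set.new(urls.map { |u| visit_key(u, include_query: false) }).size # 1
puts Set.new(urls.map { |u| visit_key(u) }).size                       # 2
```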

Looking at that line of code, it only checks the path, not the domain. Are you expecting it to check the domain as well? On Wed, 20 Apr 2022,...

Not currently possible. You could easily tweak that line to check the domain instead. I don't know the property offhand, but try "domain" instead of "path". On Wed, 20...
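A rough sketch of what switching the check from the path to the domain means, assuming the URL is a standard Ruby `URI` object. In that case the domain part is exposed as `host` (standard `URI` objects have no `domain` method); whether the object on that line in cewl.rb is a plain `URI` is an assumption here. The `matches?` helper and its `part:` option are hypothetical.

```ruby
require "uri"

# Hypothetical check: does the pattern appear in the chosen part of the URL?
# part: :path mimics a path-only check, part: :host checks the domain instead.
def matches?(url, pattern, part: :path)
  uri = URI.parse(url)
  value = part == :host ? uri.host : uri.path
  !value.nil? && value.include?(pattern)
end

url = "https://blog.example.com/admin/login"
puts matches?(url, "admin")                     # true  (path contains it)
puts matches?(url, "admin", part: :host)        # false (host does not)
puts matches?(url, "example.com", part: :host)  # true
```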

Glad you like it. If you get stuck, let me know, and I'll have a look for the right property in the morning. On Wed, 20 Apr 2022, 22:40 03k64serenity,...

I'll have a look as soon as I get a chance. On Thu, 28 Apr 2022, 21:09 spencer-dollahite, ***@***.***> wrote: > https://github.com/spencer-dollahite/CeWL/blob/master/cewl.rb > > This is the sort of approach/feature I'd...

There are two problems: the app can't know the size of the site before it starts, so it can't show any kind of progress bar; and the app is single...

I could include something that said how much has been done so far, but there is no way to guess how much there is left to do and as the...
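A minimal sketch of the "how much has been done so far" idea: a breadth-first crawl that reports pages processed and pages currently queued, but no percentage, because the queue keeps growing as links are discovered. `crawl` and the `fetch_links` callable are hypothetical stand-ins, not CeWL's actual spider, and the fake site replaces real HTTP requests.

```ruby
# Hypothetical crawl loop: fetch_links returns the links found on a page,
# which are only known once that page has been visited.
def crawl(start, fetch_links)
  queue = [start]
  seen = { start => true }
  done = 0
  until queue.empty?
    url = queue.shift
    fetch_links.call(url).each do |link|
      next if seen[link]
      seen[link] = true
      queue << link
    end
    done += 1
    # The total is unknown, so report only what is known right now.
    $stderr.puts "Processed #{done}, #{queue.size} queued (total unknown)"
  end
  done
end

site = { "/" => ["/a", "/b"], "/a" => ["/c"], "/b" => ["/a"], "/c" => [] }
puts crawl("/", ->(u) { site.fetch(u, []) }) # 4
```

Note how the queue length goes up and down during the run, which is why it can't be turned into a reliable "how much is left" figure.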

That is deliberate. CeWL sticks to the domain it has been asked to spider unless you set the flag that lets it go off-site. This is to stop it...
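A sketch of that kind of on-site filter, assuming "on site" means the link's host exactly equals the start URL's host and that links have already been resolved to absolute URLs. `follow?` and `allow_offsite` are hypothetical names used for illustration, not CeWL's real option or code.

```ruby
require "uri"

# Hypothetical filter: follow a link only if it is on the starting host,
# unless the off-site flag is set.
def follow?(start_url, link, allow_offsite: false)
  return true if allow_offsite
  URI.parse(link).host == URI.parse(start_url).host
end

start = "https://example.com/"
links = ["https://example.com/about", "https://www.example.com/news", "https://other.org/"]

p links.select { |l| follow?(start, l) }                             # ["https://example.com/about"]
p links.select { |l| follow?(start, l, allow_offsite: true) }.size   # 3
```

With an exact host match, www.example.com is treated as a different site, which is the subdomain question raised in the next comment.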

On some sites they are like subdirectories, but on others they are completely different sites. I'll have a think about it, part of it depends on how easy the spider...
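If subdomains were treated like subdirectories, the check could look something like this hypothetical sketch. It assumes the base domain is supplied up front; working it out automatically is the harder part.

```ruby
# Hypothetical check: a host belongs to the site if it is the base domain
# or a subdomain of it. The leading dot stops "notexample.com" matching.
def same_site?(host, base)
  host == base || host.end_with?(".#{base}")
end

p same_site?("example.com", "example.com")      # true
p same_site?("blog.example.com", "example.com") # true
p same_site?("notexample.com", "example.com")   # false
```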

I've been thinking about this, and trying to work out parentage is probably going to be too hard. Trying to work out where the domain ends and the TLD starts...
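A quick illustration of why the domain/TLD boundary is awkward: the obvious shortcut of keeping the last two labels works for .com but breaks for suffixes such as co.uk. The `naive_parent` helper is hypothetical. A common way around this is a lookup against the Public Suffix List, though that is not something CeWL is stated to do.

```ruby
# Hypothetical naive "parent domain": keep the last two dot-separated labels.
def naive_parent(host)
  host.split(".").last(2).join(".")
end

p naive_parent("blog.example.com")   # "example.com" (correct)
p naive_parent("shop.example.co.uk") # "co.uk" (wrong: should be "example.co.uk")
```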