python-sitemap

Limit search to path instead of domain?

Open • 1kastner opened this issue 7 years ago • 5 comments

Would it be possible to restrict the search to a certain path? A bad example would be to restrict a search to http://google.com/maps/ and ignore results that are in other "subdirectories" of http://google.com/. Using "domain" for this purpose does not work.
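A minimal sketch of the path restriction being asked for, assuming "inside the path" means the same scheme and host plus a URL path that starts with the base path. `is_within_path` is a hypothetical helper for illustration, not part of python-sitemap:

```python
from urllib.parse import urlparse


def is_within_path(url: str, base: str) -> bool:
    """Return True if url lives under base (same scheme, host and path prefix).

    Hypothetical helper for illustration; not part of python-sitemap.
    """
    u, b = urlparse(url), urlparse(base)
    if (u.scheme, u.netloc) != (b.scheme, b.netloc):
        return False
    # Make the base path end with "/" so that "/maps/" does not accept "/mapsfoo".
    base_path = b.path if b.path.endswith("/") else b.path + "/"
    # Accept the base itself without its trailing slash, or anything below it.
    return u.path == base_path.rstrip("/") or u.path.startswith(base_path)


print(is_within_path("http://google.com/maps/place/x", "http://google.com/maps/"))  # True
print(is_within_path("http://google.com/search?q=x", "http://google.com/maps/"))    # False
```

Host names are compared as-is here; a real crawler would probably also normalise case and default ports before comparing.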

1kastner, Nov 22 '18

Hi,

Sorry for the delay. You can do it via

--exclude "maps/"

But the exclude list has to be exhaustive.
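To illustrate why the exclude list has to be exhaustive, here is a minimal sketch assuming each `--exclude` value is matched as a plain substring of the URL (an assumption about the matching rule, not a description of the crawler's code). `is_allowed` and the example paths are hypothetical:

```python
from typing import Iterable


def is_allowed(url: str, excludes: Iterable[str]) -> bool:
    """Keep a URL only if none of the exclude substrings occur in it."""
    return not any(pattern in url for pattern in excludes)


# To keep only /maps/, every *other* section has to be listed explicitly.
excludes = ["/search", "/news/", "/images/"]
urls = [
    "http://google.com/maps/place/x",
    "http://google.com/news/today",
    "http://google.com/shopping/",  # missing from the list, so it slips through
]
print([u for u in urls if is_allowed(u, excludes)])
# ['http://google.com/maps/place/x', 'http://google.com/shopping/']
```

Any section that is forgotten in the list ends up in the sitemap, which is what makes this approach fragile for "only this subfolder".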

Do you want something generic for all subfolders?

c4software, Dec 11 '18

Well, what I actually need is include logic, which is not yet implemented in https://github.com/c4software/python-sitemap/blob/master/main.py
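A minimal sketch of where include logic could sit in a crawler: the check runs before a link is queued, so pages outside the included path are never fetched. The link graph is an in-memory dict standing in for real HTTP fetches, and `crawl` and `include` are hypothetical names, not python-sitemap's actual code:

```python
from collections import deque
from typing import Callable, Dict, List


def crawl(start: str, links: Dict[str, List[str]], include: Callable[[str], bool]) -> List[str]:
    """Breadth-first crawl over a link graph, queueing only URLs accepted by include."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real crawler would fetch and parse the page here
        for link in links.get(url, []):
            # Filter before queueing: rejected pages are never requested.
            if link not in seen and include(link):
                seen.add(link)
                queue.append(link)
    return order


site = {
    "http://google.com/maps/": ["http://google.com/maps/a", "http://google.com/news/"],
    "http://google.com/maps/a": ["http://google.com/maps/b", "http://google.com/"],
    "http://google.com/news/": ["http://google.com/maps/c"],
}
print(crawl("http://google.com/maps/", site, lambda u: u.startswith("http://google.com/maps/")))
# ['http://google.com/maps/', 'http://google.com/maps/a', 'http://google.com/maps/b']
```

One consequence of filtering during the crawl: `/maps/c` is never found, because it is only linked from a page outside the included path.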

1kastner, Dec 12 '18

I agree that it would be cool to have an "include" function in the crawler. 1kastner, I think your phrase "A bad example" may have been read the opposite way by c4software.

davidcx89, Dec 17 '18

@davidcx89 yes, sorry for the bad phrasing; I probably should have put more effort into describing the issue.

If I find the time, there might be a pull request soon.

1kastner, Dec 17 '18

Hi,

An include pattern is indeed a great idea. Something with regex support would be really great.
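A sketch of a regex-based include pattern, assuming Python's `re` module and that a URL is kept when `re.search` finds the pattern anywhere in it (anchor with `^` to match from the start). `make_filter` and its parameters are hypothetical, not an existing python-sitemap option:

```python
import re
from typing import Callable, Iterable, Optional


def make_filter(include: Optional[str] = None, excludes: Iterable[str] = ()) -> Callable[[str], bool]:
    """Build a URL predicate from an optional include regex and exclude regexes."""
    include_re = re.compile(include) if include else None
    exclude_res = [re.compile(p) for p in excludes]

    def accept(url: str) -> bool:
        if include_re is not None and not include_re.search(url):
            return False  # an include pattern was given and this URL does not match it
        return not any(r.search(url) for r in exclude_res)

    return accept


accept = make_filter(include=r"^https?://google\.com/maps/", excludes=[r"\.pdf$"])
print(accept("http://google.com/maps/place/x"))     # True
print(accept("http://google.com/news/today"))       # False
print(accept("https://google.com/maps/guide.pdf"))  # False
```

Combining an include regex with an exclude list keeps both behaviours: with no include pattern, every URL that is not excluded is accepted.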

I will try to do this quickly, maybe this weekend.

c4software, Dec 20 '18