
Courlan does not load `/page/` links

Open sbusso opened this issue 2 years ago • 3 comments

Regarding the nav filter: courlan will not extract links whose path contains `/page/`. Also, I think `page` and `tag|category` should be handled separately. I need to get all blog posts on my website, which are paginated, but I don't want to get tags and categories.

sbusso avatar Mar 06 '23 03:03 sbusso

Hi @sbusso, I'm not sure what you mean regarding the /page/ pattern, maybe it's a documentation issue. I added tests, could you please look at the commit above and see if you can make the code work in your case or provide more details?

Separating pagination from the rest makes sense, I'll think about how it could be implemented.

adbar avatar Mar 06 '23 18:03 adbar

URLs containing `/page/1`, `/page/2` won't be extracted by `extract_links` unless `with_nav=True` is set, but this option also includes other index pages like tags and categories. I think pagination, and maybe archives, could be separated out or offered as additional options.

https://github.com/adbar/courlan/blob/02e1afe24b19b42a0ac481ddac60832a3399fa8c/courlan/filters.py#L49-L52
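To illustrate the separation being asked for, here is a minimal standard-library sketch. It is not courlan's code: the names `PAGINATION_RE`, `TAXONOMY_RE`, `classify_path` and `keep_link`, as well as the patterns themselves, are hypothetical and only assume WordPress-style URLs such as `/page/2` and `/tag/...`.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns for illustration only, not courlan's filters.
PAGINATION_RE = re.compile(r"/page/\d+/?$", re.IGNORECASE)
TAXONOMY_RE = re.compile(r"/(?:tags?|categor(?:y|ies))(?:/|$)", re.IGNORECASE)


def classify_path(url):
    """Label a URL as 'taxonomy', 'pagination' or 'content' based on its path."""
    path = urlsplit(url).path
    # Check taxonomy first so that paginated tag pages (/tag/x/page/2)
    # are still treated as tag pages.
    if TAXONOMY_RE.search(path):
        return "taxonomy"
    if PAGINATION_RE.search(path):
        return "pagination"
    return "content"


def keep_link(url, with_pagination=False, with_taxonomy=False):
    """Decide whether to keep a link, with separate switches per page type."""
    kind = classify_path(url)
    if kind == "pagination":
        return with_pagination
    if kind == "taxonomy":
        return with_taxonomy
    return True
```

With two independent switches, `keep_link(url, with_pagination=True)` would keep `/blog/page/2` while still dropping `/category/news/`, which is the behaviour described above.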

sbusso avatar Mar 06 '23 19:03 sbusso

I see, let's keep an eye on that.

adbar avatar Mar 07 '23 11:03 adbar