courlan
Courlan does not load `/page/` links
Regarding the navigation filter: courlan does not extract links whose path contains `/page/`. I also think `page` should be handled separately from `tag` and `category`. I need to collect all blog posts on my website, which are paginated, but I don't want the tag and category pages.
Hi @sbusso, I'm not sure what you mean regarding the `/page/` pattern; maybe it's a documentation issue. I added tests. Could you please look at the commit above and see if you can make the code work in your case, or provide more details?
Separating pagination from the rest makes sense, I'll think about how it could be implemented.
URLs containing `/page/1`, `/page/2`, and so on are not extracted by `extract_links` unless `with_nav=True` is set, but that option also lets through other index pages such as tags and categories. I think pages, and maybe archives, could be split out into separate or extended options.
https://github.com/adbar/courlan/blob/02e1afe24b19b42a0ac481ddac60832a3399fa8c/courlan/filters.py#L49-L52
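To make the request concrete, here is a minimal sketch of what treating pagination separately from tags and categories could look like. It is not courlan's code or API: `PAGINATION_RE`, `TAXONOMY_RE`, `keep_link` and its two flags are hypothetical names, and the patterns are simplified assumptions rather than the ones in `filters.py`.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, simplified patterns for illustration only;
# they are NOT the expressions used in courlan's filters.py.
PAGINATION_RE = re.compile(r"/page/[0-9]+/?$", re.IGNORECASE)
TAXONOMY_RE = re.compile(r"/(tags?|category|categories)/", re.IGNORECASE)


def keep_link(url, with_pagination=True, with_taxonomy=False):
    """Decide whether to keep a link, with pagination and taxonomy
    pages controlled by two independent (hypothetical) switches."""
    path = urlsplit(url).path
    # Check taxonomy first so that paginated tag/category listings
    # such as /category/news/page/2 follow the taxonomy switch.
    if TAXONOMY_RE.search(path):
        return with_taxonomy
    if PAGINATION_RE.search(path):
        return with_pagination
    # Anything else is treated as a regular content page.
    return True


links = [
    "https://example.org/blog/page/2",
    "https://example.org/tag/python/",
    "https://example.org/category/news/page/2",
    "https://example.org/blog/my-first-post",
]
selected = [u for u in links if keep_link(u)]
```

With the defaults above, `selected` keeps the paginated blog index and the post while dropping the tag and category pages, which is the behaviour I'm after for crawling a paginated blog.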
I see, let's keep an eye on that.