Category HTML with sub categories and page members.
Category pages are rendered with sub categories and page members. (See https://github.com/openzim/mwoffliner/issues/2245)
This change adds a new command line parameter, getCategoryPages. When it is enabled, the category pages get an HTML part listing all the pages that are members of the category.
There are a few things that could be improved:
- Titles for "Category", "Sub-Categories" and "Pages" could be localized.
- I have only tested this with a small wiki. There could be problems when categories have a lot of pages or sub categories. Is there a way to paginate this?
I just tested this. Here is what I found out:
- "it did not generated category pages, all links are broken"
  The reason it did not generate the category pages is the articleList filter. I've built this into the regular download of pages, so it only downloads what's in the filter. If you set the filter to Panini_Comics,Categoria:Panini_Comics it should also download the category. But the page links were still broken in your example. As far as I've seen, the link itself should be correct, but the browser adds /C/ in front of the page link and this causes the page link to fail. Do you have an idea how to fix this? You can look at abstract.renderer.ts. I've just used the function encodeArticleIdForZimHtmlUrl(page.title) to generate the page links. Is there maybe a better function to generate the link?
- "the collapse of category div on Panini_Comics page is really weird"
  I found out that this only happens when the argument --forceRender ActionParse is set. This seems to change something in the CSS classes, so that the section-heading CSS class no longer works. I'm not familiar with this different renderer. Do you have an idea how this could be fixed?
- "do you have an example where the API is returning both categories and subcategories properties?"
  To get the subcategories of a category you need to use the list=categorymembers API call. Here is an example that loads all categories, subcategories and pages: https://it.wikipedia.org/w/api.php?action=query&list=categorymembers&format=json&cmlimit=max&cmtitle=Categoria:Panini_Comics&cmprop=title|sortkeyprefix|type&prop=redirects|revisions|pageimages|coordinates|categories&rdlimit=max&rdnamespace=0|14&redirects=true&formatversion=2&titles=Panini_Comics&colimit=max&cllimit=max&clshow=!hidden
- "why do you need two parameters getCategories and getCategoryPages? Why would someone want to retrieve categories but not category pages?"
  I've implemented the new parameter because there are a lot more pages assigned to categories than there are categories assigned to other categories. This way the pages can be excluded for really big wikis. The biggest category that I found on Wikipedia has 2,325,761 pages assigned to it. Here is the link: https://en.wikipedia.org/w/index.php?title=Category:All_stub_articles
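To make the list=categorymembers call and the pagination question more concrete, here is a minimal sketch of how the members of one category could be fetched in batches. It is not the code from this PR: buildCategoryMembersUrl, getAllCategoryMembers and the FetchJson type are hypothetical names, and the fetcher is injected so the sketch runs without network access. It assumes formatversion=2 and that passing cmcontinue back is enough to continue; the more general approach is to merge every key of the returned continue object into the next request.

```typescript
// Minimal shape of the parts of a categorymembers response used in this sketch.
interface CategoryMember {
  title: string
  type: 'page' | 'subcat' | 'file'
}

interface CategoryMembersResponse {
  continue?: { cmcontinue: string }
  query?: { categorymembers: CategoryMember[] }
}

// Hypothetical fetcher signature (an assumption, not a mwoffliner API).
type FetchJson = (url: string) => Promise<CategoryMembersResponse>

// Build the request URL for one batch of category members.
function buildCategoryMembersUrl(apiUrl: string, categoryTitle: string, cmcontinue?: string): string {
  const params = new URLSearchParams({
    action: 'query',
    list: 'categorymembers',
    format: 'json',
    formatversion: '2',
    cmtitle: categoryTitle,
    cmprop: 'title|type',
    cmlimit: 'max',
  })
  if (cmcontinue) params.set('cmcontinue', cmcontinue)
  return `${apiUrl}?${params.toString()}`
}

// Keep requesting batches until the API stops returning a continuation token.
async function getAllCategoryMembers(apiUrl: string, categoryTitle: string, fetchJson: FetchJson): Promise<CategoryMember[]> {
  const members: CategoryMember[] = []
  let cmcontinue: string | undefined
  do {
    const response = await fetchJson(buildCategoryMembersUrl(apiUrl, categoryTitle, cmcontinue))
    members.push(...(response.query?.categorymembers ?? []))
    cmcontinue = response.continue?.cmcontinue
  } while (cmcontinue)
  return members
}
```

With a real fetcher, a call such as getAllCategoryMembers('https://it.wikipedia.org/w/api.php', 'Categoria:Panini_Comics', fetchJson) would return pages and subcategories together, distinguished by the type field. For a category with millions of members this still accumulates everything in memory, which is the concern raised above.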
@TheNetStriker I'm sorry, but I lack sufficient bandwidth this week to analyze this PR. I will do my best next week.
I'm sorry, but I still don't get what you are trying to achieve with the getCategoryPages setting. From what I read, this parameter is used to call setArticlePageMembers, which seems to be used to retrieve the members of every category. But this information is already retrieved without this parameter (the pages attribute in QueryRet, from my understanding). Can you help me understand the difference?
Command I used:
docker run --rm --name mwoffliner_test -v $PWD/output:/output local-mwoffliner mwoffliner [email protected] --customZimDescription="Test" --customZimTitle=Test --filenamePrefix=tests_en_mwoffliner --format=nopic --mwUrl=https://bm.wikipedia.org --outputDirectory=/output --publisher=openZIM --verbose=log --webp --forceRender ActionParse --getCategories --getCategoryPages
I also don't get your point regarding pagination. To me the calls you've added are already paginated.
The UI issues you are seeing are linked to the fact that these categories must be adapted to the skin used, since we retrieve the CSS linked to this skin.
The reason it did not generate the category pages is because of the articleList filter.
OK, we can maybe live with this limitation for now. I don't know. But this must be documented somewhere. And this is clearly not really convenient, since it basically means that we cannot use --getCategories with --articleList: it is probably too complex for most users to find all the categories to add to the articleList. Plus it is a bit deceptive to have a parameter named getCategories which doesn't get categories unless you've listed them in articleList...
Or maybe this is just an indication that the current approach is wrong: in a naive approach, I would not have fetched categories the way we fetch articles, but rather walked the tree up from the categories associated with the articles we have explored, up to the top. But this is maybe naive / not working.
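The "walk the tree up" idea could look roughly like the sketch below. This is only an illustration of the suggestion, not something implemented in this PR: collectAncestorCategories and the GetParentCategories lookup are hypothetical names, and the lookup is assumed to return the titles of the categories a given page or category belongs to (for example from the categories property the API already returns).

```typescript
// Hypothetical lookup: the categories that a page or category is a member of.
type GetParentCategories = (title: string) => Promise<string[]>

// Collect every category reachable by walking up from the explored articles.
async function collectAncestorCategories(articleTitles: string[], getParentCategories: GetParentCategories): Promise<Set<string>> {
  const seen = new Set<string>()
  const queue: string[] = [...articleTitles]
  while (queue.length > 0) {
    const title = queue.shift() as string
    for (const parent of await getParentCategories(title)) {
      // Category graphs can contain cycles, so each category is queued only once.
      if (!seen.has(parent)) {
        seen.add(parent)
        queue.push(parent)
      }
    }
  }
  return seen
}
```

The resulting set contains only categories that are actually ancestors of scraped articles, so an --articleList run would not need the categories listed by hand. Whether this scales on large wikis is untested here.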
I guess we cannot move forward with this PR. Should we close it?
Let's close it (for now at least). It is not possible to merge as is, it probably needs a significant redesign, and we are missing feedback from the contributor.