/catalog doesn't work without access to ZIM files
My understanding was that the catalog part of the server (i.e. the OPDS engine) would only manipulate catalog data and thus not require ZIM access. This is not the case.
wget download.kiwix.org/library/library_zim.xml
kiwix-serve --library --daemon -p 9999 ./library_zim.xml
curl localhost:9999/catalog/root.xml
kiwix-serve starts and loads the library properly (The library was successfully loaded.) but the OPDS requests all come back empty:
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:opds="http://opds-spec.org/2010/catalog">
<id>1ae99b6e-a67b-db46-157a-fcc82a42d3a8</id>
<title>All zims</title>
<updated>2022-04-22T12:22:02Z</updated>
<link rel="self" href="" type="application/atom+xml" />
<link rel="search" type="application/opensearchdescription+xml" href="/catalog/searchdescription.xml" />
</feed>
curl localhost:9999/catalog/search?lang=fra
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:opds="http://opds-spec.org/2010/catalog">
<id>d097adb8-1df3-d4f1-b77d-33f90d6b7793</id>
<title>Filtered zims (lang=fra)</title>
<updated>2022-04-22T12:23:04Z</updated>
<totalResults>0</totalResults>
<startIndex>0</startIndex>
<itemsPerPage>0</itemsPerPage>
<link rel="self" href="" type="application/atom+xml" />
<link rel="search" type="application/opensearchdescription+xml" href="/catalog/searchdescription.xml" />
</feed>
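A quick way to confirm that a feed really is empty is to count its Atom entry elements. The sketch below is only illustrative: it assumes the response body is available as a string, reuses the first response above as sample input, and the helper name count_entries is made up for this example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Response of /catalog/root.xml as shown above (it contains no <entry> elements).
EMPTY_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>1ae99b6e-a67b-db46-157a-fcc82a42d3a8</id>
  <title>All zims</title>
  <updated>2022-04-22T12:22:02Z</updated>
  <link rel="self" href="" type="application/atom+xml" />
  <link rel="search" type="application/opensearchdescription+xml" href="/catalog/searchdescription.xml" />
</feed>"""


def count_entries(feed_xml: str) -> int:
    """Return the number of Atom <entry> elements directly under the feed root."""
    root = ET.fromstring(feed_xml)
    return len(root.findall(f"{{{ATOM_NS}}}entry"))


print(count_entries(EMPTY_FEED))  # prints 0
```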
@rgaudin What does the library XML look like? Do you have URLs (to download the ZIM) in it?
The library XML is the production one; didn't you see the wget call above? So yes, there's a url attribute on each book. Should that matter?
@mgautierfr @veloman-yunkan Definitely a blocker for https://github.com/kiwix/container-images/issues/147
Following @kelson42's suggestion, I removed the path attributes from all the books in the library XML and I now get a different startup output:
Loading the library from the following files:
/library_zim.xml
The library was successfully loaded.
The XML library file '/library_zim.xml' is empty (or has only remote books).
The Kiwix server is running and can be accessed in the local network at: xxx
Though the result is still the same on the OPDS endpoints.
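For anyone wanting to reproduce this, the path attributes can be stripped with a few lines of Python. This is a minimal sketch, assuming the library file is made of book elements carrying path and url attributes as described above; the function name strip_paths and the two-book sample are made up for illustration.

```python
import xml.etree.ElementTree as ET


def strip_paths(library_xml: str) -> str:
    """Return the library XML with the path attribute removed from every <book>."""
    root = ET.fromstring(library_xml)
    for book in root.iter("book"):
        # Drop path if present; url and all other attributes are left untouched.
        book.attrib.pop("path", None)
    return ET.tostring(root, encoding="unicode")


# Made-up sample, not taken from the production library.
SAMPLE = (
    "<library>"
    '<book id="a" path="a.zim" url="https://example.org/a.zim.meta4" />'
    '<book id="b" url="https://example.org/b.zim.meta4" />'
    "</library>"
)

print(strip_paths(SAMPLE))
```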
I also believe that if a path is given and the ZIM file cannot be loaded, the current strategy is to ignore it and continue. This looks right, but an ERROR/WARNING message should be printed.
This is something I realized recently and commented on in https://github.com/kiwix/libkiwix/issues/708#issuecomment-1095009085
Copying the important part:
[...] the catalog (root.xml, search, v2, ...) always returns books with local and valid ZIM files. There is currently no way to get the list of remote books (ones with a download link/url), whether they are local or not. It would be pretty easy to change (technically) but it adds some functional complexity (API to define, kiwix-serve frontend assuming the catalog returns books readable by kiwix-serve, ...)
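To make the described behaviour concrete, here is a rough Python model of it, not the actual libkiwix logic. It assumes book elements with id, path and url attributes, and it simplifies "valid ZIM file" to "the file exists". The function name partition_books and the warning text are hypothetical; the warning only illustrates the kind of message suggested earlier in the thread.

```python
import os
import xml.etree.ElementTree as ET


def partition_books(library_xml: str, base_dir: str = "."):
    """Split book ids into (servable, skipped).

    servable: the path attribute points at an existing file.
    skipped:  no path, or the path does not exist (remote-only books land here).
    """
    servable, skipped = [], []
    for book in ET.fromstring(library_xml).iter("book"):
        book_id = book.get("id")
        path = book.get("path")
        if path and os.path.isfile(os.path.join(base_dir, path)):
            servable.append(book_id)
            continue
        if path:
            # Hypothetical warning: a path was given but the file is missing.
            print(f"WARNING: cannot open ZIM file '{path}', skipping book {book_id}")
        skipped.append(book_id)
    return servable, skipped


# Made-up sample: a single book with a download URL and no local path.
SAMPLE = '<library><book id="remote-only" url="https://example.org/x.zim.meta4" /></library>'
print(partition_books(SAMPLE))  # prints ([], ['remote-only'])
```

With a library that only has remote books, everything ends up in the skipped list, which matches the empty feeds observed above.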
From discussion with @mgautierfr and @kelson42:
The issue with implementing this is that kiwix-serve currently serves two purposes:
- an OPDS catalog on /catalog[/v2].
- a ZIM reader that uses the catalog served on the same URL at /catalog.
The ZIM browser on / is just an HTML shell with a JS app that queries the catalog on /catalog.
ZIM browsing could work with a ZIM-less catalog, but should it? If so, it could not offer links to the demo content as it currently does, since it would not be able to serve it. Or, in the case of a mixed catalog with ZIM-backed and ZIM-less books, it would not know which are available.
Solving this would mean updating the OPDS response to conditionally include a link to HTML content.
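As a sketch of what "conditionally include" could look like, the Python below builds a minimal Atom entry and adds the text/html link only when the book is ZIM-backed; a client can then treat the presence of that link as "browsable here". This is an illustration under assumptions, not the libkiwix implementation: the function names, the has_local_zim flag and the placeholder title are made up, and the /content/ href follows the link form quoted elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(name: str, title: str, has_local_zim: bool) -> ET.Element:
    """Build a minimal Atom entry; add the HTML link only for ZIM-backed books."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    if has_local_zim:
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}link",
            {"type": "text/html", "href": f"/content/{name}"},
        )
    return entry


def html_link(entry: ET.Element):
    """Return the href of the text/html link, or None if the book is not browsable."""
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        if link.get("type") == "text/html":
            return link.get("href")
    return None


print(html_link(build_entry("lilote_fr_test_2023-01", "Example", True)))   # /content/lilote_fr_test_2023-01
print(html_link(build_entry("lilote_fr_test_2023-01", "Example", False)))  # None
```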
Another issue is that, because both are available in kiwix-serve, we host those two services publicly at https://library.kiwix.org:
- the main, public OPDS catalog that all the ZIM readers use. It's a SPOF and a critical part of our infrastructure.
- a demo of all ZIM content, offered as a convenience but not critical.
As the objective suggests, we should separate both services to have a dedicated ZIM-less OPDS catalog for ZIM readers and a dedicated ZIM-backed demo.
Keeping current URLs for both is not possible. Depending on how one understands “library” we could either:
- keep the current OPDS URL library.kiwix.org/catalog and serve the demo on a different domain (browse.library.kiwix.org?). Redirecting non-^/catalog-prefixed requests to the other domain in the reverse proxy preserves kiwix-serve yet allows previous links to continue to work (for some time?).
- keep the demo URL library.kiwix.org and serve the OPDS on a different domain (opds.library.kiwix.org?). This would require changing the OPDS URL in all the readers and maintaining high availability of the demo for as long as previous reader versions are in use, as those versions would use the demo catalog and not the OPDS-only one.
I am in favor of the first one.
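The redirect rule of the first option can be modelled in a few lines of Python. This is only a sketch of the decision, assuming the tentative browse.library.kiwix.org name floated above; the real rule would live in the reverse-proxy configuration, and the function name redirect_target is made up.

```python
import re
from typing import Optional

# The prefix test from the proposal, taken as-is. Note that it would also
# match paths such as /catalogue; tighten it if that matters.
CATALOG_PREFIX = re.compile(r"^/catalog")
DEMO_ORIGIN = "https://browse.library.kiwix.org"  # tentative name, not decided


def redirect_target(path: str) -> Optional[str]:
    """Return the redirect URL for a request path, or None to serve it in place."""
    if CATALOG_PREFIX.match(path):
        return None  # OPDS request: stays on library.kiwix.org
    return DEMO_ORIGIN + path


print(redirect_target("/catalog/root.xml"))  # None
print(redirect_target("/content/foo"))       # https://browse.library.kiwix.org/content/foo
```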
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
@mgautierfr @veloman-yunkan Do we have here anything still to discuss before implementation?
The link to the browsable content is not sorted out yet (<link type="text/html" href="/content/lilote_fr_test_2023-01" />).
It would also be good to sort out how we want to handle multiple illustrations, to know whether this will be problematic once we get there.