libkiwix

/catalog doesn't work without access to ZIM files

Open rgaudin opened this issue 3 years ago • 10 comments

My understanding was that the catalog part of the server (i.e. the OPDS engine) would only manipulate catalog data and thus not require ZIM access. This is not the case:

wget download.kiwix.org/library/library_zim.xml
kiwix-serve --library --daemon -p 9999 ./library_zim.xml
curl localhost:9999/catalog/root.xml

kiwix-serve starts and loads the library properly (The library was successfully loaded.), but the OPDS requests all come back empty:

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>1ae99b6e-a67b-db46-157a-fcc82a42d3a8</id>
  <title>All zims</title>
  <updated>2022-04-22T12:22:02Z</updated>

  <link rel="self" href="" type="application/atom+xml" />
  <link rel="search" type="application/opensearchdescription+xml" href="/catalog/searchdescription.xml" />
</feed>

curl localhost:9999/catalog/search?lang=fra

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>d097adb8-1df3-d4f1-b77d-33f90d6b7793</id>
  <title>Filtered zims (lang=fra)</title>
  <updated>2022-04-22T12:23:04Z</updated>
  <totalResults>0</totalResults>
  <startIndex>0</startIndex>
  <itemsPerPage>0</itemsPerPage>
  <link rel="self" href="" type="application/atom+xml" />
  <link rel="search" type="application/opensearchdescription+xml" href="/catalog/searchdescription.xml" />
</feed>
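
To confirm programmatically that a feed like the ones above carries no books, one option is to count its Atom `<entry>` elements. This is only a sketch: `count_entries` is a hypothetical helper name, and the sample is a trimmed copy of the first response.

```python
import xml.etree.ElementTree as ET

# Atom namespace, as declared on the <feed> element in the responses above.
ATOM = "{http://www.w3.org/2005/Atom}"


def count_entries(feed_xml: str) -> int:
    """Return the number of Atom <entry> elements directly under <feed>."""
    root = ET.fromstring(feed_xml)
    return len(root.findall(f"{ATOM}entry"))


# Trimmed copy of the /catalog/root.xml response shown above.
EMPTY_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>1ae99b6e-a67b-db46-157a-fcc82a42d3a8</id>
  <title>All zims</title>
  <updated>2022-04-22T12:22:02Z</updated>
  <link rel="self" href="" type="application/atom+xml" />
</feed>"""

print(count_entries(EMPTY_FEED))  # 0: the feed has metadata and links but no books
```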

rgaudin Apr 22 '22 12:04

@rgaudin What does the library XML look like? Does it have URLs (to download the ZIMs) in it?

kelson42 Apr 22 '22 13:04

The library XML is the production one; didn't you see the wget call above? So yes, there's a url attribute on each book. Should that matter?

rgaudin Apr 22 '22 13:04

@mgautierfr @veloman-yunkan Definitely a blocker for https://github.com/kiwix/container-images/issues/147

kelson42 Apr 22 '22 13:04

Following @kelson42's suggestion, I removed the path attributes from all the books in the library XML and I now get a different startup output:

Loading the library from the following files:
	/library_zim.xml
The library was successfully loaded.
The XML library file '/library_zim.xml' is empty (or has only remote books).
The Kiwix server is running and can be accessed in the local network at: xxx

The result is still the same on the OPDS endpoints, though.

rgaudin Apr 22 '22 13:04

I also believe that if a path is given and the ZIM file cannot be loaded, the current strategy is to ignore it and continue. This looks right, but an ERROR/WARNING message should be printed.

kelson42 Apr 22 '22 13:04

This is something I realized recently and commented on in https://github.com/kiwix/libkiwix/issues/708#issuecomment-1095009085

Copying the important part:

[...] the catalog (root.xml, search, v2, ...) always returns only books with local and valid ZIM files. For now there is no way to get the list of remote books (ones with a download link/url), whether or not they are also local. It would be pretty easy to change (technically), but it adds some functional complexity (API to define, kiwix-serve frontend assuming the catalog returns books readable by kiwix-serve, ...)
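
As an illustration only (this is not libkiwix code), the behaviour described above can be sketched as a filter over the library XML. The `path` and `url` attribute names come from this thread; `servable_book_ids` and its `include_remote` switch are hypothetical names for the current behaviour and the possible change.

```python
import os
import xml.etree.ElementTree as ET
from typing import List


def servable_book_ids(library_xml: str, include_remote: bool = False) -> List[str]:
    """List book ids the catalog would expose.

    By default only books whose `path` points to an existing local file are
    kept, mirroring the behaviour described above. With include_remote=True
    (a hypothetical option), books that only have a `url` are kept as well.
    """
    root = ET.fromstring(library_xml)
    ids = []
    for book in root.iter("book"):
        path = book.get("path")
        has_local_zim = bool(path) and os.path.isfile(path)
        has_url = bool(book.get("url"))
        if has_local_zim or (include_remote and has_url):
            ids.append(book.get("id", ""))
    return ids


# Made-up sample: one book with a path that does not exist, one remote-only book.
SAMPLE = """<library>
  <book id="a" path="/nonexistent/a.zim" url="https://example.org/a.zim" />
  <book id="b" url="https://example.org/b.zim" />
</library>"""

print(servable_book_ids(SAMPLE))                       # [] (no local ZIM files)
print(servable_book_ids(SAMPLE, include_remote=True))  # ['a', 'b']
```

With the production library and no ZIM files on disk, the default branch keeps nothing, which matches the empty feeds reported at the top of this issue.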

mgautierfr Apr 22 '22 14:04

From discussion with @mgautierfr and @kelson42:

The issue with implementing this is that kiwix-serve currently serves two purposes:

  • an OPDS catalog on /catalog[/v2].
  • a ZIM reader that uses the catalog served on the same URL at /catalog.

The ZIM browser on / is just an HTML shell with a JS app that queries the catalog on /catalog.

ZIM browsing could work with a ZIM-less catalog, but should it? If so, it could not offer links to the demo content as it currently does, because it would not be able to serve it. And with a mixed catalog of ZIM-backed and ZIM-less books, it would not know which ones are available.

Solving this would mean updating the OPDS response to conditionally include a link to HTML content.
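
A minimal sketch of what "conditionally include a link" could look like, assuming the HTML link has the `type="text/html" href="/content/<name>"` shape used by kiwix-serve today. `build_entry` and `zim_available` are hypothetical names, and a real entry would carry many more fields.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements without a prefix, as in the feeds kiwix-serve returns.
ET.register_namespace("", ATOM_NS)


def build_entry(title: str, name: str, zim_available: bool) -> str:
    """Build a stripped-down OPDS entry for one book.

    The text/html link is only emitted when the server can actually serve the
    ZIM content, so a client can tell browsable books from catalog-only ones.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    if zim_available:
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}link",
            {"type": "text/html", "href": f"/content/{name}"},
        )
    return ET.tostring(entry, encoding="unicode")


print(build_entry("Demo book", "demo_book", zim_available=True))
print(build_entry("Demo book", "demo_book", zim_available=False))
```

A frontend could then treat the absence of the `text/html` link as "listed but not readable here".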

Another issue is that, because both are available in kiwix-serve, we host these two services to the public at https://library.kiwix.org:

  • the main, public OPDS catalog that all the ZIM readers use. It's a SPOF and a critical part of our infrastructure.
  • a demo of all ZIM content, offered as a convenience but not critical.

As the objective suggests, we should separate both services to have a dedicated ZIM-less OPDS catalog for ZIM readers and a dedicated ZIM-backed demo.

Keeping current URLs for both is not possible. Depending on how one understands “library” we could either:

  • keep the current OPDS URL library.kiwix.org/catalog and serve the demo on a different domain (browse.library.kiwix.org?). Redirecting non-^/catalog prefixed requests to the other domain in the reverse-proxy preserves kiwix-serve yet allows previous links to continue to work (for some time?)
  • keep the demo URL library.kiwix.org and serve the OPDS on a different domain (opds.library.kiwix.org?). This would require changing the OPDS URL in all the readers and maintaining high availability of the demo for as long as the previous reader versions are being used, as those versions would use the demo catalog and not the OPDS-only one.

I am in favor of the first one.
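
To make the first option concrete, here is a sketch of the routing decision the reverse-proxy would make. It is written as a plain function rather than actual proxy configuration; `redirect_target` is a hypothetical name and `browse.library.kiwix.org` is only the domain floated above, not a decided one.

```python
from typing import Optional

# Hypothetical demo domain, as floated in the first option above.
BROWSE_HOST = "browse.library.kiwix.org"


def redirect_target(path: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the request is served locally.

    Requests whose path starts with /catalog stay on the OPDS host; everything
    else is sent to the demo host with the same path, so old links keep working.
    """
    if path.startswith("/catalog"):
        return None
    return f"https://{BROWSE_HOST}{path}"


print(redirect_target("/catalog/root.xml"))  # None
print(redirect_target("/content/some_zim"))  # https://browse.library.kiwix.org/content/some_zim
```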

rgaudin Apr 28 '22 10:04

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] Jul 10 '22 23:07

@mgautierfr @veloman-yunkan Is there anything still to discuss here before implementation?

kelson42 Nov 26 '22 08:11

@mgautierfr @veloman-yunkan Do we have here anything still to discuss before implementation?

The link to the browsable content is not sorted out yet (<link type="text/html" href="/content/lilote_fr_test_2023-01" />).

It would also be good to sort out how we want to handle multiple illustrations, to know whether this will be problematic once we get there.

rgaudin Jan 11 '23 16:01