kiwix-serve crashes on mount/remount
Ways to reproduce:
Point kiwix-serve to an external device. Unplug and plug back in. kiwix-serve is now dead.
Expected / better behavior:
kiwix-serve tells the end-user (in the browser) that it's unavailable until the library location is available again and then resumes.
Cases for this:
- Someone unplugs a cable by mistake
- Power outage leaves a battery driven device (laptop) on but external USB storage off
- System starts kiwix-serve on boot before the served location is ready.
Here is the crash dump I had.
Sep 26 11:04:16 fair-server systemd[1]: kiwix.service: Control process exited, code=exited status=1
Sep 26 11:04:16 fair-server systemd[1]: kiwix.service: Failed with result 'exit-code'.
Sep 26 11:15:37 fair-server kiwix[12453]: terminate called after throwing an instance of 'std::ios_base::failure'
Sep 26 11:15:37 fair-server kiwix[12453]: what(): Cannot read char.
@benjaoming I guess you understand what is going on behind the scenes. I'm not sure handling this use case properly is that trivial.
@kelson42 I'm sure it's not trivial, nothing like that implied.
My suggestion is to handle the relevant I/O errors (as many as possible, perhaps starting with the most common ones) and to display a "library not available" error message in kiwix-serve.
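As a rough illustration of "start with the most common errors", here is a minimal sketch that maps an `errno` value to an error page. Everything in it is hypothetical: `looksLikeMissingStorage`, `errorPageFor` and `ErrorPage` are not kiwix-serve code, the list of `errno` values is only a guess at what an unplugged device may produce, and the 503 status is an assumption made for the example.

```cpp
#include <cerrno>
#include <string>

// Hypothetical response type, used by this sketch only.
struct ErrorPage {
    int status;
    std::string html;
};

// Assumption: these errno values are the ones most likely to show up when
// the storage behind the library goes away. The list is a guess, not
// something taken from kiwix-serve or verified on real hardware.
bool looksLikeMissingStorage(int err) {
    switch (err) {
        case EIO:     // low-level read failure
        case ENOENT:  // path no longer exists
        case ENODEV:  // device is gone
        case ENXIO:   // no such device or address
            return true;
        default:
            return false;
    }
}

// Build the page shown in the browser for a failed read.
ErrorPage errorPageFor(int err) {
    if (looksLikeMissingStorage(err)) {
        return {503,
                "<h1>Library not available</h1>"
                "<p>The storage holding this library cannot be read right now. "
                "Please try again later.</p>"};
    }
    return {500, "<h1>Internal error</h1>"};
}
```

A request handler could call `errorPageFor(errno)` right after a failed read, so that the user sees an explanation instead of a dead server.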
@benjaoming I tried with the latest 3.2.0-1 on GNU/Linux and, for me, it no longer crashes. Instead, kiwix-serve gets stuck and the page keeps loading without end.
@mgautierfr @veloman-yunkan To me, this is still not the proper behaviour. Proper behaviour would be to return 404 and remove the book from the internal library.
I don't think there's a single valid answer for this, but we have to make a choice. Here's my opinion:
- I/O errors can be temporary and/or can affect only part of a ZIM file. Therefore, removing a book from the library on an I/O error seems too strict. It doesn't give the book any chance of coming back.
- An expected book that fails to read should raise a `500` error, not a `404`. If the book gets removed from the library, subsequent responses will obviously be `404`.
- I wonder how `--monitorLibrary` behaves in this case (assuming the library file itself is on the disappearing fs), but in kiwix-serve it might be interesting for weak mount points:
  - library disappears, kiwix-serve is notified and empties the library (might be interesting to have the library datetime in the home UI btw)
  - library file comes back, kiwix-serve is notified and re-reads the library
Recovering from such a system error is pretty complex. All file descriptors are by definition invalidated and return I/O errors. Plugging the USB drive back in will not magically revalidate them; kiwix-serve (and any other application) has to close the file and reopen it.
- The first step would be to correctly detect the I/O error
- In case of an I/O error, return a 500 and remove the book from the internal library cache (as if the book had never been opened)
- On request, if the file is not in the cache, we try to open it anyway, so if the USB drive has been plugged back in we should correctly open the file.
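The three steps above can be sketched roughly as follows. This is a simplified illustration, not kiwix-serve or libzim code: `BookCache`, `OpenBook` and `Reply` are hypothetical names, and a plain `std::ifstream` stands in for an opened ZIM file.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an opened ZIM file (not the libzim API).
struct OpenBook {
    std::ifstream stream;
};

// Minimal HTTP-like result, used by this sketch only.
struct Reply {
    int status;
    std::string body;
};

class BookCache {
public:
    // Serve `length` bytes of the book at `path`, starting at `offset`.
    Reply serve(const std::string& path, std::streamoff offset, std::size_t length) {
        try {
            OpenBook& book = getOrOpen(path);  // step 3: reopen on a cache miss
            std::string data(length, '\0');
            book.stream.clear();
            book.stream.seekg(offset);
            book.stream.read(&data[0], static_cast<std::streamsize>(length));
            return {200, data};
        } catch (const std::ios_base::failure&) {
            // Step 1: the I/O error is detected as an exception.
            // Step 2: drop the stale handle and answer with a 500.
            cache_.erase(path);
            return {500, "book temporarily unavailable"};
        }
    }

    bool isCached(const std::string& path) const { return cache_.count(path) != 0; }

private:
    OpenBook& getOrOpen(const std::string& path) {
        auto it = cache_.find(path);
        if (it != cache_.end()) return *it->second;

        auto book = std::make_unique<OpenBook>();
        book->stream.open(path, std::ios::binary);
        if (!book->stream.is_open()) {
            throw std::ios_base::failure("cannot open " + path);
        }
        // From here on, failed reads throw instead of failing silently.
        book->stream.exceptions(std::ios::failbit | std::ios::badbit);
        OpenBook& ref = *book;
        cache_.emplace(path, std::move(book));
        return ref;
    }

    std::map<std::string, std::unique_ptr<OpenBook>> cache_;
};
```

Only the cache of opened files is touched here; the list of books in the library stays as it is, so a later request simply tries to open the file again.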
I don't think we should remove the book from the library at all. It is not up to kiwix-serve to modify the input library. And we already have a filter at kiwix-serve startup that drops books with an invalid path.