export-kobo
Getting the annotations ordered by page?
Hello, thanks for this useful tool :) I'm having a small problem with the annotations: most of the time, I read a book two or three times, and I annotate something every time I read it. The problem is that when I export the annotations, they are ordered by date and not by page number or line number, so when I then read the annotations, the order is a bit random and does not follow the book's order/chapters. Is it possible to add a feature that orders the annotations by their position in the book instead of by date? Thanks :)
Hi,
thank you for your interest in this tiny script, and you are welcome.
As I briefly mentioned here:
https://github.com/pettarin/export-kobo/issues/1#issuecomment-273331942
with regard to another issue, it does not seem to me that the KoboReader.sqlite file contains the location information in a sane way.
In particular, the location information seems to rely on the Kobo-fied version of the EPUB and to use a format similar to the EPUB Canonical Fragment Identifier (EPUB CFI). For example, in table "Bookmark" of the SQLite file, the location of each annotation/bookmark/highlight is expressed by:
ContentID: 136dc3fc-b6c2-4006-bb65-390f5e26e0df!OEBPS!ch01.html
StartContainerPath: span#kobo.3.1
StartOffset: 18
EndContainerPath: span#kobo.3.1
EndOffset: 213
While the ContainerPath/Offset values seem amenable to lexicographic ordering without any knowledge of their semantics (again, I speculate they are the "Kobo equivalent" of EPUB CFI), the ContentID depends on the structure of the EPUB, and you cannot just order it lexicographically, because e.g. "acknowledgments.html" might come after "ch01.html" in the reading order of the EPUB.
The table "content" contains the values for ContentID (so I guess there is a foreign key relationship between tables "content" and "Bookmark"), and there is a VolumeIndex integer field that seems to suggest some ordering of the ContentID values. However, in the KoboReader.sqlite I have, for some EPUB books there are gaps in the values, and for some other EPUB books there is no VolumeIndex value at all.
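To make the VolumeIndex idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. The miniature schema and all rows are made up for illustration; only the names "content", "Bookmark", ContentID, VolumeID, Text and VolumeIndex come from this thread, and the real KoboReader.sqlite may well differ. It joins the two tables on ContentID and pushes rows without a VolumeIndex to the end, which is about the best one can do when the value is missing.

```python
import sqlite3

# Hypothetical miniature of the two tables discussed above (NOT the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (ContentID TEXT PRIMARY KEY, VolumeIndex INTEGER);
    CREATE TABLE Bookmark (ContentID TEXT, VolumeID TEXT, Text TEXT);

    -- Made-up rows: note the gap (1 -> 5) and the missing VolumeIndex.
    INSERT INTO content VALUES ('book!OEBPS!title.html', 0);
    INSERT INTO content VALUES ('book!OEBPS!ch01.html', 1);
    INSERT INTO content VALUES ('book!OEBPS!acknowledgments.html', 5);
    INSERT INTO content VALUES ('book!OEBPS!notes.html', NULL);

    INSERT INTO Bookmark VALUES ('book!OEBPS!acknowledgments.html', 'book', 'thanks');
    INSERT INTO Bookmark VALUES ('book!OEBPS!ch01.html', 'book', 'first highlight');
    INSERT INTO Bookmark VALUES ('book!OEBPS!notes.html', 'book', 'footnote');
    INSERT INTO Bookmark VALUES ('book!OEBPS!title.html', 'book', 'epigraph');
""")

# Order by VolumeIndex; rows whose VolumeIndex is NULL sort last.
query = """
    SELECT Bookmark.Text
    FROM Bookmark
    LEFT JOIN content ON content.ContentID = Bookmark.ContentID
    WHERE Bookmark.VolumeID = ?
    ORDER BY content.VolumeIndex IS NULL, content.VolumeIndex
"""
texts = [row[0] for row in conn.execute(query, ("book",))]
print(texts)
```

Gaps in VolumeIndex do not hurt the ordering, but books with no VolumeIndex at all would still come out in an arbitrary order.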
Point #3 of https://github.com/pettarin/export-kobo#notes says: "Bear in mind that no official specifications are published by Kobo, hence the script works as far as my understanding of the database structure of KoboReader.sqlite is correct, and its schema remains the same."
(In reply to sappounet's post of 01/27/2017, 03:30 PM.)
Hi :)
I kind of forgot about exporting Notes from my Kobo reader, but I decided to give it one more try today.
I've been browsing the content of the file KoboReader.sqlite, and it seems that the column "Bookmark.StartContainerPath" does indeed contain a value precise enough to sort the results in the order in which the highlights appear in the book.
I tried with a couple of books, and this query returns ALMOST the right order:
SELECT Bookmark.Text FROM Bookmark WHERE Bookmark.VolumeID='<Insert_Book_VolumeID_Here>' ORDER BY Bookmark.StartContainerPath
I wrote "ALMOST" because the "ORDER BY" clause compares the values as plain text, so it puts the row
index_split_018.xhtml#point(/1/4/115:1)
before
index_split_018.xhtml#point(/1/4/81:1)
because the "1" of "115" comes before the "8" of "81" in alphabetical order.
But I'm pretty sure it's quite easy to fix that with Python (my Python skills are bad, though, so I'm going to ask a coworker tomorrow :) )
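For what it's worth, a minimal sketch of such a fix in Python could look like the following. The helper name natural_key is made up for this example and is not part of export-kobo; it splits each value into text and digit chunks so that the digit chunks compare as integers rather than character by character.

```python
import re

def natural_key(text):
    """Return a sort key in which runs of ASCII digits compare as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd-indexed chunk is a run of digits.
    parts = re.split(r"([0-9]+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

paths = [
    "index_split_018.xhtml#point(/1/4/115:1)",
    "index_split_018.xhtml#point(/1/4/81:1)",
]
ordered = sorted(paths, key=natural_key)
print(ordered)
```

This only fixes the numeric comparison. It still assumes that the file names themselves sort in reading order, which may not hold for every book.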
So yeah, maybe for some books it will not work, but it seems that so far it's enough to get the Notes in the right order for pretty much all Ebooks that I tried :)
I'll keep you posted :)
If Bookmark.StartContainerPath gives the right order, we can just add a numeric sort in the read_items function.
I'm happy to contribute a pull request for this, but I'm not sure the approach is correct, so I need the author's confirmation first.
Well, again, nobody except Kobo knows for sure, we can only observe the values their code puts in the SQLite file. If you want to provide a PR with the functionality, I will merge. But I am afraid that there is no simple way to correctly sort Bookmark.StartContainerPath (even parsing it to take into account numeric vs. lexicographic values) by just looking at its values, since an EPUB might contain a file "page2.xhtml" appearing before "page1.xhtml" in the TOC order. Or, if you fancy another example, if you just sort Bookmark.StartContainerPath, you get "acknowledgments.xhtml" before "title.xhtml".
Unfortunately I have no time (and no longer a Kobo!) to investigate the issue further.
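To illustrate the objection about file order, here is a rough Python sketch that sorts by an explicit reading order first and by the numeric-aware fragment second. The spine list is a pure assumption: it would have to come from the EPUB itself (for example its OPF spine), since nothing in this thread shows that KoboReader.sqlite stores it reliably. The helper names and sample paths are made up as well.

```python
import re

def natural_key(text):
    """Sort key in which runs of ASCII digits compare as integers."""
    parts = re.split(r"([0-9]+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

def reading_order_key(path, spine):
    """Sort by position of the file in the spine, then by the fragment."""
    file_name, _, fragment = path.partition("#")
    # Files missing from the spine are placed after all known files.
    position = spine.index(file_name) if file_name in spine else len(spine)
    return (position, natural_key(fragment))

# Hypothetical reading order in which page2 precedes page1.
spine = ["title.xhtml", "page2.xhtml", "page1.xhtml", "acknowledgments.xhtml"]
paths = [
    "acknowledgments.xhtml#point(/1/4/2:1)",
    "page1.xhtml#point(/1/4/81:1)",
    "title.xhtml#point(/1/4/3:1)",
    "page2.xhtml#point(/1/4/115:1)",
    "page2.xhtml#point(/1/4/81:1)",
]
ordered = sorted(paths, key=lambda p: reading_order_key(p, spine))
print(ordered)
```

Without a trustworthy spine, the first component of the key is unknown, which is exactly why sorting StartContainerPath alone cannot be correct for every book.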