safaribooks Wrong order of chapters

The order of chapters is messed up.

About The Author
Chapter 5
“About the Cover Illustration”
Chapter 1, 2, 3, 4
Chapter 6 <..>

Is this an issue with this specific title only (Marko Lukša, “Kubernetes in Action”)? I ran the following command twice with the same outcome both times: python3 safaribooks.py --cred "xxx:yyy" --kindle 9781617293726

Aug 10 '21 13:08 vic-by

I can confirm that it's not only this book; it happens with other books I've tested. The chapters are indeed out of order. The problem is in how the content.opf file is created. In the content.opf below, Chapter 05 (right after <itemref idref="Author"/>) doesn't belong on that line, and <itemref idref="resources"/>, which sits between Chapter 16 and 17, is also in the wrong place.

<spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="titl"/>
    <itemref idref="Copyright"/>
    <itemref idref="Dedication"/>
    <itemref idref="btoc"/>
    <itemref idref="toc"/>
    <itemref idref="Preface"/>
    <itemref idref="Acknowledgments"/>
    <itemref idref="Book"/>
    <itemref idref="Author"/>
    <itemref idref="05"/>  <!-- Chapter 5 is placed in the wrong order -->
    <itemref idref="Cover"/>
    <itemref idref="p1"/>
    <itemref idref="01"/>
    <itemref idref="02"/>
    <itemref idref="p2"/>
    <itemref idref="03"/>
    <itemref idref="04"/>
    <itemref idref="06"/>
    <itemref idref="07"/>
    <itemref idref="08"/>
    <itemref idref="09"/>
    <itemref idref="10"/>
    <itemref idref="p3"/>
    <itemref idref="11"/>
    <itemref idref="12"/>
    <itemref idref="13"/>
    <itemref idref="14"/>
    <itemref idref="15"/>
    <itemref idref="16"/>
    <itemref idref="resources"/> <!-- Resources is placed in the wrong order -->
    <itemref idref="17"/>
    <itemref idref="18"/>
    <itemref idref="A"/>
    <itemref idref="B"/>
    <itemref idref="C"/>
    <itemref idref="D"/>
    <itemref idref="Index"/>
    <itemref idref="Figures"/>
    <itemref idref="Tables"/>
    <itemref idref="Listings"/>
</spine>

Either the Python script reads the chapters in the wrong order before appending them, or some later rearrangement is causing the issue.
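To spot misplaced chapters quickly, something like the following could help. It is only a rough sketch: the spine below is a shortened copy of the one above, misplaced_numeric_ids is a made-up helper name, and a real content.opf declares an XML namespace that this snippet does not handle.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the spine above; no XML namespace, unlike a real content.opf.
spine_xml = """<spine toc="ncx">
    <itemref idref="Author"/>
    <itemref idref="05"/>
    <itemref idref="Cover"/>
    <itemref idref="01"/>
    <itemref idref="02"/>
    <itemref idref="03"/>
    <itemref idref="04"/>
    <itemref idref="06"/>
</spine>"""


def misplaced_numeric_ids(xml_text):
    """Return numeric idrefs that are followed by a smaller numeric idref."""
    root = ET.fromstring(xml_text)
    # Keep only purely numeric idrefs such as "01" or "05"; skip "Author", "p1", etc.
    numeric = [
        ref.get("idref")
        for ref in root.iter("itemref")
        if ref.get("idref", "").isdigit()
    ]
    # A chapter is suspicious when the next numbered chapter has a lower number.
    return [cur for cur, nxt in zip(numeric, numeric[1:]) if int(cur) > int(nxt)]


print(misplaced_numeric_ids(spine_xml))  # ['05']
```

This only catches numbered chapters; entries such as resources would still have to be checked by eye.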

My Environment:

I'm running the latest commit (e016ad3) as of this post.

OS: Ubuntu 21.10 x86_64
Kernel: 5.13.0-20-generic
Shell: bash 5.1.8
Node: v12.22.5
npm: v8.1.1
Python3: v3.9.7

Oct 26 '21 18:10 0Ky

I can confirm that this happens quite a lot for me too, in fact with basically every book I download. One book that you can test is Strategic Monoliths and Microservices: Driving Innovation Using Purposeful Architecture.

The generated content.opf for me is the following. Notice, for example, that chapter three comes right after the preface:

<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
<item id="cover" href="cover.xhtml" media-type="application/xhtml+xml" />
<item id="pref00" href="pref00.xhtml" media-type="application/xhtml+xml" />
<item id="praise" href="praise.xhtml" media-type="application/xhtml+xml" />
<item id="halftitle" href="halftitle.xhtml" media-type="application/xhtml+xml" />
<item id="fm01" href="fm01.xhtml" media-type="application/xhtml+xml" />
<item id="title" href="title.xhtml" media-type="application/xhtml+xml" />
<item id="copyright" href="copyright.xhtml" media-type="application/xhtml+xml" />
<item id="contents" href="contents.xhtml" media-type="application/xhtml+xml" />
<item id="foreword" href="foreword.xhtml" media-type="application/xhtml+xml" />
<item id="preface" href="preface.xhtml" media-type="application/xhtml+xml" />
<item id="ch03" href="ch03.xhtml" media-type="application/xhtml+xml" />
<item id="acknowledgments" href="acknowledgments.xhtml" media-type="application/xhtml+xml" />
<item id="authors" href="authors.xhtml" media-type="application/xhtml+xml" />
<item id="part01" href="part01.xhtml" media-type="application/xhtml+xml" />
<item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml" />
<item id="ch02" href="ch02.xhtml" media-type="application/xhtml+xml" />
<item id="part02" href="part02.xhtml" media-type="application/xhtml+xml" />
<item id="ch04" href="ch04.xhtml" media-type="application/xhtml+xml" />
<item id="ch05" href="ch05.xhtml" media-type="application/xhtml+xml" />
<item id="ch06" href="ch06.xhtml" media-type="application/xhtml+xml" />
<item id="ch07" href="ch07.xhtml" media-type="application/xhtml+xml" />
<item id="part03" href="part03.xhtml" media-type="application/xhtml+xml" />
<item id="ch08" href="ch08.xhtml" media-type="application/xhtml+xml" />
<item id="ch09" href="ch09.xhtml" media-type="application/xhtml+xml" />
<item id="part04" href="part04.xhtml" media-type="application/xhtml+xml" />
<item id="ch10" href="ch10.xhtml" media-type="application/xhtml+xml" />
<item id="ch11" href="ch11.xhtml" media-type="application/xhtml+xml" />
<item id="ch12" href="ch12.xhtml" media-type="application/xhtml+xml" />
<item id="index" href="index.xhtml" media-type="application/xhtml+xml" />

Oct 29 '21 12:10 johnnywiller

I'm sure someone with a better understanding of Python and the code base can explain this better than I can, but the problem looks like it's related to line #567 in safaribooks.py.

Problem

When taking a closer look at the original code causing the problem:

result.extend([c for c in response["results"] if "cover" in c["filename"] or "cover" in c["title"]])

The code above appends an item to the "result" list if the value of its "filename" or "title" key contains the word "cover". I'm fairly certain that if you look at the chapters or sections that are incorrectly ordered, they will contain "cover" somewhere in the title or filename. The problem is that the in operator does substring matching, so it also matches words like "discovery".
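A small reproduction of the false match, as a sketch only. The chapter records here are invented for illustration and contain just the two keys the original line looks at; the real API response has more fields.

```python
# Made-up chapter records, not taken from a real API response.
chapters = [
    {"filename": "cover.xhtml", "title": "Cover"},
    {"filename": "ch05.xhtml", "title": "Service discovery"},
    {"filename": "ch01.xhtml", "title": "Introduction"},
]


def matches_substring(chapter):
    # Same test as the original line: substring containment with "in".
    return "cover" in chapter["filename"] or "cover" in chapter["title"]


substring_hits = [c["filename"] for c in chapters if matches_substring(c)]
print(substring_hits)  # ['cover.xhtml', 'ch05.xhtml']
```

ch05.xhtml is picked up only because "discovery" contains "cover", which is how an ordinary chapter ends up grouped with the cover page.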

Workaround?

result.extend([c for c in response["results"] if "cover" == c["title"].lower() or "cover.xhtml" == c["filename"].lower() or "titlepage.xhtml" == c["filename"].lower()])

I've changed the in operator to ==, which does an exact match of the string, and I've added titlepage.xhtml because there are books like API Security in Action (9781617296024) and The Art of Network Penetration Testing (9781617296826) that don't contain a cover.xhtml file; instead it's called titlepage.xhtml.

That sorted out the wrong order of chapters, but now there's an issue with the books mentioned above where the cover image from titlepage.xhtml isn't downloaded when using the workaround code above.
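The same workaround could be written as a small helper, which makes it easier to add more cover filenames later. This is a sketch, not the project's code: is_cover, move_covers_first and COVER_FILENAMES are names made up here, and it assumes the intent of the original line is to move cover pages to the front while leaving everything else in its original order. It does not address the missing cover image mentioned above.

```python
# Filenames mentioned in this thread; extend as other variants turn up.
COVER_FILENAMES = {"cover.xhtml", "titlepage.xhtml"}


def is_cover(chapter):
    """Exact, case-insensitive match on title or filename (no substring test)."""
    title = chapter.get("title", "").strip().lower()
    filename = chapter.get("filename", "").strip().lower()
    return title == "cover" or filename in COVER_FILENAMES


def move_covers_first(chapters):
    """Return cover pages first, then the remaining chapters in original order."""
    covers = [c for c in chapters if is_cover(c)]
    rest = [c for c in chapters if not is_cover(c)]
    return covers + rest


# Made-up sample data for illustration.
sample = [
    {"filename": "ch05.xhtml", "title": "Service discovery"},
    {"filename": "titlepage.xhtml", "title": "Title Page"},
    {"filename": "ch01.xhtml", "title": "Introduction"},
]
ordered = [c["filename"] for c in move_covers_first(sample)]
print(ordered)  # ['titlepage.xhtml', 'ch05.xhtml', 'ch01.xhtml']
```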

Oct 30 '21 17:10 0Ky
