
Wrong order of chapters

Open vic-by opened this issue 3 years ago • 3 comments

The order of chapters is messed up.

  • About The Author
  • Chapter 5
  • “About the Cover Illustration”
  • Chapter 1, 2, 3, 4
  • Chapter 6 <..>

Is this issue specific to this title (Marko Lukša, "Kubernetes in Action")? I ran the following command twice, with the same result both times:

python3 safaribooks.py --cred "xxx:yyy" --kindle 9781617293726

vic-by, Aug 10 '21 13:08

I can confirm that it's not only this book; it happens with other books I've tested as well. The chapters are indeed not in reading order, and the problem lies in how the content.opf file is generated. Below is the spine from my content.opf: Chapter 05 (right after <itemref idref="Author"/>) is out of place, and <itemref idref="resources"/> between chapters 16 and 17 is also in the wrong position.

<spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="titl"/>
    <itemref idref="Copyright"/>
    <itemref idref="Dedication"/>
    <itemref idref="btoc"/>
    <itemref idref="toc"/>
    <itemref idref="Preface"/>
    <itemref idref="Acknowledgments"/>
    <itemref idref="Book"/>
    <itemref idref="Author"/>
    <itemref idref="05"/>  <!-- Chapter 5 is placed in the wrong order -->
    <itemref idref="Cover"/>
    <itemref idref="p1"/>
    <itemref idref="01"/>
    <itemref idref="02"/>
    <itemref idref="p2"/>
    <itemref idref="03"/>
    <itemref idref="04"/>
    <itemref idref="06"/>
    <itemref idref="07"/>
    <itemref idref="08"/>
    <itemref idref="09"/>
    <itemref idref="10"/>
    <itemref idref="p3"/>
    <itemref idref="11"/>
    <itemref idref="12"/>
    <itemref idref="13"/>
    <itemref idref="14"/>
    <itemref idref="15"/>
    <itemref idref="16"/>
    <itemref idref="resources"/> <!-- Resources is placed in the wrong order -->
    <itemref idref="17"/>
    <itemref idref="18"/>
    <itemref idref="A"/>
    <itemref idref="B"/>
    <itemref idref="C"/>
    <itemref idref="D"/>
    <itemref idref="Index"/>
    <itemref idref="Figures"/>
    <itemref idref="Tables"/>
    <itemref idref="Listings"/>
</spine>

Either the Python script parses the chapters in the wrong order and appends them that way, or something re-arranges them afterwards.
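A minimal sketch for spotting this kind of misplacement automatically, assuming (as in the spine above) that numbered chapters use purely numeric idrefs; books that use ids like ch03 would need a different pattern. find_order_breaks is a hypothetical helper, not part of safaribooks.py, and it only looks at numbered chapters, so a misplaced non-chapter entry such as resources is not detected.

```python
import xml.etree.ElementTree as ET


def find_order_breaks(opf_xml: str) -> list:
    """Return (previous, current) pairs of numeric spine idrefs where the
    chapter number goes down, e.g. ("05", "01") if chapter 1 follows chapter 5."""
    root = ET.fromstring(opf_xml)
    # Strip any "{namespace}" prefix so this works with or without the OPF namespace.
    idrefs = [
        el.get("idref", "")
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "itemref"
    ]
    # Only purely numeric idrefs ("01", "05", ...) are treated as chapters;
    # parts ("p1"), appendices ("A") and other sections are skipped.
    chapters = [idref for idref in idrefs if idref.isdigit()]
    return [
        (prev, cur)
        for prev, cur in zip(chapters, chapters[1:])
        if int(cur) < int(prev)
    ]


spine = """<spine toc="ncx">
    <itemref idref="Author"/>
    <itemref idref="05"/>
    <itemref idref="Cover"/>
    <itemref idref="01"/>
    <itemref idref="02"/>
    <itemref idref="03"/>
    <itemref idref="04"/>
    <itemref idref="06"/>
</spine>"""

print(find_order_breaks(spine))  # [('05', '01')]
```

Running it against the full content.opf of a downloaded book (read from disk) should list every point where a numbered chapter appears after a higher-numbered one.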

My Environment:

I'm running the latest commit (e016ad3) as of this post.

  • OS: Ubuntu 21.10 x86_64
  • Kernel: 5.13.0-20-generic
  • Shell: bash 5.1.8
  • Node: v12.22.5
  • npm: v8.1.1
  • Python3: v3.9.7

0Ky, Oct 26 '21 18:10

I can confirm this happens a lot for me too; in fact, with almost every book I download. One book you can test with is Strategic Monoliths and Microservices: Driving Innovation Using Purposeful Architecture.

Here are the manifest entries from my generated content.opf. Notice, for example, that chapter 3 comes right after the preface:

<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
<item id="cover" href="cover.xhtml" media-type="application/xhtml+xml" />
<item id="pref00" href="pref00.xhtml" media-type="application/xhtml+xml" />
<item id="praise" href="praise.xhtml" media-type="application/xhtml+xml" />
<item id="halftitle" href="halftitle.xhtml" media-type="application/xhtml+xml" />
<item id="fm01" href="fm01.xhtml" media-type="application/xhtml+xml" />
<item id="title" href="title.xhtml" media-type="application/xhtml+xml" />
<item id="copyright" href="copyright.xhtml" media-type="application/xhtml+xml" />
<item id="contents" href="contents.xhtml" media-type="application/xhtml+xml" />
<item id="foreword" href="foreword.xhtml" media-type="application/xhtml+xml" />
<item id="preface" href="preface.xhtml" media-type="application/xhtml+xml" />
<item id="ch03" href="ch03.xhtml" media-type="application/xhtml+xml" />
<item id="acknowledgments" href="acknowledgments.xhtml" media-type="application/xhtml+xml" />
<item id="authors" href="authors.xhtml" media-type="application/xhtml+xml" />
<item id="part01" href="part01.xhtml" media-type="application/xhtml+xml" />
<item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml" />
<item id="ch02" href="ch02.xhtml" media-type="application/xhtml+xml" />
<item id="part02" href="part02.xhtml" media-type="application/xhtml+xml" />
<item id="ch04" href="ch04.xhtml" media-type="application/xhtml+xml" />
<item id="ch05" href="ch05.xhtml" media-type="application/xhtml+xml" />
<item id="ch06" href="ch06.xhtml" media-type="application/xhtml+xml" />
<item id="ch07" href="ch07.xhtml" media-type="application/xhtml+xml" />
<item id="part03" href="part03.xhtml" media-type="application/xhtml+xml" />
<item id="ch08" href="ch08.xhtml" media-type="application/xhtml+xml" />
<item id="ch09" href="ch09.xhtml" media-type="application/xhtml+xml" />
<item id="part04" href="part04.xhtml" media-type="application/xhtml+xml" />
<item id="ch10" href="ch10.xhtml" media-type="application/xhtml+xml" />
<item id="ch11" href="ch11.xhtml" media-type="application/xhtml+xml" />
<item id="ch12" href="ch12.xhtml" media-type="application/xhtml+xml" />
<item id="index" href="index.xhtml" media-type="application/xhtml+xml" />

johnnywiller, Oct 29 '21 12:10

I'm sure someone with a better understanding of Python and the code base can explain this better than I can, but the problem appears to be related to line 567 of safaribooks.py.

Problem

Here is the original code that causes the problem:

result.extend([c for c in response["results"] if "cover" in c["filename"] or "cover" in c["title"]])

The code above appends an entry to the result list whenever the value of its "filename" or "title" key contains the string "cover". I'm certain that if you look at the chapters or sections that end up out of order, each of them contains "cover" in its title or filename. The problem is that the in operator performs a substring match, so "cover" also matches words like "discovery".
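A small reproduction of that filter against a few made-up entries (only the "filename" and "title" keys it reads are included, and the titles are illustrative) shows the false positive:

```python
# Illustrative entries shaped like the items in response["results"].
results = [
    {"filename": "cover.xhtml", "title": "Cover"},
    {"filename": "preface.xhtml", "title": "Preface"},
    {"filename": "ch03.xhtml", "title": "Chapter 3: Service Discovery"},
    {"filename": "ch04.xhtml", "title": "Chapter 4: Deployments"},
]

# The original filter: a substring test on both keys.
matched = [c for c in results if "cover" in c["filename"] or "cover" in c["title"]]
print([c["filename"] for c in matched])  # ['cover.xhtml', 'ch03.xhtml']
```

ch03.xhtml is picked up only because "Discovery" contains the substring "cover", which is how a regular chapter gets pulled to the front with the cover.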

Workaround?

result.extend([c for c in response["results"] if "cover" == c["title"].lower() or "cover.xhtml" == c["filename"].lower() or "titlepage.xhtml" == c["filename"].lower()])

I've changed the in operator to ==, which performs an exact string match, and I've added titlepage.xhtml because some books, such as API Security in Action (9781617296024) and The Art of Network Penetration Testing (9781617296826), don't contain a cover.xhtml file; instead, it's called titlepage.xhtml.
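The same workaround, sketched as a helper so the exact-match rule lives in one place. is_cover_page and COVER_FILENAMES are hypothetical names; the extra .strip() and the tolerance for a missing or None title are small additions beyond the workaround line above.

```python
# Filenames treated as cover pages. titlepage.xhtml is included for books
# such as API Security in Action (9781617296024) that ship no cover.xhtml.
COVER_FILENAMES = {"cover.xhtml", "titlepage.xhtml"}


def is_cover_page(chapter: dict) -> bool:
    """Exact, case-insensitive match on title/filename instead of a substring test."""
    title = (chapter.get("title") or "").strip().lower()
    filename = (chapter.get("filename") or "").strip().lower()
    return title == "cover" or filename in COVER_FILENAMES


results = [
    {"filename": "cover.xhtml", "title": "Cover"},
    {"filename": "ch03.xhtml", "title": "Chapter 3: Service Discovery"},
    {"filename": "titlepage.xhtml", "title": "Title Page"},
]
print([c["filename"] for c in results if is_cover_page(c)])
# ['cover.xhtml', 'titlepage.xhtml']
```

Under these assumptions, the line in safaribooks.py would become result.extend([c for c in response["results"] if is_cover_page(c)]).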

That fixed the chapter order, but it introduced a new issue with the books mentioned above: the cover image from titlepage.xhtml isn't downloaded when using the workaround code.
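As a possible starting point for the missing cover image, here is a sketch that pulls the image path out of a titlepage.xhtml page. It assumes the cover is the first <img> (or SVG <image>) element in that file, which is an assumption rather than something confirmed in this thread; FirstImageFinder and find_cover_image are hypothetical helpers, and wiring the result into safaribooks' image download step is not shown.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstImageFinder(HTMLParser):
    """Records the source of the first <img> or SVG <image> element."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.src is not None:
            return  # keep only the first image found
        attributes = dict(attrs)
        if tag == "img":
            self.src = attributes.get("src")
        elif tag == "image":
            # An SVG <image> references its file via xlink:href (or href in SVG 2).
            self.src = attributes.get("xlink:href") or attributes.get("href")


def find_cover_image(xhtml: str) -> Optional[str]:
    finder = FirstImageFinder()
    finder.feed(xhtml)
    finder.close()
    return finder.src


# Illustrative page; the real titlepage.xhtml markup may differ.
page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="cover"><img src="Images/cover.jpg" alt="Cover"/></div>
</body></html>"""
print(find_cover_image(page))  # Images/cover.jpg
```

Self-closing tags such as <img .../> are routed through handle_starttag by HTMLParser's default handle_startendtag, so both forms are covered.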

0Ky, Oct 30 '21 17:10