Automatic ISBN Detection from Book Files
What problem or limitation are you encountering?
Currently, I have to manually find and input the ISBN for each book I add to Booklore. This is time-consuming, especially when adding many books.
What solution or improvement do you propose?
I propose a feature where Booklore can automatically detect the ISBN from the book file itself (e.g., by scanning the cover/first pages for text). Once detected, it should automatically populate the ISBN field for the book.
Have you found any workarounds or alternatives?
The current workaround is to manually search for the ISBN online or within the book file and then copy-paste it into Booklore.
Additional details
This would greatly streamline the book addition process and improve data accuracy by reducing manual entry errors. It would be particularly useful for large libraries.
I've been tackling a similar issue with one of my own projects. The ISBN provided by metadata providers doesn't always match up with the version/edition of your actual book.
Various ISBNs can often be found on the copyright page, but not all ebooks contain this page. The next place to look is the ISBN embedded within the EPUB itself, although this might suffer from the same issue as above.
Let me strongly agree by putting it this way...
Why guess what a book is when the exact release is right there for the parsing?
I use this Calibre addon, and frankly it makes such a radical difference to matching that I now can't live without it.
> I've been tackling a similar issue with one of my own projects. The ISBN provided by metadata providers doesn't always match up with the version/edition of your actual book.
This statement is spot on. This obviously means that the ISBN provided by the embedded ebook metadata is also frequently incorrect, as best-guess tooling may have been used somewhere in the supply chain to produce a close but ultimately incorrect match.
Downsides:
- It is not always correct. This is typically caused by choosing the wrong ISBN when more than one is embedded, or when the only ISBN is an advert, e.g. an audiobook link
- Sometimes multiple ISBNs are included, e.g. citations and series lists
- It's slow. Scanning potentially thousands of pages of text may need to be an OPT-IN feature (I have ideas how to speed this up and make it low cost, perhaps even fast enough to be a native )
> It's slow. Scanning potentially thousands of pages of text may need to be an OPT-IN feature (I have ideas how to speed this up and make it low cost, perhaps even fast enough to be a native)
Generally there is no need to scan all pages. The ISBN is most often on the first or last five pages.
Depending on how things are structured, you can look for the copyright page (not the copy notice page!), which is usually in the front or back few pages, and extract the ISBNs.
I try to go for the eISBN or any digital version when listed.
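A rough sketch of that preference, in case it helps the discussion. The function name `pick_digital_isbn` and the keyword list are made up for illustration and are not anything Booklore provides. It looks for an `eISBN` prefix or a parenthetical label such as `(e-book)` right after the number. One gotcha worth noting: a 10-digit LCCN can look exactly like an ISBN-10 (`2016030208`, labelled as an e-book LCCN on some copyright pages, even passes the ISBN-10 check digit), so numbers preceded by `LCCN` are skipped. Check digit validation is left out here for brevity.

```python
import re

# 10 or 13 ISBN characters with an optional single space or hyphen between them.
ISBN_RE = re.compile(r"(?<![0-9Xx])(?:[0-9][ -]?){9,12}[0-9Xx](?![0-9Xx])")
# Assumed keywords that mark a digital edition; extend as needed.
DIGITAL_WORDS = ("ebook", "e-book", "digital", "epub", "kindle")


def pick_digital_isbn(text):
    """Return the first ISBN labelled as a digital edition, or None."""
    for m in ISBN_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group()).upper()
        if len(digits) not in (10, 13):
            continue
        # Up to 10 characters directly before the number, e.g. "eISBN " or "LCCN ".
        before = text[max(0, m.start() - 10):m.start()].lower()
        if re.search(r"lccn[ :]*$", before):
            continue  # a Library of Congress number, not an ISBN
        # Parenthetical label on the same line, e.g. "(e-book)".
        after = re.match(r"[ \t]*\(([^)\n]*)\)", text[m.end():])
        label = after.group(1).lower() if after else ""
        if re.search(r"\be-?isbn[ :]*$", before) or any(w in label for w in DIGITAL_WORDS):
            return digits
    return None
```

If nothing is labelled as digital, this returns `None` and the caller would have to fall back to something else (or to manual entry).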
IIRC, when I initially set up Booklore I used a plugin for ripgrep that allowed me to search through archives (EPUBs here) and extract ISBNs.
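The same idea works without ripgrep, since an EPUB is a ZIP container. Below is a minimal sketch, not a proposal for how Booklore should implement it: `isbn_candidates_from_epub` and `edge_docs` are hypothetical names, and it assumes the order of files in the archive roughly follows reading order. A proper version would read the spine from the OPF package document instead. It only looks at the first and last few HTML documents, in line with the "first or last five pages" observation.

```python
import re
import zipfile

# 10 or 13 ISBN characters with an optional single space or hyphen between them.
ISBN_RE = re.compile(r"(?<![0-9Xx])(?:[0-9][ -]?){9,12}[0-9Xx](?![0-9Xx])")
TAG_RE = re.compile(r"<[^>]+>")


def isbn_candidates_from_epub(path_or_file, edge_docs=5):
    """Collect ISBN-like strings from the first and last `edge_docs` HTML files."""
    found = []
    with zipfile.ZipFile(path_or_file) as zf:
        docs = [n for n in zf.namelist()
                if n.lower().endswith((".xhtml", ".html", ".htm"))]
        # Assumption: archive order approximates reading order.
        if len(docs) > 2 * edge_docs:
            docs = docs[:edge_docs] + docs[-edge_docs:]
        for name in docs:
            raw = zf.read(name).decode("utf-8", errors="replace")
            # Replace tags with newlines so numbers in adjacent elements
            # are never glued together into one candidate.
            text = TAG_RE.sub("\n", raw)
            for m in ISBN_RE.finditer(text):
                digits = re.sub(r"[ -]", "", m.group()).upper()
                if len(digits) in (10, 13) and digits not in found:
                    found.append(digits)
    return found
```

Calling `isbn_candidates_from_epub("some-book.epub")` returns the candidates in the order they were encountered. They still need check digit validation and some way of choosing between them.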
Usually when multiple ISBNs are provided, they're marked with release type, like so:
Library of Congress Cataloging-in-Publication Data
Names: Sobel, Dava.
Title: The glass universe : how the ladies of the Harvard Observatory took the measure of the stars / Dava Sobel.
Description: New York : Viking, 2016. | Includes bibliographical references and index.
Identifiers: LCCN 2016029496 (print) | LCCN 2016030208 (e-book) | ISBN 9780670016952 (hardcover) | ISBN 9780698148697 (e-book)
Subjects: LCSH: Women in astronomy—Massachusetts—History. | Women mathematicians—Massachusetts—History. | Astronomy—History—19th
century. | Astronomy—History—20th century. | Harvard College Observatory.
Classification: LCC QB34.5 .S63 2016 (print) | LCC QB34.5 (ebook) | DDC
522/.19744409252—dc23
LC record available at https://lccn.loc.gov/2016029496
Printed in the United States of America
or
Published in Australia and New Zealand in 2015
by Hachette Australia
(an imprint of Hachette Australia Pty Limited)
Level 17, 207 Kent Street, Sydney NSW 2000
[www.hachette.com.au](http://www.hachette.com.au/)
Copyright © Mark Hunt 2015
This book is copyright. Apart from any fair dealing for the purposes of private study, research, criticism or review permitted under the Copyright Act 1968, no part may be stored or reproduced by any process without prior written permission. Enquiries should be made to the publisher.
A CIP catalogue record of this book is available from the National Library of Australia.
978 0 7336 3462 8
978 0 7336 3461 1 (ebook edition)
Cover design by Christabella Designs
Cover images courtesy of Getty Images
or
The Library of Congress has catalogued the hardcover edition as follows:
Erikson, Steven.
Gardens of the moon / Steven Erikson.
p. cm.
“A Tom Doherty Associates Book.”
ISBN-13: 978-0-7653-1001-9
ISBN-10: 0-7653-1001-5
1. Fantasy fiction. I. Title.
PR9199.4.E745 G37 2004
823’.92—dc22
2004301319
ISBN-13: 978-0-7653-2288-3 (trade paperback)
ISBN-10: 0-7653-2288-9 (trade paperback)
0 9 8 7 6 5 4 3 2
eISBN 9781429926584
So, in my view, the logic should be to search for all hits that match ISBN format (including dashes, spaces, or no section delimiters) within the first or last 10% of the book, and then, if there is more than one, use some algorithm to select the right ebook ISBN?
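For what it's worth, the search step could look something like the sketch below. This is only an illustration under a few assumptions: the book text is already available as one string, `find_isbns`, `edge_fraction` and `min_chars` are names invented here, and the `min_chars` floor is an addition so that very short texts get scanned in full. Candidates are filtered by the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) check digits.

```python
import re

# 10 or 13 ISBN characters with an optional single space or hyphen between them.
ISBN_RE = re.compile(r"(?<![0-9Xx])(?:[0-9][ -]?){9,12}[0-9Xx](?![0-9Xx])")


def is_valid_isbn(s):
    """Check the ISBN-10 (mod 11) or ISBN-13 (mod 10) check digit."""
    if len(s) == 10 and s[:9].isdigit():
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(s) == 13 and s.isdigit():
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0
    return False


def find_isbns(text, edge_fraction=0.10, min_chars=2000):
    """Return valid ISBNs from the first and last `edge_fraction` of `text`."""
    n = max(int(len(text) * edge_fraction), min_chars)
    # If the two windows would overlap, scan the whole text once instead.
    regions = [text] if 2 * n >= len(text) else [text[:n], text[-n:]]
    found = []
    for region in regions:
        for m in ISBN_RE.finditer(region):
            digits = re.sub(r"[ -]", "", m.group()).upper()
            if is_valid_isbn(digits) and digits not in found:
                found.append(digits)
    return found
```

The check digit only cuts down the noise. Other identifiers of the right length can pass it by chance, and a hard cut at 10% can split a number that straddles the boundary, so the result is a candidate list rather than an answer.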
This seems sane to me, but I would suggest that in the first instance we match only if one ISBN exists in total (with ISBN-10 and ISBN-13 compared where necessary, i.e. one ISBN but two formats).
The time to implement this would be much shorter, allowing it to get some user-base experience and feedback before introducing release decision logic, which at best won't ever be 100%.
I do not want to take away from these suggestions; rather the opposite, in that I think this feature will be popular even in its most basic single-match form, and the sooner it's added the sooner that popularity momentum starts.
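To make the "one ISBN but two formats" comparison concrete, here is a small sketch. The names `to_isbn13` and `single_isbn` are hypothetical. It converts every ISBN-10 to its ISBN-13 form (prefix `978`, recomputed check digit) and only reports a match when exactly one distinct ISBN remains. It assumes the candidates have already been validated.

```python
import re


def to_isbn13(isbn):
    """Normalise an ISBN-10 or ISBN-13 string to 13 digits without separators."""
    s = re.sub(r"[ -]", "", isbn).upper()
    if len(s) == 13:
        return s
    if len(s) != 10:
        raise ValueError(f"not an ISBN: {isbn!r}")
    # ISBN-10s map into the 978 prefix; the old check digit is dropped
    # and a new ISBN-13 check digit is computed.
    core = "978" + s[:9]
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


def single_isbn(candidates):
    """Return the one distinct ISBN among `candidates`, else None."""
    distinct = {to_isbn13(c) for c in candidates}
    return distinct.pop() if len(distinct) == 1 else None
```

With this rule, `["978-0-7653-1001-9", "0-7653-1001-5"]` counts as a single ISBN and would be matched. A page like the Gardens of the Moon one above, with hardcover, trade paperback and eISBN entries, has three distinct ISBNs and would be left for manual entry.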