openlibrary
openlibrary copied to clipboard
Update Import API to strip `-` from ISBNs
Problem
I ran this code:
await fetch("/api/import", {
method: "POST",
body: JSON.stringify({
identifiers: { isbn_13: ["978-3-943075-58-8"] },
title: "UX Design Process",
languages: ["eng"],
publishers: ["Smashing Media AG"],
publish_date: "January 2013",
cover:
"https://archive.smashing.media/assets/344dbf88-fdf9-42bb-adb4-46f01eedd629/1ccfff60-c629-4973-bf5c-2997266b2e5c/ux-design-process-cover-large-opt.png",
links: [
{
url: "https://www.smashingmagazine.com/ebooks/ux-design-process/",
title: "homepage",
},
],
physical_format: "ePUB, Kindle, PDF",
authors: [{ name: "Smashing Media AG" }],
number_of_pages: "88",
source_records: ["raybb_web_scraping"],
}),
});
However, when I looked at the book it had the ISBN set to "978-3-943075-58-8" and when I visited https://openlibrary.org/isbn/978-3-943075-58-8 it didn't show that the book was available. So then I went and edited the book to add the ISBN again and it auto stripped the -
and it started showing up at that endpoint.
So, I think we should probably strip the ISBNs before saving from the import endpoint.
Edit: I now see that even when I import without the -
it still doesn't show up until after I edit.
Evidence / Screenshot
Relevant URL(s)
Reproducing the bug
- Go to ...
- Do ...
- Expected behavior:
- Actual behavior:
Context
No response
Notes from this Issue's Lead
Proposal & constraints
Related files
Stakeholders
@RayBB, this one is puzzling me a bit. Are you able to reproduce this on the current master
branch in the local development environment?
I just tried:
❯ curl -X POST \
-H "Content-Type: application/json" \
-b ~/cookies.txt \
-d '{
"title": "UX Design Process",
"source_records": ["smashingmagazine:isbn_here_but_omitted_to_ensure_test_only_looks_at_isbn_field"],
"authors": [{"name": "Smashing Media AG"}],
"publishers": ["Smashing Media AG"],
"isbn_13": ["978-3-943075-58-8"],
"publish_date": "January 2013",
"number_of_pages": 88
}' \
http://localhost:8080/api/import
{"authors": [{"key": "/authors/OL15A", "name": "Smashing Media AG", "status": "created"}], "success": true, "edition": {"key": "/books/OL28M", "status": "created"}, "work": {"key": "/works/OL20W", "status": "created"}}
I could then visit http://localhost:8080/isbn/978-3-943075-58-8 and I was forwarded to http://localhost:8080/books/OL28M/UX_Design_Process, which had "isbn_13": ["9783943075588"]
in the JSON.
I looked at the code for /api/import
, and it looks as if it has been running canonical()
from isbnlib
for quite some time.
On the day this issue was opened there was a commit (https://github.com/internetarchive/openlibrary/commit/5f7d8d190e2f0d837545e582fd5db99aae51a979) that changed some of the code for /isbn
, but canonical()
was already running on the ISBN input there too.
I have noticed sometimes it can take a little bit for the backend to 'catch up' when a new edition is added, in terms of the search. If you can't replicate this on the local environment, do you think this 'lag' could have been part of it, and maybe the editing and then having the book forward was a coincidence?
@scottbarnes I didn't try with curl (I don't have the cookies file) But I did run the fetch in my main post on local in the browser and:
- The isbn url still 404s (http://localhost:8080/isbn/978-3-943075-58-8)
- The work page has the isbn with the
-
in it still
However, this really doesn't seem like a big issue because on prod is it probably due to some lag in processing and/or cache.
@RayBB, I keep coming back to this, confused, but I think I finally understand what's happening. isbn_13
is its own field and doesn't go under identifiers
when importing, even though when editing a book in the UI ISBN 10
and ISBN 13
are found under ID Numbers
, which in many cases updates identifiers
.
In the edition schema, one can see this a bit more clearly.
I think this can probably be closed now, but I want to double check with you.
@scottbarnes I think the question now is do we expect that field to be stripped of -
? In the edition edit UI it auto strips them and I suppose that was for a reason.
However, my original problem is resolved so if the above isn't wroth looking into then yes lets close it!
We may be talking past each other slightly, so I will step back a bit, but in doing so I may just be repeating things you already know, and I realize I may simply be misunderstanding the question, so my apologies if that's the case.
I think the hyphens weren't being stripped from the ISBN in this particular import because the ISBN was included under identifiers
, with identifiers: { isbn_13: ["978-3-943075-58-8"] }
; but the schema expects isbn_10
and isbn_13
to be a separate field from identifiers
, such as isbn_13: ["978-3-943075-58-8"]
, even though the ISBNs display under the "ID Numbers" section of the UI.
When you edited the ISBN for UX Design, did you do anything to cause the ISBN to be moved to the correct field, or did that happen automatically? https://openlibrary.org/books/OL51673812M/UX_Design_Process?_compare=Compare&b=2&a=1&m=diff.
Though importing ISBNs under the identifiers
field will display correctly in the UI, the logic of the site expects the ISBNs to be under isbn_10
and isbn_13
and not under identifiers
.
When you edited the ISBN for UX Design, did you do anything to cause the ISBN to be moved to the correct field, or did that happen automatically?
I don't recall for certain. If anything I just removed the identifier and added it back right away (because the client side strips the -
when you re-add it).
Interesting that it expects it to be a separate field and not under identifiers. I guess the root of this is that my import statement wasn't following best practices.
I will go ahead and close this, then. I do agree it's curious that ISBNs, which are definitely identifiers, are not included in identifiers
. With the benefit of hindsight, I suspect this might have been done differently.