Web based Add Book creates malformed entries if there's no author

Open tfmorris opened this issue 6 years ago • 2 comments

Related: #2116 and https://github.com/internetarchive/openlibrary-client/issues/126

When a book is entered through the web UI and the user doesn't provide an author, the author entry is incorrectly formatted as described in the two related issues.

Evidence

See the increasing counts and list of newly created works at https://github.com/internetarchive/openlibrary-client/issues/126

Steps to Reproduce

I didn't attempt to reproduce, but the empirical evidence is that an additional 4,000 of these entries were created this year.

  • Actual: records have an author object with a type, but no value.
  • Expected: authors should just be an empty list instead.
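A minimal sketch of the expected behaviour, for illustration only. It assumes a well-formed author entry looks like `{"type": ..., "author": {"key": ...}}` and that a malformed one has the `type` but no usable `author`. The helper names `is_malformed_author` and `clean_authors` are hypothetical and are not part of the Open Library codebase.

```python
def is_malformed_author(entry):
    """Return True if an author entry carries no usable author reference.

    Assumption: a well-formed entry has an "author" value that is either a
    non-empty string key or a dict with a non-empty "key".
    """
    if not isinstance(entry, dict):
        return True
    author = entry.get("author")
    if isinstance(author, dict):
        return not author.get("key")
    return not author


def clean_authors(work):
    """Return a shallow copy of `work` with malformed author entries dropped.

    A work whose only author entry is malformed ends up with an empty list,
    which is the expected result described above.
    """
    cleaned = dict(work)
    cleaned["authors"] = [
        a for a in work.get("authors", []) if not is_malformed_author(a)
    ]
    return cleaned
```

Running `clean_authors` on a work whose `authors` list holds only a type-only entry would give back `authors == []`, while entries that do reference an author are left untouched.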

tfmorris avatar Dec 03 '19 23:12 tfmorris

Since the original dataset ended in Aug 2019, I realized that there was a small chance the bug had been fixed in the intervening months, so I reran the analysis with the Nov 2019 dump. Here are some recent bad entries:

2019-11-30T18:01:33.177279	/works/OL20503723W
2019-11-29T05:00:48.115709	/works/OL20501278W
2019-11-29T02:37:49.419152	/works/OL20501231W
2019-11-27T21:49:25.668036	/works/OL20499616W
2019-11-27T19:31:31.716786	/works/OL20498934W
2019-11-27T19:31:24.291327	/works/OL20498931W
2019-11-27T19:31:22.097877	/works/OL20498930W
2019-11-27T19:31:21.283416	/works/OL20498929W
2019-11-27T19:27:06.373220	/works/OL20498902W
2019-11-27T19:27:04.043480	/works/OL20498901W

The most recent work has no associated editions and was created by @JeffKaplan, so perhaps he can provide more insight into what screen/mechanism was used.
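For anyone wanting to rerun this kind of check, here is a rough sketch of a scan that would produce a list like the one above. It is not the script actually used. It assumes each dump line is tab-separated with five columns (type, key, revision, last modified, JSON record) and that a bad entry is an author dict with a `type` but no `author`. The function name `find_bad_works` is made up for this example.

```python
import json


def find_bad_works(lines):
    """Yield (last_modified, key) for works with a type-only author entry.

    Assumed line layout: type <TAB> key <TAB> revision <TAB> last_modified <TAB> JSON
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5 or parts[0] != "/type/work":
            continue  # skip non-work records and lines that don't fit the layout
        _type, key, _revision, last_modified, raw = parts
        record = json.loads(raw)
        authors = record.get("authors", [])
        if any(
            isinstance(a, dict) and "type" in a and not a.get("author")
            for a in authors
        ):
            yield last_modified, key
```

Passing an open dump file to `find_bad_works` and sorting the results in reverse order would put the most recently modified bad works first.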

tfmorris avatar Dec 04 '19 06:12 tfmorris

@tfmorris The first of those was left childless after https://openlibrary.org/books/OL27737159M/Reasoning_through_Romans_Part_1?b=2&a=1&_compare=Compare&m=diff merged its sole edition to another work record. Patrons have no ready way to redirect these childless works to the merged work. It should just happen automagically.

The second was created by a new user whose username matches the stated publisher. The title seems to be a mangled version of Ưu điểm của két sắt khách sạn ("Advantages of hotel safes" in Vietnamese). The record gives no author, identifier (except OL numbers) or other details. Pretty clearly spam/vandalism.

The third was another new user adding a Create Space edition. Again, the record gives no author, identifier (except OL numbers) or other details. Pretty clearly spam again.

The real thing to address here is why the edit interface allows creation of works without authors or, for that matter, without any identifiers.

LeadSongDog avatar Dec 04 '19 17:12 LeadSongDog