Different works all imported as one

Open seabelis opened this issue 10 months ago • 2 comments

Problem

We have many cases of different works being imported as editions of a single work. This usually happens when the records share a common title and differ only in their subtitles, and the subtitle is ignored during import. In some cases the "title" is not a title at all but some sort of vendor or publisher code. These are too time-consuming to split by hand.

Possibly related to #8977.

Relevant URL(s)

https://openlibrary.org/works/OL25289173W/Ebe?mode=all#editions-list

Notes from this Issue's Lead

Proposal & constraints

Please split each edition into a new work. In the URL provided here, "Ebe" should be dropped entirely and the subtitles used as titles. "Ebe" should probably be removed as a first step, because once the editions are split into new works we don't want it carried into the work titles.
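A minimal sketch of that first step, assuming each edition is available as a plain dict with `title` and `subtitle` keys. The function name `promote_subtitle`, the dict shape, and the sample title are illustrative assumptions, not existing Open Library code or data:

```python
def promote_subtitle(edition: dict, junk_title: str) -> dict:
    """Return a copy of `edition` with the junk title dropped and the subtitle used as title.

    Hypothetical helper: `edition` is assumed to be a dict with optional
    "title" and "subtitle" string fields.
    """
    title = (edition.get("title") or "").strip()
    subtitle = (edition.get("subtitle") or "").strip()

    # Only touch records whose whole title is the junk string (case-insensitive)
    # and that actually have a subtitle to promote.
    if title.casefold() != junk_title.strip().casefold() or not subtitle:
        return dict(edition)  # unchanged copy; leave for manual review

    fixed = {key: value for key, value in edition.items() if key != "subtitle"}
    fixed["title"] = subtitle
    return fixed


# Made-up sample record for illustration only.
sample = {"title": "Ebe", "subtitle": "Five on a Treasure Island"}
print(promote_subtitle(sample, "ebe"))
```

Matching the whole title rather than stripping a prefix keeps the same helper usable for the longer strings such as "Ebe : Famous Five", and records without a subtitle are returned untouched so they can be reviewed by hand.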

I will add more to this list.

Related files

Stakeholders

seabelis avatar Apr 08 '24 10:04 seabelis

  • [x] https://openlibrary.org/works/OL25131836W/Ebl The "Ebl" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25288751W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289002W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25288312W/Ebe_Amelia_Jane "Ebe : Amelia Jane" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289871W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289493W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289779W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289810W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289605W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289991W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25289462W/Ebe?mode=all#editions-list The "Ebe" should first be dropped from all titles.
  • [x] https://openlibrary.org/works/OL25288534W/Ebe_Aesop%27s_Fables_Retold?mode=all#editions-list "Ebe : Aesop's Fables Retold" should be dropped.
  • [x] https://openlibrary.org/works/OL25290020W/Ebe?mode=all#editions-list Drop "Ebe".
  • [x] https://openlibrary.org/works/OL25290051W/Ebe?mode=all#editions-list Drop "ebe".
  • [x] https://openlibrary.org/works/OL25289527W/Ebe_Famous_Five?mode=all#editions-list Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL25289789W/Ebe_Famous_Five?mode=all#editions-list Drop "Ebe : Famous Five"
  • [x] https://openlibrary.org/works/OL25132590W/Ebe?mode=all#editions-list Drop "Ebe".
  • [x] https://openlibrary.org/works/OL25289829W/Ebe_Famous_Five?mode=all#editions-list Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL25289691W/Ebe_Famous_Five?mode=all#editions-list Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL25289935W/Ebe_Famous_Five?mode=all#editions-list Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL25289870W/Ebe_Famous_Five?mode=all#editions-list Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL25288949W/Ebe_Cameo_Plays Drop "Ebe : Cameo Plays"
  • [x] https://openlibrary.org/works/OL25290040W/Ebe_Famous_Five Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL32342783W/Learn_with_Noddy Drop "Learn with Noddy"
  • [x] https://openlibrary.org/works/OL25288754W/Ebe_Bible_Stories Drop "Ebe : Bible Stories"
  • [x] https://openlibrary.org/works/OL25289907W/Ebe_Famous_Five Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL25289828W/Ebe_Mystery_Series Drop "Ebe : Mystery Series"
  • [x] https://openlibrary.org/works/OL25289609W Drop "Ebe : Mystery Series"
  • [x] https://openlibrary.org/works/OL25288479W/Ebe_Mary_Mouse Drop "Ebe : Mary Mouse"
  • [x] https://openlibrary.org/works/OL25288748W/Ebe_Enid_Blyton_Bible_Stories_Book_9 Drop "Ebe : Enid Blyton Bible Stories Book 9"
  • [x] https://openlibrary.org/works/OL25289985W/Ebe_Famous_Five_on_the_Case Drop "ebe"
  • [x] https://openlibrary.org/works/OL25289962W/Ebe_Mystery_Series Drop "ebe".
  • [x] https://openlibrary.org/works/OL25289534W/Ebe_Hanni_and_Nanni_Hanni_and_Nan_EBE_Hanni_and_Nanni Drop "Ebe : Hanni and Nanni : Hanni and Nan EBE : Hanni and Nanni"
  • [x] https://openlibrary.org/works/OL25289615W/Ebe_Famous_Five Drop "Ebe : Famous Five".
  • [x] https://openlibrary.org/works/OL25289257W/Ebe_Enid_Blyton_Bible_Stories_Book_4 Drop "Ebe : Enid Blyton Bible Stories Book 4"

seabelis avatar Apr 08 '24 10:04 seabelis

I think this is going to take more than just removing some prefixes to fix. I'd suggest deleting the records and pausing BWB imports until the quality improves. Some examples of problems, just looking at the first URL above, include:

  • German editions cataloged as English

  • Titles truncated in the middle

  • Titles with undecoded HTML entities

  • Publication dates a decade and a half in the future

These works have also already been extensively cataloged, so these imports aren't adding any useful information. https://openlibrary.org/search?q=title%3A+%22hanni+und+nanni%22&mode=everything
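Two of the problems listed above (undecoded HTML entities and future publication dates) can be caught mechanically. The following is only a sketch under stated assumptions: the function name `quality_problems`, the `title` and `publish_year` field names, and the one-year tolerance are hypothetical and not part of the actual import pipeline. Wrong language codes and truncated titles are not covered.

```python
import html
from datetime import date
from typing import Optional


def quality_problems(record: dict, today: Optional[date] = None) -> list:
    """Return a list of human-readable problems found in an import record.

    Hypothetical check: `record` is assumed to be a dict with a "title"
    string and an integer "publish_year".
    """
    today = today or date.today()
    problems = []

    # If unescaping changes the title, it still contains an HTML entity
    # such as "&amp;". This is a heuristic, not a full validator.
    title = record.get("title") or ""
    if html.unescape(title) != title:
        problems.append("undecoded HTML entity in title")

    # Allow one year of slack for genuinely forthcoming books (assumed threshold).
    year = record.get("publish_year")
    if isinstance(year, int) and year > today.year + 1:
        problems.append("publication year in the future")

    return problems


# Made-up sample record for illustration only.
print(quality_problems({"title": "Hanni &amp; Nanni", "publish_year": 2039}))
```

Records that return a non-empty list could be held back for review instead of being imported.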

tfmorris avatar Apr 09 '24 15:04 tfmorris

I believe I have cleaned up or removed all of the remaining Ebe / Eb / 2039 Enid Blyton records. Most of what I did was removal of no-value (undifferentiated / junk metadata) records that had already been correctly split or salvaged by others.

Thanks @seabelis and @tfmorris (and probably others too), who have had to wade through this previously. I see you have both done lots of work making the main Enid Blyton collection clean, and these issue reports have helped get the last of the dregs. These ISBN-parking future editions, or whatever they were, should never have been imported in the first place. I believe current import rules will prevent this sort of thing from happening again.

hornc avatar Jun 19 '24 00:06 hornc