
global: clean up duplicate table DOIs in production instance

Open GraemeWatt opened this issue 2 months ago • 0 comments

When reindexing the QA instance after deploying PR #766, some of the records raised an exception:

sqlalchemy.exc.MultipleResultsFound: Multiple rows were found when exactly one was required

from the line:

https://github.com/HEPData/hepdata/blob/21ed04434dd4351b76f7542974c2a9ca99f5c645/hepdata/ext/opensearch/document_enhancers.py#L153
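For context: SQLAlchemy's Query.one() requires exactly one row and raises MultipleResultsFound when more than one row matches, whereas Query.first() simply returns the first row (or None). The actual code at that line is not reproduced here. The following is only a self-contained sketch of the difference, using the standard-library sqlite3 module. The table name data_submission, its columns, and the MultipleRowsError class are hypothetical stand-ins, not HEPData's real schema or SQLAlchemy's classes.

```python
import sqlite3


class MultipleRowsError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.MultipleResultsFound."""


def lookup_strict(conn, doi):
    # Behaves like Query.one(): exactly one matching row is required.
    rows = conn.execute(
        "SELECT id FROM data_submission WHERE doi = ?", (doi,)
    ).fetchall()
    if len(rows) > 1:
        raise MultipleRowsError(f"{len(rows)} rows found for {doi}")
    if not rows:
        raise LookupError(f"no row found for {doi}")
    return rows[0][0]


def lookup_tolerant(conn, doi):
    # Behaves like Query.first(): return one row or None.
    # ORDER BY id is an assumption added here to make the choice deterministic.
    row = conn.execute(
        "SELECT id FROM data_submission WHERE doi = ? ORDER BY id LIMIT 1",
        (doi,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical in-memory table containing one duplicated DOI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_submission (id INTEGER PRIMARY KEY, doi TEXT)")
conn.executemany(
    "INSERT INTO data_submission (doi) VALUES (?)",
    [("10.17182/hepdata.78551.v1/t3",), ("10.17182/hepdata.78551.v1/t3",)],
)
```

Tolerating duplicates this way avoids the exception but silently picks one row, which is why the underlying duplicates still need to be investigated.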

I have just changed that line of document_enhancers.py in commit 319ff152009d966f7e60f2e1721e9332805e38df so that it tolerates multiple results. However, we should investigate in more detail why there are multiple DataSubmission objects with the same DOI. I found 6 examples:

  • https://www.hepdata.net/record/78551 (10.17182/hepdata.78551.v1/t3 appears twice)
  • https://www.hepdata.net/record/77606 (10.17182/hepdata.77606.v1/t54 appears twice)
  • https://www.hepdata.net/record/78402 (10.17182/hepdata.78402.v1/t29 appears twice)
  • https://www.hepdata.net/record/80608 (10.17182/hepdata.80608.v1/t14 appears twice)
  • https://www.hepdata.net/record/77761 (10.17182/hepdata.77761.v1/t3 appears twice)
  • https://www.hepdata.net/record/76842 (10.17182/hepdata.76842.v1/t3 appears twice)
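Duplicates like those listed above can be found with a single GROUP BY ... HAVING COUNT(*) > 1 query. The sketch below runs it against an in-memory sqlite3 database; the table name data_submission and its doi column are hypothetical, and the same SQL should also work on PostgreSQL, but the query against the real schema would need to be checked.

```python
import sqlite3


def find_duplicate_dois(conn):
    """Return (doi, count) for every DOI shared by more than one row."""
    return conn.execute(
        """
        SELECT doi, COUNT(*) AS n
        FROM data_submission
        WHERE doi IS NOT NULL
        GROUP BY doi
        HAVING COUNT(*) > 1
        ORDER BY doi
        """
    ).fetchall()


# Hypothetical sample rows: one duplicated DOI and one unique DOI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_submission (id INTEGER PRIMARY KEY, doi TEXT)")
conn.executemany(
    "INSERT INTO data_submission (doi) VALUES (?)",
    [
        ("10.17182/hepdata.78551.v1/t3",),
        ("10.17182/hepdata.77606.v1/t54",),
        ("10.17182/hepdata.78551.v1/t3",),
    ],
)

duplicates = find_duplicate_dois(conn)
```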

These all date from the early days of hepdata.net in 2017/2018, when the submission code was buggy and uploads were not replaced cleanly. We should investigate how to clean up the database to remove the duplicate DOIs.
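One possible cleanup is sketched below, purely as an illustration. It assumes (hypothetically) that the row with the highest id for each duplicated DOI is the one left by the most recent upload and should be kept. In practice each pair would need to be checked by hand, and any related rows (for example data resources linked to the stale DataSubmission) would have to be handled before deleting anything. The table and column names are again hypothetical, and the deletion runs inside a single transaction so that it rolls back on error.

```python
import sqlite3


def remove_duplicate_dois(conn):
    """Delete all but the highest-id row for each duplicated DOI.

    Hypothetical keep-the-newest policy; returns the deleted ids for review.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        stale = [
            row[0]
            for row in conn.execute(
                """
                SELECT id FROM data_submission AS d
                WHERE d.doi IS NOT NULL
                  AND d.id < (SELECT MAX(id) FROM data_submission
                              WHERE doi = d.doi)
                ORDER BY id
                """
            )
        ]
        conn.executemany(
            "DELETE FROM data_submission WHERE id = ?", [(i,) for i in stale]
        )
    return stale


# Hypothetical sample: ids 1 and 3 share a DOI, id 2 is unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_submission (id INTEGER PRIMARY KEY, doi TEXT)")
conn.executemany(
    "INSERT INTO data_submission (doi) VALUES (?)",
    [
        ("10.17182/hepdata.78551.v1/t3",),
        ("10.17182/hepdata.77606.v1/t54",),
        ("10.17182/hepdata.78551.v1/t3",),
    ],
)

deleted = remove_duplicate_dois(conn)
```

Returning the deleted ids makes it easy to log exactly which rows were removed, which matters for a one-off fix on a production database.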

GraemeWatt, May 02 '24 15:05