openlibrary icon indicating copy to clipboard operation
openlibrary copied to clipboard

Add more tests for import validator

Open scottbarnes opened this issue 1 year ago • 2 comments

Closes #9440

Adds more tests to the import validator and remove publishers as a required field for a complete record.

How is this looking for a first attempt?

Technical

Ideally this makes the requirements more explicit. It is important to note this also removes publishers as required field for a complete record, to hopefully hew more closely to #9440.

Testing

Run the tests.

Screenshot

Stakeholders

@hornc

scottbarnes avatar Aug 15 '24 04:08 scottbarnes

@scottbarnes thanks for continuing with this!

It would be nice to have some clear, specific and labelled import cases that match the various kinds of import cases that have prompted this issue. There seem to be enough sublties to how the code should behave, I just want to see a selection of semi-realistic import dictionaries with the required fields in place, and see whether the case is accepted or rejected, without having to follow throguh which fields are copied from which other record.

Testing everything field by field loses the overall sense of what this code is trying to do at a higher level -- accept or reject various kinds of imports that come through the system.

For the last PR I literally downloaded the changes and wrote my own tests to check whether the basic realistic cases we might expect from a Promise item behaved as expected, and still got it wrong because I missed that source_records must be included at this point, that's not part of the source promise item data, but is added by the import process at some point.

Minimal differentiatable record:

1. ISBN case:

title:
isbn:
source_records:

2. LCCN case:

title:
lccn:
source_records

Minimal differentiatable non-id record

title:
authors:
publishers:
publish_date: 1923
source_records:

3.5) we'd perefer a publisher ... I don't the the result would differ if we added one though

Too minimal records (REJECT):

  1. Missing author -- is this what we expect?

title:
publish_date: 1923
publishers:
source_records:

title + author, but no date to identify an edition

title:
author:
publishesr:
source_records:

Other items of interest for sanity checking would be:

title:  
authors:  
publish_date: 01-01-1990
source_records:
title:  
authors:  
publish_date: '????'
source_records:

and possibly variants with some of the bad bookseller data in the wrong fields, perhaps author name 'Unknown' or 'n/a' --

I know we have special code somewhere that checks for '01-01-' dates, I'm not sure where and when that comes into effect, or exactly how it is going to interact with these changes. Other non-numeric date strings should probably be tested to ensure we are not passing date validation from any garbage data.

I don't think it hurts to write out the explicit test cases here, having a range of realistic values in the examples will test more, the current tests are testing "December 2018" as a date many times, and its only one of a range of possible date formats we get, and I'm not even sure it's a common one.

hornc avatar Aug 15 '24 06:08 hornc

@hornc, I added a new commit, c55bb6afd7a182d5975973cf2f34eaad758fb783, that is hopefully responsive to the request to make the tests more explicit not just in terms of how they're named, but how they're laid out.

I also attempted to test more values, and to do some sanity checking. This has caused, for the moment, some overlap with existing code called later in load(), but if this approach looks promising, I can see how to DRY it up (after determining where load() is called directly and seeing whether that can be reasonably changed).

If this looks good, I can continue to add more variations on input data for each field, and also try to add more sanity checking. Any suggestions on things to check for (e.g. author names of Unknown, etc.) are always welcome.

scottbarnes avatar Aug 29 '24 04:08 scottbarnes

@hornc, I just wanted to follow up to see if this latest commit is headed in the direction you envisioned.

scottbarnes avatar Dec 02 '24 21:12 scottbarnes

I went ahead and merged this, just so we can have tests to work off of as we make these requirements less stringent, per #10157.

scottbarnes avatar Dec 30 '24 16:12 scottbarnes