csv-validator
IntegrityCheck error when a folder called 'content' is not the top-level folder and has another file or folder at the same level.
While running over some collections, the validator reported some erroneous IntegrityCheck errors. After manually verifying that there weren’t actually any errors in the downloads, I think I have traced the problem to residual issues with having a folder called ‘content’ somewhere in the file path other than the top-level folder.
This was a problem before and was fixed in https://github.com/digital-preservation/csv-validator/commit/300ef07d20646b8638399b8cd41cac6608383005 . However, there still seem to be problems when a folder called content has a file or another folder at the same level. The substitution path introduced in the fix fails, and this seems to cause all directly related files above and below the content folder to fail the integrityCheck. I’ve attached a small sample set which replicates this: csvvaltest.zip
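To illustrate the general class of bug (not the csv-validator code itself, which is written in Scala and not shown here), the sketch below compares two hypothetical ways of turning an absolute path into a path relative to the collection root. The root `/data/collection/content`, the folder `series1`, and the helper names are all assumptions for illustration. Locating the root by searching for a segment *named* `content` goes wrong as soon as a nested folder has the same name, while stripping a known root prefix component-by-component does not. This does not reproduce the exact failure reported above (which also affects files above the nested folder); it only shows why matching on a folder name is fragile.

```python
from pathlib import PurePosixPath

# Hypothetical collection root; NOT taken from csv-validator.
ROOT = PurePosixPath("/data/collection/content")


def relative_by_name_search(path: str, marker: str = "content") -> PurePosixPath:
    """Fragile: treat everything after the LAST segment named `marker` as relative."""
    parts = PurePosixPath(path).parts
    idx = len(parts) - 1 - parts[::-1].index(marker)
    return PurePosixPath(*parts[idx + 1:])


def relative_by_prefix(path: str, root: PurePosixPath = ROOT) -> PurePosixPath:
    """Robust: strip a known root prefix, compared component by component."""
    return PurePosixPath(path).relative_to(root)


def mismatches(listed, on_disk, to_relative):
    """Hypothetical integrity check: paths listed in the CSV vs. files found on disk.

    Returns the symmetric difference; an empty set means both sides agree.
    """
    listed_rel = {to_relative(p) for p in listed}
    return listed_rel ^ on_disk


# Files as they might appear in the CSV (absolute) and on disk (relative to ROOT).
listed = [
    "/data/collection/content/series1/content/file1.txt",  # nested 'content' folder
    "/data/collection/content/series1/notes.txt",          # sibling of that folder
]
on_disk = {
    PurePosixPath("series1/content/file1.txt"),
    PurePosixPath("series1/notes.txt"),
}

print(mismatches(listed, on_disk, relative_by_name_search))  # non-empty: false errors
print(mismatches(listed, on_disk, relative_by_prefix))       # set()
```

With the name search, `file1.txt` loses its `series1/content/` prefix, so it no longer matches the file on disk and both entries are reported as mismatches even though nothing is actually missing.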
@paulyoung84 The Integrity Check was always a mixed concern :-/ It shouldn't really have been put into the CSV Validator; rather, it should have been a separate tool entirely.
I think the first step would be to define in writing what an "Integrity Check" really means; this might mean working backwards from the code. From there we could figure out whether that is what TNA needs today. Ultimately, though, I would still suggest moving it into a separate tool.