python-dwca-reader

Headers consistency checks

Open niconoe opened this issue 5 years ago • 4 comments

André informed me of some archives (found in the wild) where there's an inconsistency between the CSV headers and the field list from the metafile.

Should we try to detect those and report the inconsistency?

niconoe avatar Sep 19 '19 14:09 niconoe

Yes. :-)

tucotuco avatar Sep 19 '19 14:09 tucotuco

@andrejjh found another one...

@tucotuco, any opinion on how we should handle this? I'm thinking of just throwing an exception at the user's face, but if it's a common practice I might get complaints that python-dwca-reader is too strict. I can also add an option to disable the consistency check.

niconoe avatar Sep 20 '19 13:09 niconoe

Maybe with an option, e.g. -check_headers

andrejjh avatar Sep 20 '19 13:09 andrejjh

What is the expected behaviour? If the metafile references a field that isn't in the data file at the position it says it should be, that to me should be an exception. If the data file has extra fields not mentioned by the metafile, that to me would be fine.

tucotuco avatar Sep 20 '19 15:09 tucotuco
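
For illustration, the rule tucotuco describes (a metafile field missing from its declared position is an error, extra data-file columns are fine) could be sketched roughly as below. This is only a sketch under stated assumptions, not python-dwca-reader's actual API: `check_headers`, `HeaderMismatch` and `short_name` are hypothetical names, the metafile fields are assumed to be already parsed into an index-to-term mapping, and CSV headers are assumed to use the short term name (the last segment of the term URI).

```python
import csv
import io


class HeaderMismatch(Exception):
    """Hypothetical exception for this sketch; not part of python-dwca-reader."""


def short_name(term):
    # Assumption: the CSV header uses the last path segment of the term URI,
    # e.g. "http://rs.tdwg.org/dwc/terms/locality" -> "locality".
    return term.rstrip("/").rsplit("/", 1)[-1]


def check_headers(header_row, metafile_fields):
    """Raise HeaderMismatch if a metafile field is not at its declared column.

    header_row: list of column names read from the first line of the data file.
    metafile_fields: dict mapping column index -> term URI, as declared in the metafile.
    Data-file columns that the metafile does not mention are ignored.
    """
    problems = []
    for index, term in sorted(metafile_fields.items()):
        expected = short_name(term)
        # A declared index beyond the end of the header row also counts as a mismatch.
        found = header_row[index].strip() if index < len(header_row) else None
        if found != expected:
            problems.append((index, expected, found))
    if problems:
        details = "; ".join(
            f"column {idx}: metafile says {exp!r}, data file has {got!r}"
            for idx, exp, got in problems
        )
        raise HeaderMismatch(details)


# Made-up data file content: the "remarks" column is not in the metafile, which is fine.
data = "id,scientificName,locality,remarks\n1,Puma concolor,Bariloche,ok\n"
header = next(csv.reader(io.StringIO(data)))
fields = {
    1: "http://rs.tdwg.org/dwc/terms/scientificName",
    2: "http://rs.tdwg.org/dwc/terms/locality",
}
check_headers(header, fields)  # no exception: both declared fields are in place
```

An opt-out along the lines andrejjh suggests could simply skip the `check_headers` call when the user disables it; how such an option would be named or exposed is left open here.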