Common agreement on loading CF non-compliant NetCDF files
Iris needs a public statement on how it handles NetCDF files that deviate from the CF conventions. This will bring several benefits:
- More certainty when discussing if/how Iris should load a particular file.
- Clearer direction when developing the codebase.
- Clearer user expectations.
Writing this statement will involve making some difficult decisions. A working group is tackling this now: @tkknight, @bjlittle, @lbdreyer, @pp-mo, @trexfeathers, @stephenworsley, @ESadek-MO, @scottrobinson02, @HGWright
Factors at play
- More CF compliance means smoother collaboration between institutions, and Iris can play a part in raising awareness.
- CF evolves over time, so may develop 'opinions' on things that previously didn't matter and invalidate older files.
- The available tooling can make it difficult to address non-compliances in a file.
- UX - being strict/verbose about CF compliance makes the user experience more awkward.
- Iris has a place in the scientific Python community: people choose Iris / Xarray / raw netCDF4 / something else for different purposes, and CF handling plays a part in that.
- Continuing to work in the face of CF non-compliances could need more defensive code.
Items affected
(please edit if you know of others)
- #5119
- #5126
- #5068
- #5067
- #5003
- #4495
- #1801
- #5171
- #4453
- #5257
### Tasks
- [ ] https://github.com/SciTools/iris/issues/5068
- [ ] https://github.com/SciTools/iris/issues/5119
Summary from working group conversations
2023-02-02, 2023-02-14, 2023-03-22
Note that this issue is not intended as a debate, which is why it is not posted as a discussion. The conversations below took place in real time, with a group deliberately sized to aid decision making.
Outcome - our ideal implementation
When loading NetCDF files, Iris will load all CF-compliant elements. A container of non-compliant variables and attributes will be attached to the `Cube`(s).
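As a rough illustration of the outcome described above, here is a minimal standalone sketch. It does not use Iris: `RawVariable`, `SketchCube`, `is_compliant` and `build_cube` are hypothetical names invented for this example, and the compliance rule is a toy stand-in for real CF checking.

```python
from dataclasses import dataclass, field


@dataclass
class RawVariable:
    """Hypothetical stand-in for a variable read from a NetCDF file."""
    name: str
    attributes: dict


@dataclass
class SketchCube:
    """Hypothetical stand-in for a Cube; NOT the real iris.cube.Cube."""
    name: str
    coords: list = field(default_factory=list)
    # Items that could not be interpreted are kept here rather than dropped.
    non_compliant: dict = field(default_factory=dict)


def is_compliant(var: RawVariable) -> bool:
    # Toy rule for illustration only: require a string-valued `units`.
    # Real CF checking is far more involved.
    return isinstance(var.attributes.get("units"), str)


def build_cube(name: str, candidates: list) -> SketchCube:
    """Attach compliant variables as coords; quarantine the rest."""
    cube = SketchCube(name)
    for var in candidates:
        if is_compliant(var):
            cube.coords.append(var.name)
        else:
            cube.non_compliant[var.name] = var.attributes
    return cube


cube = build_cube(
    "air_temperature",
    [
        RawVariable("time", {"units": "hours since 1970-01-01"}),
        RawVariable("height", {"units": 5}),  # invalid: units is not a string
    ],
)
```

Here `time` ends up in `cube.coords`, while `height` is preserved in `cube.non_compliant` so the user can inspect or repair it later.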
Encourage users:
If this causes you problems, please reach out to us to see if we can collaborate on a solution.
Implementation considerations
- How to contain things that can't be represented properly?
- Associate things with `Cube`s or isolated in own list?
- Activate behaviour with a `FUTURE` flag?
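The opt-in flag idea above could look something like the following. This is a standalone sketch that does not import Iris; the `Future` class is written from scratch for illustration, and the flag name `load_non_compliant` and the helper `load_mode` are hypothetical.

```python
import contextlib


class Future:
    """Sketch of an opt-in flag holder; the flag name is hypothetical."""

    def __init__(self, load_non_compliant=False):
        self.load_non_compliant = load_non_compliant

    @contextlib.contextmanager
    def context(self, **flags):
        # Look up current values first, so an unknown flag name fails early.
        previous = {name: getattr(self, name) for name in flags}
        try:
            for name, value in flags.items():
                setattr(self, name, value)
            yield
        finally:
            # Restore the old values even if the body raised.
            for name, value in previous.items():
                setattr(self, name, value)


FUTURE = Future()


def load_mode():
    """Report which loading behaviour is currently active."""
    return "partial" if FUTURE.load_non_compliant else "strict"


before = load_mode()
with FUTURE.context(load_non_compliant=True):
    inside = load_mode()
after = load_mode()
```

Keeping the new behaviour behind a flag like this would let existing workflows continue unchanged until users choose to opt in.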
Working group summary comments
- @trexfeathers: embrace imperfection, skipping non-compliances sounds good if warnings work.
- @stephenworsley: CF compliance is a good aim, but can't always be expected.
- @pp-mo: CF offers optional ways of doing things, Iris ought to do its best, but not insist. Discourage 'bad CF'.
- @bjlittle: KISS. Make users' lives simple, don't be awkward.
- @lbdreyer: we'll always break someone's workflow. Need a plan to help those who are left behind.
- @scottrobinson02: spirit of compromise. Accept that going in.
- @tkknight: KISS. Informative messages when things don't work.
- @HGWright: if we can do something we should do something. Don't throw toys from pram. Make our actions clear.
- @ESadek-MO: no easy solution, communicate well, focus on warnings.
Discussion topics
Encouraging compliance in the community
- We know examples where Iris' strictness has resulted in more compliant - more interoperable - files.
- CF is a convention, not a standard.
- CF is the only available convention and is therefore used by anyone looking for help making files interoperable.
- Iris' scope is wider than CF, and Iris doesn't implement all of CF. Need to avoid inventing our own rules.
- CF's longevity is relevant.
Files changing from acceptable to unacceptable
- While CF is intended to be backwards compatible, checks (within Iris, cf-checker, whatever) are not a complete implementation and may evolve over time, invalidating previously acceptable files.
Ease of massaging files to be compliant
- Always going to be somewhat difficult.
- If Iris can't cope with non-CF, then users forced onto another tool.
  - Could edit the file directly using `ncedit` or NetCDF4, but this can be challenging, and editing a copy may be unrealistic.
  - All the rich tools (Iris, Xarray, cf-python) have their own opinions.
  - ncdata has the potential to make this much easier.
- Should Iris include a non-CF layer, lower than a `Cube`, to help with fixing?
User experience (UX)
- Must not be underestimated.
- Undesirable to flatly refuse to load.
- Need clarity on what Iris expects.
- Need user education.
- Warnings are an opportunity to encourage compliance and help, without 'being awkward'.
- Really important to not ruin UX with even more warnings.
- Classify warnings? Allowing users granularity for what they care about / ignore?
- CF brings some inevitable complexity, some user effort required.
- Compromises are necessary.
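The idea of classifying warnings, raised in the list above, maps naturally onto Python's standard `warnings` module, which filters by category. The sketch below uses only the standard library; the warning class names and `load_sketch` are hypothetical and are not taken from Iris.

```python
import warnings


class NonCompliantWarning(UserWarning):
    """Hypothetical base category for all CF non-compliance warnings."""


class NonCompliantUnitsWarning(NonCompliantWarning):
    """Hypothetical category: a `units` attribute could not be interpreted."""


class NonCompliantBoundsWarning(NonCompliantWarning):
    """Hypothetical category: a bounds variable was skipped."""


def load_sketch():
    # Stand-in for a loader that hits two different problems.
    warnings.warn("Skipping invalid units on 'height'.", NonCompliantUnitsWarning)
    warnings.warn("Skipping bounds for 'time'.", NonCompliantBoundsWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The user opts out of one category and keeps the rest.
    warnings.simplefilter("ignore", category=NonCompliantUnitsWarning)
    load_sketch()

remaining = [w.category.__name__ for w in caught]
```

Because both categories share a base class, a user could also silence everything at once by filtering on `NonCompliantWarning`.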
Iris' place in the world
- Interoperability allows using other, more tolerant tools.
- Learning/adopting other tools is nevertheless not as good as getting everything from one place.
- We should aim to avoid duplication within the geoscience community.
Ease of software development
- Defensive code takes extra effort.
- Iris could be written to work with things it doesn't explicitly understand.
- API changes could make things easier:
  - Interchange between `Cube` and `_DimensionalMetadata`.
  - Easier construction of `Cube`s from scratch.
- Might be easier to include user-level fixing tools in Iris, rather than making Iris cope better.
Preferred approaches
Determined via voting.
- Iris only loads the CF-compliant parts of a file, skipping non-compliant parts (maybe raises a warning?).
- Iris allows the user to configure how it will interpret a malformed file.
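One possible shape for the second approach is a user-supplied fix-up hook that runs before interpretation. This is only a sketch under stated assumptions: `interpret`, `rename_unit_attribute` and the `fixups` parameter are hypothetical, nothing here is an Iris API, and the usability rule is a toy.

```python
def rename_unit_attribute(name, attributes):
    # User-supplied fix: this (imaginary) file spells `units` as `unit`.
    if "unit" in attributes and "units" not in attributes:
        attributes["units"] = attributes.pop("unit")
    return attributes


def interpret(name, attributes, fixups=()):
    """Apply user fix-ups, then report whether the variable is usable."""
    attributes = dict(attributes)  # never mutate the caller's data
    for fix in fixups:
        attributes = fix(name, attributes)
    # Toy rule for illustration only: usable means a string-valued `units`.
    usable = isinstance(attributes.get("units"), str)
    return usable, attributes


raw = {"unit": "K"}
without_fix, _ = interpret("temp", raw)
with_fix, fixed = interpret("temp", raw, fixups=[rename_unit_attribute])
```

Without the hook the variable would be skipped; with it, the same variable becomes usable and the original attributes are left untouched.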
Oooh just discovered this issue via DragonTaming board @trexfeathers.
Sounds like you've got a fair bit of input from working group already; please shout though if useful to have more, as this is a particularly painful area for space weather - and we've got a good amount of requirements (ionosphere and lower) in the iris-o-sphere of traditional geographic lat/lon coords!
More context on why CF non-compliance is an issue for space weather
Highly interested: space weather is not represented in CF conventions, so data wrangling is a key issue for us.
There's a few times where I've consciously decided not to go with iris due to anticipating "ugh, lots of pain handling I/O at boundaries due to data being inherently non-CF-compliant"
In retrospect, often this decision was bad:
- I've ended up writing (and then having to support!) custom code - e.g. pseudo-geo-aware dataclasses & methods for ionospheric data - which ends up being a poorer version of iris.
- I'd have been better served going for the real deal, and biting the boundaries-pain bullet.
Self-interestedly v happy to give more input if useful - help you help me!