Fixed strange behaviour when creating variables with conflicting coordinates
Closes #450
Previous logic
For as_data_array when coords is provided:
- If array is constant, broadcast to coords
- If array is pandas or xarray, ignore coords and use coords of array
- If array matches coords, then keep coords
For a more concrete example, if you provide 2-dimensional coords, this is the result for different input data:
- 0 dimensions -> 2 dimensions
- 1 dimension -> 1 dimension
- 2 dimensions -> 2 dimensions
The behavior of 1 dimension -> 1 dimension is clearly the odd one out and is not very intuitive.
This behavior was noticed by people using m.add_variables, which calls as_data_array under the hood.
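To make the mismatch concrete, here is a small sketch using plain xarray and pandas. It does not call as_data_array; it only mimics the two cases described above, and the coords and values are made up for illustration.

```python
import pandas as pd
import xarray as xr

# Illustrative 2-dimensional coords (assumed names and labels).
coords = {"x": [0, 1], "y": ["a", "b", "c"]}

# 0 dimensions: a scalar is filled out to the full set of coords.
from_scalar = xr.DataArray(5.0, coords=coords, dims=["x", "y"])

# 1 dimension: a labelled pandas Series brings its own index, so when the
# requested coords are ignored it stays 1-dimensional.
series = pd.Series([1.0, 2.0], index=pd.Index([0, 1], name="x"))
from_series = xr.DataArray(series)

print(from_scalar.dims)  # dimensions of the scalar case
print(from_series.dims)  # dimensions of the Series case
```

The scalar ends up with both dimensions while the Series keeps only its own, which is the "1 dimension -> 1 dimension" case listed above.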
Changes proposed in this Pull Request
Add the force_broadcast option to as_data_array. When true, it will always try to broadcast to the dimensions implied by coords.
For the example above this means
- 0 dimensions -> 2 dimensions
- 1 dimension -> 2 dimensions
- 2 dimensions -> 2 dimensions
Use this new option inside m.add_variables so that variable creation is more intuitive.
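The intended effect of forcing the broadcast can be sketched with plain xarray. The helper name broadcast_to_coords is hypothetical and is not the implementation in this PR; it only illustrates the idea of expanding any input to the dimensions implied by coords.

```python
import numpy as np
import pandas as pd
import xarray as xr


def broadcast_to_coords(arr, coords):
    """Hypothetical helper: expand `arr` to every dimension named in `coords`."""
    # Template array that carries the full set of target dimensions.
    template = xr.DataArray(
        np.zeros([len(v) for v in coords.values()]),
        coords=coords,
        dims=list(coords),
    )
    data = arr if isinstance(arr, xr.DataArray) else xr.DataArray(arr)
    # broadcast_like adds the missing dimensions; transpose puts the
    # dimensions from coords first, in the order given.
    return data.broadcast_like(template).transpose(*coords, ...)


# Illustrative coords and inputs (assumed names and values).
coords = {"x": [0, 1], "y": ["a", "b", "c"]}
series = pd.Series([1.0, 2.0], index=pd.Index([0, 1], name="x"))
full = xr.DataArray(np.ones((2, 3)), coords=coords, dims=["x", "y"])

zero_d = broadcast_to_coords(5.0, coords)
one_d = broadcast_to_coords(series, coords)
two_d = broadcast_to_coords(full, coords)
```

With this sketch all three inputs come out with the dimensions x and y, matching the table above. The values of the 1-dimensional input are repeated along y.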
Note that this is a breaking change for anyone who was relying on the previous behavior when creating variables (i.e. they were depending on the coords argument being ignored). This seems like quite an edge case, so perhaps it's OK to change directly.
Checklist
- [X] Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
- [X] Unit tests for new features were added (if applicable).
- [X] A note for the release notes doc/release_notes.rst of the upcoming release is included.
- [X] I consent to the release of this PR's code under the MIT license.
Thanks @RobbieKiwi! I have to look into this a bit more, and we need to be careful here as it could lead to unexpected behavior. And yes, I am not happy with the API in terms of coords alignment; it is just too vague, and we need a strict convention here. We also need to think about the different operations:
- var + var
- coeff * var
- const + coeff * var
Ideally they all follow the same convention, but currently they do not. For example, there is the logic that the coords of a secondary term in operations like c1 * v1 + c2 * v2 are ignored if both c1 * v1 and c2 * v2 have the same shape. On top of that, there is the handling of indexed and non-indexed constants / coeffs.
Hey, thanks for having a look. Let me know if you have any ideas on how to progress with this.