Improve `repr`s for key Parcels classes
So far, we haven't really worked on the output of print() on objects like Field, FieldSet etc. While there is a ParticleSet.__repr__, even that is pretty clunky and doesn't take into account notebook functionality.
It would be nice to have better print() implementation for at least the following classes
- [ ]
Field - [ ]
FieldSet - [ ]
ParticleSet - [ ]
Particle - [ ]
ParticleFile - [ ]
Kernel(?)
Hi, I’d like to contribute to this issue. Could you please specify how you would like the print() implementation to be improved for the Field, FieldSet, and ParticleSet classes?
Thanks for offering to contribute to this Issue, @rehamansoor! I don't have a very clear idea what to print in mind yet, but was inspired by the html-version __repr__ in xarray (see e.g. https://github.com/pydata/xarray/issues/1627). So a table-like/collapsable view that shows the attributes of these objects that are relevant to users
Since a FieldSet is a collection of Fields, I suggest we start with a Field.__repr__? This should include information on the name, grid, dimension sizes, interpolation_method etc.
@VeckoTheGecko, since you are working on redefining the API, perhaps you also have ideas what attributes to show in a notebook?
Hi @rehamansoor, great to hear that you’re interested in this issue.
Before the wall of text that is about to follow, please let me know if you have any questions about anything below, or if you'd like any clarification. I'm not sure what your Python level is so - if you feel like this is a bit much for where you're at - let me know (there are issues in the codebase that are better suited to new Python developers 😁). I imagine these won't all be fixed in one PR, or by the same person.
Useful reference
TLDR: I think we shouldn't do HTML outputs for now. Let's focus on text reprs
HTML repr outputs would be amazing in theory, but after doing some diving into the implementation in xarray (see entrypoint ds._repr_html_() for the curious) their implementation is complex requiring custom CSS (vendored in the package itself in a static folder), HTML generation (including escaping characters to produce valid HTML), and they even use custom decorators to drive the reprs.
import xarray as xr
import numpy as np
import pandas as pd
ds = xr.Dataset(
data_vars=dict(
temperature=(["loc", "time"], np.random.randn(2, 4)),
precipitation=(["loc", "time"], 10 * np.random.rand(2, 4)),
),
coords=dict(
lat=("loc", [42.25, 42.21]),
time=pd.date_range("2014-09-06", periods=4),
reference_time=pd.Timestamp("2014-09-05"),
),
attrs=dict(description="Weather related data."),
)
print(ds._repr_html_())
I think there is quite some room for improvement in our text reprs (we'll need them anyway for scripts) and down the line see if there is a need for HTML ones as well (/investigate how we can better customise iPython notebooks output). At the moment the apparent complexity/maintenance burden makes me hesitate. Note customising iPython notebook reprs is easily done with the _repr_html_() method.
Starting with the smallest objects to the largest, this is the information (self.values) that would be good to have in the repr for each:
-
[ ]
Field-
self.information to include-
grid -
mesh -
allow_time_extrapolation -
time_periodic -
gridindexingtype -
to_write
-
- notes: pretty formatting with indentation
-
-
[ ]
VectorField-
self.information to include-
name -
U -
V -
W
-
- notes: pretty formatting with indentation
-
-
[ ]
FieldSet- information to include
- Fields (call
.get_fields()). To be formatted as a list going down
- Fields (call
- information to include
-
[ ]
ParticleFile-
self.information to include- name
- outputdt
- chunks
- create_new_zarrfile
- (intentionally left out
particleset)
- notes: have a simple oneline repr (e.g.,
ParticleFile(name='my_name', outputdt='...)
-
-
[ ]
ParticleSet-
self.information to include- fieldset
- pclass
- repeatdt
- Other information to include:
- number of particles: do
len(self) - particle information (only first x and last y particles):
- Currently the
__repr__print out all particles, which isn’t really useful in the case of thousands of particles (which occurs frequently). Current implementation isreturn "\n".join([str(p) for p in self])which can be adapted accordingly. Choose sensible defaults forxandy
- Currently the
- number of particles: do
- notes: pretty formatting with indentation
-
-
[ ]
Kernel- Hmmmm, not sure for the timebeing. No changes needed
-
[ ]
ParticleandVariable- @erikvansebille I saw some reprs of the format
f"PType<{self.name}>::{self.variables}". Does Parcels dynamically evaluate these in the C-code? Are these worth avoiding changes to?
- @erikvansebille I saw some reprs of the format
I would recommend working on the simple reprs first. Also at this stage we are not too concerned about user customising the repr output (like the set_options() class in xarray options.py allowing to set max_width), but having a sensible output of 80 should be good. Please rely on formatting helper functions to reduce code duplication, you can put them in tools/_formatting.py.
This isn't the be-all and end-all for the reprs. We'll be reworking Parcels internals over the coming months cleaning things up. But having a starting point to go from would be super helpfull.
If there are numpy arrays in the repr, please use the context manager with np.printoptions(threshold=5, suppress=True, linewidth=120): or some other sensible defaults (see docs) so that the output is managable.
Thank you for the detailed breakdown, and I apologize for the delay in replying—I got a bit busy with school. I really appreciate the opportunity to contribute. While I have experience in Python through my college courses, I think these tasks might be a bit complex for me at the moment. Could you suggest an issue that might be a better fit for my current skill level? I’m eager to help and continue improving my abilities.
Thanks again, and I look forward to contributing!
Hi @rehamansoor . All good! We do have a few issues that would be better suited:
In #1620 with have the subtask Remove erroneous calls to os.path in cleanup_remove_files() and clean up function. This is a purely Pythonic task and, once you understand that section of code, should be quite straightforward. We also have #1701 which is just documenting adding metadata to the particlefile output (for example an entry "contact_email": "[email protected]") . There is also #1511 which is more specific to Parcels itself
- @erikvansebille I saw some reprs of the format
f"PType<{self.name}>::{self.variables}". Does Parcels dynamically evaluate these in the C-code? Are these worth avoiding changes to?
bump