tidy3d
tidy3d copied to clipboard
Add metadata to Tidy3D components
Addresses #1352
How do you all feel about this?
- Used the name
.attrs
, because it follows the same convention asxarray
,pandas
, maybe others. -
attrs
are dictionaries (mutable) so they can be set on the fly,sim.attrs['foo'] = bar
. - we still can't set the attrs directly since tidy3d objects dont support item assignment, eg.
sim.attrs = {}
doesn't work. - in
xarray
, attrs remain when the objects are copied and updated. but are removed when the objects are modified. We only need to handle the copy part since we can't modify tidy3d objects. therefore ifsim.copy()
, then the new object's attrs are the same. - Important point, since
attrs
are mutable, we want to ensure our internal code doesn't depend on attrs in any way. Otherwise we introduce state. So as a rule, we don't useattrs
for any calculation? That being said, maybe we can set attrs internally if it makes sense to, but never code likeif obj.attrs["something"] == something: ...
All of the logic is spelled out in the test.
Also please check the description and see if it is clear.
Thanks
Just one question: it is possible to get an error if the user adds non-serializable data to the dictionary, because it will be included in the json representation of the model, right?
Yea it should be, good point. I'll add a test and also see what I can do about catching it.
- Important point, since
attrs
are mutable, we want to ensure our internal code doesn't depend on attrs in any way. Otherwise we introduce state. So as a rule, we don't useattrs
for any calculation? That being said, maybe we can set attrs internally if it makes sense to, but never code likeif obj.attrs["something"] == something: ...
This point sounds like going against the idea of using these for the jax_info
. Or would that be considered internal usage? Basically when submitting a JaxSimulation to the server, we split it into a Simulation and JaxInfo, and add the latter as attrs to the former.
One more question to bring up (@xin-flex) if two simulations only differ in their attrs
should they be considered the same or different in terms of caching?
This point sounds like going against the idea of using these for the jax_info. Or would that be considered internal usage? Basically when submitting a JaxSimulation to the server, we split it into a Simulation and JaxInfo, and add the latter as attrs to the former.
I think we could consider setting jax_info
as a pd.Field
in the JaxSimulation
instead of using attrs
maybe?
Hm, but then how do we upload this information? maybe that doesn't work after all.
@momchil-flex could you take a look into this PR and let me know if you need any other changes? I think I'll be doing the jax refactor stuff we discussed without attrs, although I'm still working that out.
The way this is currently, identical simulations with different attrs will not go through the caching mechanism, right? I think this may be fine, but I do remember maybe at some point discussing having the opposite behavior? It's probably harder to implement though. Any thoughts?
I'm not excactly sure how the caching functions. Do you know what it compares based of? In my mind, it probably makes most sense to ignore attrs when deciding whether to load a cached dataset but I don't know
The way this is currently, identical simulations with different attrs will not go through the caching mechanism, right? I think this may be fine, but I do remember maybe at some point discussing having the opposite behavior? It's probably harder to implement though. Any thoughts?
I'm not excactly sure how the caching functions. Do you know what it compares based of? In my mind, it probably makes most sense to ignore attrs when deciding whether to load a cached dataset but I don't know
I believe it compares the hdf5 file byte by byte, so it will take some engineering to modify this to exclude attrs.
But, in some use cases, that's probably safer. For example, the SimulationData
will contain the Simulation
that contains the attrs
. If caching kicks in, the returned SimulationData
will have different attrs
than the Simulation
that was submitted. To avoid this it would take even more engineering...
k, I feel comfortable taking the most conservative route and just caching based on attrs for now.. or until someone complains :)