tidy3d icon indicating copy to clipboard operation
tidy3d copied to clipboard

Add metadata to Tidy3D components

Open tylerflex opened this issue 1 year ago • 4 comments

Addresses #1352

How do you all feel about this?

  1. Used the name .attrs, because it follows the same convention as xarray, pandas, maybe others.
  2. attrs are dictionaries (mutable) so they can be set on the fly, sim.attrs['foo'] = bar.
  3. we still can't set the attrs directly since tidy3d objects dont support item assignment, eg.sim.attrs = {} doesn't work.
  4. in xarray, attrs remain when the objects are copied and updated. but are removed when the objects are modified. We only need to handle the copy part since we can't modify tidy3d objects. therefore if sim.copy(), then the new object's attrs are the same.
  5. Important point, since attrs are mutable, we want to ensure our internal code doesn't depend on attrs in any way. Otherwise we introduce state. So as a rule, we don't use attrs for any calculation? That being said, maybe we can set attrs internally if it makes sense to, but never code like if obj.attrs["something"] == something: ...

All of the logic is spelled out in the test.

Also please check the description and see if it is clear.

Thanks

tylerflex avatar Feb 23 '24 14:02 tylerflex

Just one question: it is possible to get an error if the user adds non-serializable data to the dictionary, because it will be included in the json representation of the model, right?

Yea it should be, good point. I'll add a test and also see what I can do about catching it.

tylerflex avatar Feb 23 '24 16:02 tylerflex

  1. Important point, since attrs are mutable, we want to ensure our internal code doesn't depend on attrs in any way. Otherwise we introduce state. So as a rule, we don't use attrs for any calculation? That being said, maybe we can set attrs internally if it makes sense to, but never code like if obj.attrs["something"] == something: ...

This point sounds like going against the idea of using these for the jax_info. Or would that be considered internal usage? Basically when submitting a JaxSimulation to the server, we split it into a Simulation and JaxInfo, and add the latter as attrs to the former.

One more question to bring up (@xin-flex) if two simulations only differ in their attrs should they be considered the same or different in terms of caching?

momchil-flex avatar Feb 27 '24 22:02 momchil-flex

This point sounds like going against the idea of using these for the jax_info. Or would that be considered internal usage? Basically when submitting a JaxSimulation to the server, we split it into a Simulation and JaxInfo, and add the latter as attrs to the former.

I think we could consider setting jax_info as a pd.Field in the JaxSimulation instead of using attrs maybe?

tylerflex avatar Feb 27 '24 22:02 tylerflex

Hm, but then how do we upload this information? maybe that doesn't work after all.

tylerflex avatar Feb 27 '24 22:02 tylerflex

@momchil-flex could you take a look into this PR and let me know if you need any other changes? I think I'll be doing the jax refactor stuff we discussed without attrs, although I'm still working that out.

tylerflex avatar Apr 10 '24 02:04 tylerflex

The way this is currently, identical simulations with different attrs will not go through the caching mechanism, right? I think this may be fine, but I do remember maybe at some point discussing having the opposite behavior? It's probably harder to implement though. Any thoughts?

I'm not excactly sure how the caching functions. Do you know what it compares based of? In my mind, it probably makes most sense to ignore attrs when deciding whether to load a cached dataset but I don't know

tylerflex avatar May 01 '24 17:05 tylerflex

The way this is currently, identical simulations with different attrs will not go through the caching mechanism, right? I think this may be fine, but I do remember maybe at some point discussing having the opposite behavior? It's probably harder to implement though. Any thoughts?

I'm not excactly sure how the caching functions. Do you know what it compares based of? In my mind, it probably makes most sense to ignore attrs when deciding whether to load a cached dataset but I don't know

I believe it compares the hdf5 file byte by byte, so it will take some engineering to modify this to exclude attrs.

But, in some use cases, that's probably safer. For example, the SimulationData will contain the Simulation that contains the attrs. If caching kicks in, the returned SimulationData will have different attrs than the Simulation that was submitted. To avoid this it would take even more engineering...

momchil-flex avatar May 01 '24 17:05 momchil-flex

k, I feel comfortable taking the most conservative route and just caching based on attrs for now.. or until someone complains :)

tylerflex avatar May 01 '24 17:05 tylerflex