
Validate that custom datasets can interpolate

marc-flex opened this pull request

This PR addresses issue #1668.

marc-flex commented May 07 '24 09:05

All I've done so far is introduce a validator that checks that the data can be interpolated along each of its dimensions.
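
For context, a minimal sketch of what such a check might look like, assuming the data lives in an `xarray.DataArray` (the helper name and the probe-interpolation strategy here are illustrative, not necessarily what the PR implements):

```python
import numpy as np
import xarray as xr

def check_interpolatable(arr: xr.DataArray) -> None:
    """Attempt a tiny interpolation along each dimension; re-raise with context."""
    for dim in arr.dims:
        if dim not in arr.coords:
            continue  # dimension without an attached coordinate
        coords = np.asarray(arr.coords[dim].values)
        # Skip dimensions that can't meaningfully be interpolated,
        # e.g. single-point or non-numeric coords (such as direction labels).
        if coords.size < 2 or not np.issubdtype(coords.dtype, np.number):
            continue
        probe = 0.5 * (coords[0] + coords[1])  # a value from within the coordinates
        try:
            arr.interp({dim: probe})
        except Exception as err:
            raise ValueError(f"Data cannot be interpolated along '{dim}'.") from err
```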

marc-flex commented May 07 '24 09:05

Also, could you please add a few tests to `tests/test_data/test_data_array.py`? Basically, try to replicate a few scenarios and edge cases for this behavior and ensure that the proper thing happens. Thanks!
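
For illustration only, a duplicate-coordinate edge case might look roughly like this; the test name is made up, and whether the error fires at array construction or at `CustomMedium` construction is exactly what gets settled later in this thread:

```python
import numpy as np
import pytest
import tidy3d as td

def test_duplicate_coords_rejected():
    """Duplicate coordinate values along a dimension should raise a validation error."""
    coords = {"x": [0.0, 0.0], "y": [1.0, 2.0], "z": [0.0, 1.0]}
    permittivity = td.SpatialDataArray(np.full((2, 2, 2), 2.0), coords=coords)
    with pytest.raises(Exception):
        td.CustomMedium(permittivity=permittivity)
```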

tylerflex commented May 07 '24 12:05

> Thanks @marc-flex I think this is looking pretty good; just a few minor tweaks, and we should also see whether we want to apply this validator to all DataArray objects (as written currently) or rather as a Simulation post-init validator that loops over sources and does the interp check. I'm curious to see if the front-end tests pass with these changes or if there are instances where we explicitly allow duplicate coords in some data array objects.

You're right. This doesn't pass the front-end tests. I'll give it another go, incorporating all your comments and restricting the check to custom fields and sources.

marc-flex commented May 07 '24 15:05

It might just be failing because of direction coords, but the fix I suggested about isel-ing the first coordinate might fix them, so perhaps try that first.

tylerflex commented May 07 '24 16:05

Or it could just be that some of the front-end test data arrays have extra coordinates (by accident).

tylerflex commented May 07 '24 16:05

> Thanks @marc-flex I think this is looking pretty good; just a few minor tweaks, and we should also see whether we want to apply this validator to all DataArray objects (as written currently) or rather as a Simulation post-init validator that loops over sources and does the interp check. I'm curious to see if the front-end tests pass with these changes or if there are instances where we explicitly allow duplicate coords in some data array objects.

> You're right. This doesn't pass the front-end tests. I'll give it another go, incorporating all your comments and restricting the check to custom fields and sources.

I guess also `CustomMedium` types with `interp_method == "linear"` (as opposed to the default `"nearest"`). Actually, I don't know if `"nearest"` also fails in some cases, i.e. maybe the data array `sel` method also fails if there are repeated coordinates?
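
The question boils down to something like this small repro (hypothetical; exact behavior depends on the xarray/pandas versions in use):

```python
import xarray as xr

# A data array with a repeated coordinate value along "x".
da = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0.0, 0.0, 1.0]})

da.interp(x=0.5)                 # linear interpolation chokes on duplicate coords
da.sel(x=0.5, method="nearest")  # open question: does nearest selection fail too?
```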

momchil-flex commented May 07 '24 16:05

@tylerflex I have tried several things. With the solution you mentioned (`val.interp_like(val.isel({val.dim: 0}))`), tests kept failing. So I have modified it to use an array of values from within the coordinates for each dimension (not sure that's generic enough).

Also, since this made checks fail, I'm only checking custom sources and mediums. Basically, I call a check function (defined at the `DataArray` level) within the validators.
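
Schematically, that structure might look like the following; `_interp_validator` and the delegation pattern are a sketch of what's described here, not the exact tidy3d internals:

```python
import numpy as np
import xarray as xr

class DataArray(xr.DataArray):
    """Stand-in for tidy3d's DataArray subclass, with a reusable check."""

    __slots__ = ()  # xarray subclasses must declare empty __slots__

    def _interp_validator(self, field_name: str = "data") -> None:
        """Raise if any dimension's coordinates would break interpolation."""
        for dim in self.dims:
            if dim not in self.coords:
                continue  # dimension without an attached coordinate
            coords = self.coords[dim].values
            if len(np.unique(coords)) != len(coords):
                raise ValueError(
                    f"'{field_name}' has duplicate coordinates along '{dim}' "
                    "and cannot be interpolated."
                )

# A pydantic validator on CustomMedium / CustomFieldSource then just calls
# arr._interp_validator(field_name=...) for each custom-defined array, so the
# error message names the exact property that failed.
```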

I have created tests for custom medium and custom field source.

marc-flex commented May 08 '24 11:05

Thanks @marc-flex just FYI, PR #1681 changed some of the tests in `pre/2.7`, so it looks like you'll need to rebase against that branch and fix any conflicts that might come up.

I'm not sure which tests are failing, but we can look into it more. One that probably fails is related to `log_capture`; see my comment above.

tylerflex commented May 08 '24 12:05

> Thanks @marc-flex just FYI, PR #1681 changed some of the tests in `pre/2.7`, so it looks like you'll need to rebase against that branch and fix any conflicts that might come up.

I have just rebased against `pre/2.7` and force-pushed.

marc-flex commented May 08 '24 13:05

@marc-flex @momchil-flex is this something we want to get into 2.7.0rc2? And is it ready, or does it still need changes?

tylerflex commented May 13 '24 15:05

> Just one comment: custom datasets are validated in `CustomMedium`. Shall we validate all other types of custom materials as well, e.g. `CustomPoleResidue`, `CustomLorentz`, etc.?

You're opening Pandora's box here. Maybe we should do this for all custom-defined `DataArray` fields? Or should we limit this PR to the fields defined in issue #1668? What do you think @momchil-flex @tylerflex? I'm guessing this wouldn't just be the validator; it'd also mean implementing a corresponding test.

marc-flex commented May 15 '24 09:05

> Just one comment: custom datasets are validated in `CustomMedium`. Shall we validate all other types of custom materials as well, e.g. `CustomPoleResidue`, `CustomLorentz`, etc.?

> You're opening Pandora's box here. Maybe we should do this for all custom-defined `DataArray` fields? Or should we limit this PR to the fields defined in issue #1668? What do you think @momchil-flex @tylerflex? I'm guessing this wouldn't just be the validator; it'd also mean implementing a corresponding test.

To me it kinda makes sense to test all custom-defined data array fields, and I can't think of currently existing objects where we wouldn't want that. However, a) I don't know if I'm forgetting something already and b) I'm not sure if in the future something will come up. But maybe let's leave b) for future devs to worry about if validating all custom data arrays passes all tests (so hopefully a) is ok)?

momchil-flex commented May 20 '24 09:05

> Just one comment: custom datasets are validated in `CustomMedium`. Shall we validate all other types of custom materials as well, e.g. `CustomPoleResidue`, `CustomLorentz`, etc.?

> You're opening Pandora's box here. Maybe we should do this for all custom-defined `DataArray` fields? Or should we limit this PR to the fields defined in issue #1668? What do you think @momchil-flex @tylerflex? I'm guessing this wouldn't just be the validator; it'd also mean implementing a corresponding test.

> To me it kinda makes sense to test all custom-defined data array fields, and I can't think of currently existing objects where we wouldn't want that. However, a) I don't know if I'm forgetting something already and b) I'm not sure if in the future something will come up. But maybe let's leave b) for future devs to worry about if validating all custom data arrays passes all tests (so hopefully a) is ok)?

I can remove the specific tests for custom medium/source and have the validator at the `DataArray` level. The only issue is that it wouldn't then say which property made the validator fail, which might be frustrating for users. So I'd suggest keeping the scope of this PR as is, and adding another issue where we collect any other fields we'd like to validate/test?

marc-flex commented May 22 '24 07:05

I agree with @marc-flex. Maybe it's best to just limit the scope of these validators to the components that are causing issues as per #1684, because I additionally worry that adding too-strict validation to DataArray may lead to some unintended consequences.

tylerflex commented May 22 '24 13:05

Ok yeah sounds good to me.

momchil-flex commented May 22 '24 13:05

@marc-flex I think this is basically ready to go. The last thing you'll need to do is update the `docs/notebooks` submodule to the latest version (`pre/2.7` branch).

After you make that commit, we'll want to rebase this against tidy3d `pre/2.7` and squash all the commits into one.
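
For reference, a typical sequence for this (assuming the main tidy3d repo is the `upstream` remote, the fork is `origin`, and the branch name is a placeholder):

```bash
git fetch upstream
git rebase -i upstream/pre/2.7    # mark all but the first commit as "squash"
git push --force-with-lease origin validate-custom-datasets
```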

tylerflex commented May 24 '24 15:05

@tylerflex @momchil-flex I have rebased, and I thought everything would be OK from my side, but it fails the "latest" test. Do let me know how to fix this.

marc-flex commented May 27 '24 08:05

Actually, the latest test is fine now; it's the formatting test that fails. :) You need to run `bash scripts/test_local.sh` (you can abort after the formatting step), or:

```bash
black tidy3d/
black tests/
```

momchil-flex commented May 27 '24 12:05

(Make sure your `black` is the same version as defined in `pyproject.toml`, or it will reformat most of the files in tidy3d.)
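
Running it through poetry sidesteps the version mismatch entirely, assuming the environment was set up with `poetry install`:

```bash
poetry run black --version    # should match the pin in pyproject.toml
poetry run black tidy3d/ tests/
```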

tylerflex commented May 27 '24 13:05

Not sure why poetry didn't complain about that when I committed the squash + rebase. I think it's all good now.

marc-flex commented May 27 '24 14:05