echopype
Drop Ping Time Duplicates
Addresses the duplicate ping time part of #1374
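The fix boils down to xarray's drop_duplicates. A minimal sketch on toy data (the dataset and variable names here are hypothetical; the actual change lives in echopype's parsing code):

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset with a duplicated ping_time entry, standing in
# for the duplication seen in the raw-converted data.
times = np.array(
    ["2021-07-01T00:00:00", "2021-07-01T00:00:00", "2021-07-01T00:00:01"],
    dtype="datetime64[ns]",
)
ds = xr.Dataset(
    {"backscatter": ("ping_time", [1.0, 1.0, 2.0])},
    coords={"ping_time": times},
)

# Keep the first occurrence of each ping_time; later duplicates are dropped.
deduped = ds.drop_duplicates(dim="ping_time")
print(deduped.sizes["ping_time"])  # 2
```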
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 80.66%. Comparing base (9f56124) to head (a466b26). Report is 164 commits behind head on main.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #1382      +/-   ##
==========================================
- Coverage   83.52%   80.66%    -2.86%
==========================================
  Files          64       19      -45
  Lines        5686     3108    -2578
==========================================
- Hits         4749     2507    -2242
+ Misses        937      601     -336
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 80.66% <100.00%> (-2.86%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
@leewujung This should be ready for review
@ctuguinay: I just got back to this and saw my previous comments, which still stand. We can merge this once those are addressed. Thanks!
@leewujung So I did a quick check of the first two duplicated ping time pairs to see whether .drop_duplicates(dim='ping_time') was dropping only ping times that contained duplicate data, and it turns out that was the case:
And here's a more generalized check that shows that subsequent ping time pairs are all equal:
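A generalized check along those lines could be sketched as follows (a toy dataset standing in for the raw file): keeping the first versus the last duplicate yields identical datasets exactly when every duplicated ping_time carries the same data, i.e. when dropping duplicates is lossless.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the converted dataset: each ping_time appears twice
# with identical data, mimicking the duplication seen in the raw file.
times = np.array([0, 0, 1, 1, 2, 2], dtype="datetime64[ns]")
ds = xr.Dataset(
    {"backscatter": ("ping_time", [5.0, 5.0, 7.0, 7.0, 9.0, 9.0])},
    coords={"ping_time": times},
)

# If keeping the first vs the last duplicate gives identical datasets,
# the duplicated pings carry the same data.
first = ds.drop_duplicates(dim="ping_time", keep="first")
last = ds.drop_duplicates(dim="ping_time", keep="last")
print(first.identical(last))  # True
```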
Thanks for looking into this. I agree this would address the problem, but one thing I think we should check is if the entries with duplicated ping_time actually contain the same data. If not, the duplication of ping_time does not make sense, but dropping it would also mean that we lose data?
And I was thinking about this again: I think we would either need to drop duplicate ping times even if they contained unique data, or we would need to apply some tiny time offset to them so that they no longer share the same ping time, since xr.concat in set_sonar() will fail if there are any duplicates in the datasets we are concatenating.
I don't have a .raw file where I can test this tiny time offset idea, since I have not seen a dataset where the ping time duplicates contain unique data.
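For reference, the tiny-offset alternative could be sketched like this on hypothetical toy data: each later duplicate is nudged by a cumulative 1 ns so the ping_time index becomes strictly unique without discarding any pings.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset with a triplicated ping_time.
times = np.array([0, 0, 0, 5], dtype="datetime64[ns]")
ds = xr.Dataset(
    {"backscatter": ("ping_time", [1.0, 2.0, 3.0, 4.0])},
    coords={"ping_time": times},
)

idx = ds["ping_time"].to_index()
# Within each group of equal timestamps, count duplicates (0, 1, 2, ...)
# and use that count as a nanosecond offset.
offsets = idx.to_series().groupby(level=0).cumcount().to_numpy()
ds = ds.assign_coords(ping_time=idx + offsets.astype("timedelta64[ns]"))
print(ds["ping_time"].to_index().is_unique)  # True
```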
I agree. In the grand scale of things, I kinda feel that dropping one ping is not a big deal, but perhaps that goes against the idea of 0% loss (fully lossless) raw-converted data.
Perhaps what we can do is to drop duplicated ping_time regardless, but emit a warning to the user if a dropped ping contains different data compared to the retained ping?
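That warn-and-drop behavior could be sketched roughly as below; the helper name drop_duplicate_ping_times is hypothetical, not echopype's actual function, and the comparison uses the keep='first' vs keep='last' trick to detect non-identical duplicates.

```python
import warnings
import numpy as np
import xarray as xr

def drop_duplicate_ping_times(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: drop duplicated ping_time entries, warning
    if the dropped pings are not identical to the retained ones."""
    kept_first = ds.drop_duplicates(dim="ping_time", keep="first")
    kept_last = ds.drop_duplicates(dim="ping_time", keep="last")
    if not kept_first.identical(kept_last):
        warnings.warn(
            "Duplicate ping_time entries contained non-identical data; "
            "later duplicates were dropped."
        )
    return kept_first

# Toy example with a conflicting duplicate: a warning is emitted.
ds = xr.Dataset(
    {"backscatter": ("ping_time", [1.0, 9.9, 2.0])},
    coords={"ping_time": np.array([0, 0, 1], dtype="datetime64[ns]")},
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    deduped = drop_duplicate_ping_times(ds)
print(len(caught), deduped.sizes["ping_time"])  # 1 2
```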
I'll shoot an email to Simrad rep to see if he has thoughts about why the pings are duplicated in the first place, and what he thinks about your idea to add a tiny time offset. Will cc you!
I took a look at the file you used above, and found out that in fact ALL PINGS are duplicated (!), from the dictionary fields low_date/high_date in ParseEK80:
The data appear to be completely duplicated as well.
Maybe something was happening here with the instrument?
@ctuguinay How many files in the set you were working on encountered this problem?
@leewujung Oh huh that is truly strange. I only had 2 files from the 2021 NWFSC Pacific Hake Survey (Shimada) that displayed this problem.
I wonder if this is just a streaming issue so that duplicate datagrams were sequentially saved to the same place instead of two separate places. I also wonder what Echoview makes of this. Perhaps it just ignores the second ping?
Good that it was just 2 files, so not a big hit in terms of data storage.
Yeah would be good to see what Echoview handling looks like. Maybe we can ask folks with a license to help here.
From communication with Kongsberg we know that this is likely an EK80 software issue -- regardless of whether or not it was fully fixed after v1.12.2 (since the files you saw having this problem were from v1.23, but there was no problem in v21.15).
@ctuguinay: Let's drop the duplicated pings as you have implemented, and send a warning to the user if the data in the dropped ping are not identical to the preceding (retained) ping, so that it is clear what has happened. Thanks!
@leewujung This should be ready for review again. I think the pinning worked? Not sure.
@ctuguinay : Could you try unpinning zarr and see if everything passes here? The failed tests in #1429 are likely in other modules and not the ones changed in this PR. (There I used the [ci tests all] PR title to trigger running all tests; here only the ones in the modules that had code changes are run.)