echopype
echopype copied to clipboard
Strange GPS coordinates appearing with add_location() method
Hello, I am trying to use echopype to match up Sv data with its associated geospatial information and I am getting strange GPS values when using "add_location()". I have noticed this happening with a number of different raw files, below I include one example.
This is using python 3.10.12 and echopype==0.8.3.
This first piece of code is my manual method for pairing up NMEA datagram latitude and longitudes with the nearest ping_time and this works as expected:
!pip install s3fs boto3==1.34.51 numpy==1.24.4 pandas==1.5.3 xarray==2022.12.0 geopandas==0.14.3 folium==0.16.0 typing_extensions==4.8.0 echopype==0.8.3 mapclassify
import s3fs
import folium
import os
import boto3
import numpy as np
import echopype as ep
import geopandas
import folium
import mapclassify as mc
import pandas as pd
import xarray as xr
from botocore import UNSIGNED
from botocore.config import Config
import matplotlib.pyplot as plt
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
s3_file_system = s3fs.S3FileSystem(anon=True)
print(f"echopype version: {ep.__version__}")
bucket_name = 'noaa-wcsd-pds'
ship_name = 'Albatross_Iv'
cruise_name = 'AL0403'
sensor_name = 'EK60'
file_name = "L0010-D20040416-T094042-EK60.raw"
raw_file_s3_path = f"s3://{bucket_name}/data/raw/{ship_name}/{cruise_name}/{sensor_name}/{file_name}"
echodata = ep.open_raw(raw_file_s3_path, sonar_model=sensor_name, use_swap=True, storage_options={'anon': True})
latitude = echodata.platform.latitude.values
latitude_values = latitude.copy()
latitude_values.sort()
print(latitude_values)
# [ 0. 0. 43.68361 ... 43.811595 43.811595 43.811595] <-- Note sorted latitude values
longitude = echodata.platform.longitude.values
nmea_times = np.sort(echodata.platform.time1.values)
time1 = echodata.environment.time1.values
indices = np.searchsorted(a=nmea_times, v=time1, side="right") - 1
lat = latitude[indices]
lat[indices < 0] = np.nan # values recorded before indexing are set to nan
lon = longitude[indices]
lon[indices < 0] = np.nan
my_gps_df = pd.DataFrame({'latitude': lat, 'longitude': lon, 'time': time1}).set_index(['time'])
my_gps_gdf = geopandas.GeoDataFrame(my_gps_df, geometry=geopandas.points_from_xy(my_gps_df['longitude'], my_gps_df['latitude']), crs="epsg:4326")
np.nanmin(my_gps_gdf.latitude) # should be '0.0'
my_gps_gdf[(np.abs(my_gps_gdf.latitude) < 1e-4) & (np.abs(my_gps_gdf.longitude) < 1e-4)] = np.nan # remove null island values
np.nanmin(my_gps_gdf.latitude) # should be '43.68361'
my_gps_gdf.index = my_gps_gdf.index.astype(str)
my_gps_gdf.explore()
And below here is code that uses the "add_location()" method which seems to introduce new values such as "19.572214" and "23.97964":
ds_sv = ep.calibrate.compute_Sv(echodata)
ds_sv_location = ep.consolidate.add_location(ds_sv, echodata)
latitude = ds_sv_location.latitude.values
longitude = ds_sv_location.longitude.values
ping_time = ds_sv_location.ping_time.values
gps_df = pd.DataFrame({'latitude': latitude, 'longitude': longitude, 'time': ping_time}).set_index(['time'])
gps_gdf = geopandas.GeoDataFrame(gps_df, geometry=geopandas.points_from_xy(gps_df['longitude'], gps_df['latitude']), crs="epsg:4326")
gps_gdf[(np.abs(gps_gdf.latitude) < 1e-4) & (np.abs(gps_gdf.longitude) < 1e-4)] = np.nan # remove null island values
latitude_values2 = list(gps_gdf.latitude.copy())
latitude_values2.sort()
print(latitude_values2[:5])
# [nan, 19.57221436379681, 23.979649686736696, 43.68361412270737, 43.68363237740446]
gps_gdf.index = gps_gdf.index.astype(str)
gps_gdf.explore()
The main cruise path is in the upper left, and the new points seem to be extrapolated between the feature and Null Island.
If you have access to Google Colab, I have a notebook that should be shared here to use:
https://colab.research.google.com/drive/1TxzMMJ6hjPCo-ZWZsWm7ltd-dWDAFhMu?usp=sharing
I am wondering if this is a coding error on my part or if there is a problem with the interpolation. Looking here:
https://github.com/OSOceanAcoustics/echopype/blob/7f7e68bff98ced27644db88e1bf94600826c150f/echopype/consolidate/api.py#L186
Does it make sense to define 'method="nearest"' for the interpolation?
I have traditionally opted for numpy's searchsorted aligned from the "right" so that Sv data isn't paired with potentially stale GPS values in some edge cases where the data jumps position mid-file.
It would be nice to hear what others thoughts are on the subject. And thank you for any help!
@oftfrfbf : Thanks for reporting this. @ctuguinay and I looked into this briefly and what we found was that this is not a bug in the code. This is due to the GPS data having values that are literally 0 before actual values show up, so the interpolated values are between 0 and 43.xxx.
We think that this is something that we should not try to "fix" automatically in the Echopype codebase, because there are many ways the GPS data can go wrong with interpolation (like the recent #1294). Instead, we will add a warning about GPS values having 0s, since that is unlikely the case for actual measurements. Users can then decide how they want to clean up the GPS data before calling add_location
.
Let us know if you have additional thoughts on this.
Closed by #1296