Implement `TripsLayer` for animating moving objects and connect to MovingPandas

Open kylebarron opened this issue 1 year ago • 6 comments

A minimal example (video: Screen Recording 2023-12-05 at 5 15 35 PM)

Change list

  • Add new TripsLayer under the experimental module
  • Add dev dependency on movingpandas
  • Add from_movingpandas class method to construct a TripsLayer from a movingpandas TrajectoryCollection

Todo

  • [x] Implement conversion from movingpandas.TrajectoryCollection to GeoArrow.
  • [x] More input validation in TimestampAccessor. Validate that the timestamps have the same offsets as the geometry in the main data frame. (Done in https://github.com/developmentseed/lonboard/pull/292/commits/37a64c63a4c24ba4a194d7ced12da030fc7d726d)
  • [ ] Re-implement this example: https://movingpandas.github.io/movingpandas-website/2-analysis-examples/ship-data.html
  • [x] Store a time_offset integer on the TripsLayer that represents the minimum value of the trip data. Note that you'd need to recompute this when a new get_timestamps is assigned onto the layer. (Done in https://github.com/developmentseed/lonboard/pull/292/commits/6addb2e04a4d1f886f6699f47c2384760f3dffc6)
  • [x] Implement custom serialization for the timestamp accessor. Subtract off the time offset when serializing the data, and cast to float32. (Done in https://github.com/developmentseed/lonboard/pull/292/commits/b3468109af07d28aa67cd79b5490350e0dc45d52)
  • [x] Add timezone parameter? (we infer the timezone from the input data)

Open questions

  • How to handle timestamp offsets? deck.gl stores timestamps as float32, whose 24-bit significand doesn't have enough integer precision to store milliseconds or nanoseconds since epoch. (Timestamp precision handling done in https://github.com/developmentseed/lonboard/pull/292/commits/ca9dcd1939ccf0d0cd73efcbd9d2718aa38656f8)
  • Where to handle animation? It looks like syncing animation via an ipywidgets.Play widget (connected via jslink) is probably good enough for now, even if it appears to have a decent amount of overhead. The alternative would be to have a manual animation component on the JS side that maintains its own time state.
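
To make the float32 concern concrete, here is a small standalone sketch (standard library only, not lonboard code). The timestamp values and the `to_float32` helper are made up for illustration. It shows consecutive epoch seconds collapsing when cast directly, and surviving once the minimum is subtracted first.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a number through IEEE 754 single precision (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


# Three consecutive seconds since the Unix epoch (December 2023); illustrative values.
epoch_seconds = [1_701_804_174, 1_701_804_175, 1_701_804_176]

# Cast directly: near 1.7e9, adjacent float32 values are 128 apart,
# so all three timestamps land on the same float.
direct = [to_float32(t) for t in epoch_seconds]

# Subtract the minimum first: the small relative values are exactly representable.
offset = min(epoch_seconds)
relative = [to_float32(t - offset) for t in epoch_seconds]

print(direct)
print(relative)
```

Relative values stay exact as long as they are integers below 2**24 (about 16.7 million), which is worth keeping in mind when choosing the time unit for long trips.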

Example repro

I got data from Access AIS, with a custom bounding box and time range, though it would probably be straightforward to use other data files as well.

import pyarrow as pa
import pandas as pd
import movingpandas as mpd
from lonboard import Map
from lonboard.experimental import TripsLayer
import ipywidgets

path = '/Users/kyle/Downloads/AIS_170180417406763049_2306-1701804175229.csv'
df = pd.read_csv(path)
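# One trajectory per vessel (MMSI), with timestamps taken from BaseDateTime
traj_collection = mpd.TrajectoryCollection(df, 'MMSI', t='BaseDateTime', x='LON', y='LAT')

# Build the layer with the from_movingpandas class method added in this PR
layer = TripsLayer.from_movingpandas(traj_collection)
layer.width_min_pixels = 5
layer.trail_length = 100000

# Display the map (assumed step: the snippet imports Map but never used it)
m = Map(layer)
m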

play = ipywidgets.Play(
    value=0,
    min=0,
    max=86399000,
    step=50_000,
    interval=50,
    repeat=True
)
play
ipywidgets.jsdlink(
    (play, 'value'),
    (layer, 'current_time'),
)

cc @anitagraser, you may be interested in this, and/or have ideas for how to better integrate with movingpandas

kylebarron avatar Dec 05 '23 22:12 kylebarron

Thanks for tagging me. This development looks really exciting. Let me know if you have any movingpandas questions.

anitagraser avatar Mar 25 '24 18:03 anitagraser

Howdy, we talked at SciPy, posting to track this PR’s progress 🙂

kdpenner avatar Jul 12 '24 21:07 kdpenner

I added min and sub operations to arro3 https://github.com/kylebarron/arro3/pull/193 and https://github.com/kylebarron/arro3/pull/194.

So for storing the timestamp offset (minimum timestamp in the data), we'll use the min kernel. And then in the serialization process we'll use sub to subtract that min timestamp off, and lastly cast to float32 before sending to the browser.

By storing the min timestamp as an arrow scalar, we should be able to maintain the original precision and metadata of the input time data. This should make it easier to create a time slider and animation control that use real string-formatted times instead of integers from epoch.
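
As a rough pure-Python stand-in for that min / sub / cast pipeline (the real code uses arro3 kernels on Arrow arrays; the function name and sample data here are hypothetical), the idea looks like this:

```python
import struct
from datetime import datetime, timezone


def to_float32(value: float) -> float:
    """Round-trip a number through IEEE 754 single precision (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def serialize_timestamps(trips):
    """Return (offset, relative), where relative holds float32 seconds since offset.

    `trips` is a nested list: one list of datetimes per trip.
    """
    # "min" step: the smallest timestamp across every trip.
    offset = min(ts for trip in trips for ts in trip)
    # "sub" step followed by the float32 cast.
    relative = [
        [to_float32((ts - offset).total_seconds()) for ts in trip]
        for trip in trips
    ]
    return offset, relative


utc = timezone.utc
trips = [
    [datetime(2023, 12, 5, 0, 0, 0, tzinfo=utc), datetime(2023, 12, 5, 0, 1, 0, tzinfo=utc)],
    [
        datetime(2023, 12, 5, 0, 0, 30, tzinfo=utc),
        datetime(2023, 12, 5, 0, 1, 30, tzinfo=utc),
        datetime(2023, 12, 5, 0, 2, 30, tzinfo=utc),
    ],
]

offset, relative = serialize_timestamps(trips)
print(offset, relative)
```

The offset stays a full-precision, timezone-aware value on the Python side, while only the small relative numbers are reduced to float32.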

kylebarron avatar Sep 25 '24 22:09 kylebarron

I think this is almost ready to go!

Tasks:

  • [x] Publish new version of arro3 with necessary updates from https://github.com/kylebarron/arro3/pull/199. See https://github.com/kylebarron/arro3/pull/200 bumping to 0.4.0-beta.1
  • [x] Create play_widget() as a method on the TripsLayer class. Potentially the step can even be a scalar representing a duration? And then I map that step to the integer step?
  • [x] Have a helper function to go from the integer shown on the widget back to a timestamp. 9391062 (#292)
  • [ ] Implement example from movingpandas
  • [x] Add docstring to TripsLayer class
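
For the "step as a duration" idea, one possible mapping from durations to the integer settings of a Play widget is sketched below. `play_settings` and its parameters are hypothetical names, not part of lonboard or ipywidgets.

```python
from datetime import datetime, timedelta


def play_settings(start, end, step, unit=timedelta(seconds=1)):
    """Map a time range and a duration step onto integer widget settings.

    `unit` is the duration represented by one integer tick (an assumption here).
    """
    total_ticks = (end - start) // unit  # timedelta // timedelta gives an int
    step_ticks = max(1, step // unit)    # never let the step round down to zero
    return {"min": 0, "max": total_ticks, "step": step_ticks}


settings = play_settings(
    start=datetime(2023, 12, 5, 0, 0, 0),
    end=datetime(2023, 12, 5, 23, 59, 59),
    step=timedelta(seconds=50),
    unit=timedelta(milliseconds=1),
)
print(settings)
```

With a millisecond unit this reproduces the hand-written `max=86399000` and `step=50_000` from the example above, and the dict could be passed on as `ipywidgets.Play(value=0, interval=50, **settings)`.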

kylebarron avatar Sep 27 '24 22:09 kylebarron

I'm considering taking this out of the experimental module because the core behavior is pretty stable. It's just the interop with external libraries (other than movingpandas) that isn't really stable.

kylebarron avatar Oct 01 '24 21:10 kylebarron

I'm not sure on the best API regarding current_time. The last commit created a current_time_as_datetime function to convert from integer back to a datetime object. But really current_time, despite being a public deck.gl API, is an internal construct for the Lonboard TripsLayer. So we should probably make current_time private as _current_time. And since we manage the animation details in animate(), that should be fine. Then should there be a current_time method (getter?) that does the conversion from _current_time as int to a datetime object?

We should also check how datetime works with time zones.
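
A minimal sketch of that integer-to-datetime conversion, assuming the stored offset is a timezone-aware datetime and one tick equals one second. `relative_to_datetime` is a hypothetical name and signature, not the `current_time_as_datetime` function from the commit.

```python
from datetime import datetime, timedelta, timezone


def relative_to_datetime(offset, current_time, unit=timedelta(seconds=1)):
    """Convert an integer tick count back to a datetime (hypothetical helper)."""
    return offset + current_time * unit


# A fixed UTC-5 zone stands in for whatever zone the input data carried.
eastern = timezone(timedelta(hours=-5))
offset = datetime(2023, 12, 5, 17, 15, 35, tzinfo=eastern)

shown = relative_to_datetime(offset, 90)
same_instant_utc = shown.astimezone(timezone.utc)

print(shown.isoformat())
print(same_instant_utc.isoformat())
```

Adding a timedelta to an aware datetime keeps its tzinfo, so the result is reported in the same zone as the input, and it still compares equal to the same instant expressed in UTC.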

kylebarron avatar Oct 01 '24 21:10 kylebarron

Got an air traffic control example working too:

https://github.com/user-attachments/assets/6ab3706e-525e-4b5a-9904-e126d043cc2c

https://github.com/user-attachments/assets/9d7643a1-63bc-4236-bcd0-8412ff2f25d9

On Monday we can just clean up the examples a bit, add a display for the current time of the animation, and then publish the new version!

kylebarron avatar Oct 04 '24 21:10 kylebarron

I'm able to view ~4 million points on my laptop with reduced performance. 800K look great. if you are looking for more examples, here are open data affiliated with our project: https://osf.io/dg6t3/

the trajectories_* subfolders have (unfortunately only) 1 timestamp every 5 minutes.

for the wish list: will be great when the from_geopandas inherited class method works with native datetime or pandas timestamp objects 🙂

cc @hengoren

kdpenner avatar Oct 05 '24 17:10 kdpenner

@kdpenner if you'd like to create an example notebook in your own repo, we can link to it from our docs

kylebarron avatar Oct 07 '24 14:10 kylebarron

> for the wish list: will be great when the from_geopandas inherited class method works with native datetime or pandas timestamp objects 🙂

It is technically possible, though difficult, to use from_geopandas or from_duckdb with the TripsLayer, but you have to pass in the get_timestamps parameter separately, and ensure the list sizes match the LineString geometries.

The only way for it to work with native datetime/pandas objects would be to have a nested list inside a pandas column, and for now that's a task for users to convert it to an arrow array.
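
To illustrate the shape this implies, here is a pure-Python sketch that groups point records into one coordinate list and one timestamp list per agent, with matching lengths. The record layout and `group_into_trips` are made up for the example; converting the result to Arrow arrays is left out.

```python
from collections import defaultdict
from datetime import datetime


def group_into_trips(records):
    """Group (agent_id, timestamp, lon, lat) records into per-agent trips."""
    grouped = defaultdict(list)
    for agent_id, ts, lon, lat in records:
        grouped[agent_id].append((ts, lon, lat))

    coords, timestamps = [], []
    for agent_id in sorted(grouped):
        # Order each agent's points by time so the line is drawn in sequence.
        points = sorted(grouped[agent_id], key=lambda point: point[0])
        coords.append([(lon, lat) for _, lon, lat in points])
        timestamps.append([ts for ts, _, _ in points])
    return coords, timestamps


records = [
    ("a", datetime(2023, 12, 5, 0, 5), -70.1, 42.1),
    ("b", datetime(2023, 12, 5, 0, 0), -71.0, 41.0),
    ("a", datetime(2023, 12, 5, 0, 0), -70.0, 42.0),
    ("b", datetime(2023, 12, 5, 0, 5), -71.1, 41.1),
    ("a", datetime(2023, 12, 5, 0, 10), -70.2, 42.2),
]

coords, timestamps = group_into_trips(records)
print([len(trip) for trip in coords], [len(trip) for trip in timestamps])
```

Each inner coordinate list corresponds to one LineString, and the timestamp list at the same index has exactly one entry per vertex.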

kylebarron avatar Oct 07 '24 14:10 kylebarron

yeah. would the geodataframe need to be grouped by a unique identifier? for example, if one gdf has 1000 trajectories in it, such that each timestamp has 1000 duplicates, would I need to iterate through gdf.groupby("agent_id")?

movingpandas groups internally, I think, so that each trajectory is unique to an agent

oh I see you specified LineString, rather than a gdf of Points

kdpenner avatar Oct 07 '24 15:10 kdpenner

FWIW layer.get_timestamps.to_numpy() raises a NotImplementedError:

NotImplementedError: Unsupported type in to_numpy List(Field { name: "", data_type: Timestamp(Second, None), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })

to_pylist() works

kdpenner avatar Oct 07 '24 15:10 kdpenner

> FWIW layer.get_timestamps.to_numpy() raises a NotImplementedError:

Yes, because it's a variable-size list and numpy doesn't have variable-size lists. (I suppose we should have a clearer error there)

You can flatten the list and then convert the underlying array to numpy, e.g. with pyarrow.array(layer.get_timestamps).values.to_numpy()

kylebarron avatar Oct 07 '24 15:10 kylebarron