Added an example within the documentation for custom readers supporting pandas DataFrames.

Open BenjaminFraser opened this issue 3 years ago • 5 comments

Added a new example (Custom_Pandas_Dataloader.py) within the documentation in docs/examples for the definition of custom Readers that support pandas DataFrames.

This allows Ground Truth Readers and Detection Readers to take advantage of the wide range of data formats supported by pandas, without the need to manually define custom data ingestion processes for each type, e.g. JSON, XML, Parquet, HDF5, .txt, .zip.

Given its similarity to the requirements of the custom reader documentation example (#354), I've linked this pull request to that, which hopefully is not a problem.

These classes do have the disadvantage of requiring the entire dataset in memory. However, it seems that the ability to directly use pandas DataFrames is a feature several users of Stonesoup have shown interest in, which is understandable given the flexibility and processing functionalities this can provide.

The example in Custom_Pandas_Dataloader.py includes the definitions of DataFrameGroundTruthReader and DataFrameDetectionReader classes. Each of these inherits from the existing GroundTruthReader class, along with a custom-defined _DataFrameReader class.

These classes operate similarly to the existing CSVGroundTruthReader and CSVDetectionReader classes, except they take as input a pandas DataFrame already read into memory, rather than a path to a .csv file. They also have modified generator functions for producing the time and paths / detections.
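As a rough illustration of the generator idea described above, here is a minimal sketch that groups an in-memory DataFrame by its time column and yields one batch of rows per time step. It is not the code from Custom_Pandas_Dataloader.py: the helper name `iter_time_steps`, the column names, and the use of plain tuples instead of Stone Soup detection objects are all assumptions made for illustration.

```python
from datetime import datetime

import pandas as pd


def iter_time_steps(dataframe, time_field, state_fields):
    """Yield (timestamp, rows) pairs, one per unique time, in time order.

    Hypothetical helper: it mirrors the idea of a reader generator but
    returns plain tuples rather than Stone Soup objects.
    """
    # groupby sorts the group keys by default, so time steps come out ordered.
    for timestamp, group in dataframe.groupby(time_field):
        # Each row becomes one "detection": a tuple of the chosen columns.
        rows = {tuple(row) for row in group[state_fields].itertuples(index=False)}
        yield timestamp.to_pydatetime(), rows


# Illustrative data only: two rows at 11:00:00 and one at 11:00:01.
df = pd.DataFrame({
    "time": [
        datetime(2022, 8, 31, 11, 0, 0),
        datetime(2022, 8, 31, 11, 0, 0),
        datetime(2022, 8, 31, 11, 0, 1),
    ],
    "x": [0.0, 5.0, 1.0],
    "y": [0.0, 5.0, 1.5],
})

steps = list(iter_time_steps(df, "time", ["x", "y"]))
```

Because the DataFrame is already in memory, any format pandas can load could feed the same loop, which is the flexibility the description refers to.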

These have been useful in some UAV-based non-cooperative radar research I've done with Stonesoup, so hopefully they are also of value to other members of the community!

BenjaminFraser avatar Aug 31 '22 11:08 BenjaminFraser

Thanks for the contribution @BenjaminFraser.

I see the docs are failing to build due to pandas being a missing dependency. If you could add pandas to the dev dependencies in setup.py, that should resolve it: https://github.com/dstl/Stone-Soup/blob/435883a67045f72161355e9a5cbb44bcacfa67b1/setup.py#L31-L35

It'd be good to have the readers in the main code base (probably with an optional dependency on pandas) so users can easily access them. It'd also be good to keep the example you've created, both to show how to use them and, in reference to #354, to show how to create custom readers. (Minor issue of if they are modified, we'll have to be sure to update in both places, unless in the example could do something with inspect.getsource)

sdhiscocks avatar Aug 31 '22 14:08 sdhiscocks

> (Minor issue of if they are modified, we'll have to be sure to update in both places, unless in the example could do something with inspect.getsource)

Or use of Sphinx literalinclude directive, which can add some syntax highlighting.
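For reference, a minimal sketch of the inspect.getsource idea: the example would pull the source text from the installed code at build time instead of keeping a second copy. The helper name `source_for_docs` is hypothetical, and a standard-library function stands in for the reader class so that the snippet runs on its own.

```python
import inspect
import textwrap


def source_for_docs(obj):
    """Return the dedented source text of a class or function."""
    return textwrap.dedent(inspect.getsource(obj))


# Stand-in target; in the docs example this would be the reader class.
snippet = source_for_docs(textwrap.shorten)
print(snippet)
```

Either approach keeps a single copy of the class definition, so the example cannot drift from the main code base.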

sdhiscocks avatar Aug 31 '22 14:08 sdhiscocks

That's no problem at all, and including the Readers within the main code base sounds like a good idea! The only sticking point was including it with pandas as an optional dependency, but I'll look into that, which should hopefully be straightforward enough.

I'll take a look later when I have the chance and put together another PR for those points!

BenjaminFraser avatar Aug 31 '22 15:08 BenjaminFraser

> The only sticking point was including it with pandas as an optional dependency, but I'll look into that, which should hopefully be straightforward enough.

We've done this before by simply raising an error if importing the dependency fails. https://github.com/dstl/Stone-Soup/blob/5276c1b6b541487203806c6fbc8a0547a9ece762/stonesoup/reader/opensky.py#L5-L10
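A minimal sketch of that pattern, written as a reusable function so it can run without pandas installed. The helper name `require_optional` and the error wording are assumptions for illustration, not the contents of the linked file.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency, or fail with a clear message.

    Hypothetical helper: the same effect can be had inline with a
    try/except around a normal import statement at module level.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'; "
            "install it to use this feature."
        ) from error


# Demonstrate the failure path with a module name that does not exist.
try:
    require_optional("not_a_real_module_xyz", "DataFrame readers")
except ImportError as error:
    message = str(error)
```

In a reader module this would be called as `require_optional("pandas", "DataFrame readers")`, so users without pandas get an explanatory error only when they import that module.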

sdhiscocks avatar Aug 31 '22 15:08 sdhiscocks

Codecov Report

Base: 94.81% // Head: 94.84% // Increases project coverage by +0.02% :tada:

Coverage data is based on head (af4c77b) compared to base (f27eaeb). Patch coverage: 97.33% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
+ Coverage   94.81%   94.84%   +0.02%     
==========================================
  Files         169      170       +1     
  Lines        8221     8296      +75     
  Branches     1216     1230      +14     
==========================================
+ Hits         7795     7868      +73     
- Misses        316      318       +2     
  Partials      110      110              
Flag Coverage Δ
integration 68.50% <0.00%> (-0.63%) :arrow_down:
unittests 92.69% <97.33%> (+0.04%) :arrow_up:

Impacted Files Coverage Δ
stonesoup/reader/pandas_reader.py 97.33% <97.33%> (ø)

codecov[bot] avatar Aug 31 '22 15:08 codecov[bot]