tobac icon indicating copy to clipboard operation
tobac copied to clipboard

The future of `analysis.py` functions

Open freemansw1 opened this issue 2 years ago • 2 comments

With the discussion of docstrings for these functions in #138, I think it's time that we have a discussion about the future of the analysis functions in analysis.py more generally. I also want us to discuss plotting.py and centerofgravity.py, but those conversations can be in separate issues. While this issue will be focused on v1.x, I think what we take with us to v2.x and beyond is relevant here.

I would really welcome thoughts and comments from @mheikenfeld and @pjmarinescu on the history of these functions and any thoughts that you have on their longer-term future.

Here is a list of all functions in analysis.py with a short description of what they do, and my general thoughts on them:

  • cell_statistics_all https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L9

    • Runs cell_statistics across all cells passed in.
    • Exists in v2.0-dev
  • cell_statistics https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L40

    • Builds netcdf files of statistics of each cell. Not sure exactly what is output; need to run to double check
    • I think we should immediately depreciate this function.
    • Exists in v2.0-dev
  • cog_cell https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L174

    • Calculates the (x,y,z) center of gravity for each cell
    • I think we should immediately depreciate this function, as it is very specific to clouds and very specific to liquid/frozen precip.
    • This functionality is also replicated in centerofgravity.py.
    • Removed from analysis.py in v2.0-dev.
  • lifetime_histogram https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L227

    • Calculates a histogram of cell lifetime
    • I am indifferent about this function, but it is something that is really easy for users to do. Maybe it is better to keep it around and have it work with xarray and pandas as we transition to being xaray?
    • Exists in v2.0-dev
  • haversine https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L240

    • Calculates the haversine distance between two lat/lon points on Earth.
    • I think this function makes more sense in util, but that is ensnarled in the discussion on #122 .
    • Exists in v2.0-dev
  • calculate_distance https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L263

    • Calculates distance (in meters) between features, either using lat/lon or x/y
    • I think this is generally reasonable to keep around, but it's very specific with the projection coordinates.
    • Exists in v2.0-dev
  • calculate_velocity_individual https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L314

    • Calculates velocity of a cell
    • This is generally reasonable to keep around, especially as we move to xarray for everything.
    • See also #37
    • Exists in v2.0-dev
  • calculate_velocity https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L323

    • Calculates all cell velocities
    • Exists in v2.0-dev
  • velocity_histogram https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L336

    • Calculates a histogram of cell velocities
    • I feel similarly to this as I do to lifetime_histogram
    • Exists in v2.0-dev
  • calculate_nearestneighbordistance https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L356

    • Calculates the distance between each feature and its nearest neighbor.
    • I think this is fine in there, but why does it not also report what the nearest neighbor is?
    • Exists in v2.0-dev
  • nearestneighbordistance_histogram https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L389

    • Same as the other histogram functions, and I feel similarly
    • Exists in v2.0-dev
  • calculate_areas_2Dlatlon https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L412

  • Calculates cell area using lat/lon

  • I think the implementation in v2.0-dev is better and we should probably migrate it to v1.x for 1.6 or earlier

  • calculate_areas https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L476

    • See discussion on calculate_areas_2Dlatlon
    • I am not opposed to keeping this, but it will be challenging to move this to a pure xarray implementation.
    • Exists in v2.0-dev
  • area_histogram https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L535

    • See discussion on earlier histogram functions
    • Exists in v2.0-dev
  • histogram_cellwise https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L563

    • Produces a histogram of whatever cell value (using max/min/mean) over each cell
    • I am happy for this to remain in, but it will again be more challenging to move this to a pure xarray implementation.
    • Exists in v2.0-dev
  • histogram_featurewise https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L581

    • Same as histogram_cellwise, but along features rather than along cells.
    • Same comments as histogram_cellwise
    • Exists in v2.0-dev
  • calculate_overlap https://github.com/tobac-project/tobac/blob/e6e8fc30a25e9b16ebe8917055c81d593c16128c/tobac/analysis.py#L588

  • Calculates whether or not two cells overlap

  • I think this function needs a lot more explanation, but it could potentially be useful.

  • Exists in v2.0-dev

This was longer than I was expecting, so I've put a summary here:

Recommend Depreciation

  • cell_statistics_all
  • cell_statistics
  • cog_cell

Everything else could generally go either way, but we need more documentation on these functions in general. Perhaps a page or two in the documentation specifically on how to use these functions?

freemansw1 avatar Jun 27 '22 21:06 freemansw1

Thanks for the nice overview of your understanding of the functions... I think most of your assessment is pretty similar to how I see it now, too.. I think quite a lot of it stemmed from the early days when we were taking existing code that I had written to wrap it up into the package and thus relatively specific to the way we were using the tracking.

Deprecating the more detailed things in here makes perfect sense. Some of it like the histograms could maybe picked up in some kind of tutorial notebook. Other things might be better placed under utils.

I guess for most of these functions I might be the only one who has really used them a bit more (would be good to hear if anyone else has, but I doubt it due to the lack of documentation...) and for me it would be completely fine to break backwards compatibility here at some point..

mheikenfeld avatar Jun 28 '22 20:06 mheikenfeld

I am sitting down now to add to the missing docstring where it seems to make sense at this point (See discussion in #138), then we can still see what we do with those functions in the future.

mheikenfeld avatar Jun 28 '22 20:06 mheikenfeld

Resolved with #207

freemansw1 avatar Nov 29 '22 17:11 freemansw1