eo-learn

[FEAT] FeatureType.DATAFRAME prettypleasewithcherryontop

Open mlubej opened this issue 3 years ago • 7 comments

What is the problem? Please describe.

I'm running a mowing algorithm which produces rich information for each pixel. This information cannot be described with the current features: each pixel has a different number of events and different kinds of info, and the events are time intervals, not single dates.

Here's the solution

Would it be possible to add a dataframe feature type, so that I could save this info-rich dataframe to the eopatch? This way I could extract some info from the dataframe into the existing feature types, but still keep the original to do with it whatever I want.
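
As a rough illustration of "extract some info into the existing feature types, but keep the original", the sketch below derives a per-pixel event count from an events dataframe. The column names (`pixel_row`, `pixel_col`, `start`, `end`), the patch size and the data are all hypothetical, and the resulting array is only shaped like a timeless raster (height, width, 1); nothing here uses the eo-learn API.

```python
import numpy as np
import pandas as pd

# Hypothetical mowing events: one row per event, several rows per pixel allowed.
events = pd.DataFrame(
    {
        "pixel_row": [0, 0, 1],
        "pixel_col": [0, 0, 3],
        "start": pd.to_datetime(["2021-05-01", "2021-07-10", "2021-06-15"]),
        "end": pd.to_datetime(["2021-05-06", "2021-07-14", "2021-06-20"]),
    }
)

# Assumed patch size, chosen only for this example.
height, width = 2, 4

# Derived info: number of events per pixel, stored as a (height, width, 1) array.
counts = np.zeros((height, width, 1), dtype=np.uint8)
for (row, col), n_events in events.groupby(["pixel_row", "pixel_col"]).size().items():
    counts[row, col, 0] = n_events
```

The `events` dataframe is left unchanged, so the full interval information is still available alongside the derived array.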

Alternatives

Currently this is possible only by pushing the dataframe to the metadata.

Additional context

We have an algorithm which works on dataframes and produces dataframes. We want to use the eo-learn/eo-grow functionality to run the process over eopatches, but keep the algorithm function as is.

mlubej avatar Apr 07 '22 12:04 mlubej

Some more info:

  • dataframes have a .to_json method, so it's possible to convert the info to JSON serializable content without much effort
  • I tried adding the jsonified dataframe to the eopatch metainfo and it worked, saving it to disk also works
  • the only issue was when I print out the eopatch, it tries to show all the dataframe content, which can be quite long. A dedicated feature could handle this nicely with its own repr function
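
A minimal sketch of the two points above, under stated assumptions: the dataframe contents and column names are made up, a plain dict stands in for the eopatch meta info, and `CompactFrame` is a hypothetical wrapper (not an eo-learn class) that only shows what a shorter repr could look like.

```python
import io

import pandas as pd

# Hypothetical per-pixel mowing events; interval bounds kept as ISO date strings.
events = pd.DataFrame(
    {
        "pixel_row": [0, 0, 1],
        "pixel_col": [0, 0, 3],
        "start": ["2021-05-01", "2021-07-10", "2021-06-15"],
        "end": ["2021-05-06", "2021-07-14", "2021-06-20"],
    }
)

# Plain dict standing in for the eopatch meta info (assumption for this sketch).
meta_info = {"mowing_events": events.to_json(orient="records")}

# Reading it back; convert_dates=False keeps the interval bounds as strings.
restored = pd.read_json(
    io.StringIO(meta_info["mowing_events"]), orient="records", convert_dates=False
)


class CompactFrame:
    """Hypothetical wrapper whose repr shows only the shape and column names."""

    def __init__(self, frame: pd.DataFrame) -> None:
        self.frame = frame

    def __repr__(self) -> str:
        n_rows, _ = self.frame.shape
        return f"CompactFrame(rows={n_rows}, columns={list(self.frame.columns)})"


summary = repr(CompactFrame(restored))
```

Printing the wrapper gives a single line instead of the full table, which is roughly what a dedicated feature type could do in the eopatch repr.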

mlubej avatar Apr 07 '22 12:04 mlubej

@mlubej, have you tried using a vector or vector timeless feature for this? These feature types use geodataframes.

AleksMat avatar Apr 07 '22 12:04 AleksMat

> @mlubej, have you tried using a vector or vector timeless feature for this? These feature types use geodataframes.

But wouldn't vector features require geometries? Technically one can supply pixel-geometries :thinking:

zigaLuksic avatar Apr 07 '22 13:04 zigaLuksic

It's a good idea, since it's basically a dataframe, but yes, as @zigaLuksic pointed out, it does require geometries. I would avoid adding pixel geometries, there would be too much ballast.

mlubej avatar Apr 07 '22 13:04 mlubej

I tried setting empty geometries (a list of None) in the geodataframe and adding it to the eopatch as a vector_timeless feature. Saving takes a bit longer (~2 min) than jsonification + saving of the df (a few seconds). Perhaps it has something to do with the geospatial aspect of it.

But in the end it worked. It saved the file as a geopackage and I loaded it successfully.

mlubej avatar Apr 07 '22 13:04 mlubej

Hm, without geometries you can then just use non-spatial raster features (temporal/timeless scalar/label). This means that you will probably have to write each column of a dataframe into a different feature. Depending on how many columns you have you might get a lot of features but at least serialization will be efficient.

I suggest you try avoiding writing things into meta_info feature. Because your dataframes seem to be huge it would be too inefficient to serialize them as jsons.

AleksMat avatar Apr 07 '22 13:04 AleksMat

> Hm, without geometries you can then just use non-spatial raster features (temporal/timeless scalar/label). This means that you will probably have to write each column of a dataframe into a different feature. Depending on how many columns you have you might get a lot of features but at least serialization will be efficient.

The problem is that for each pixel I have, e.g., mowing event 1, mowing event 2, ... These events have temporal information, but they are neither temporal nor timeless, because they are represented as time intervals. Additionally, each pixel has a different number of such events, so one cannot assume that the arrays would have the same length. So I'm afraid this is not possible.
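
To make the shape problem concrete, here is a small sketch with made-up data (hypothetical column names, not output of the actual algorithm): events are intervals with a start and an end, and grouping them by pixel gives groups of different lengths, so they do not fit one fixed-size array per column without padding.

```python
import pandas as pd

# Hypothetical mowing events as time intervals, one row per event.
events = pd.DataFrame(
    {
        "pixel_row": [0, 0, 1],
        "pixel_col": [0, 0, 3],
        "start": pd.to_datetime(["2021-05-01", "2021-07-10", "2021-06-15"]),
        "end": pd.to_datetime(["2021-05-06", "2021-07-14", "2021-06-20"]),
    }
)

# Each event spans an interval rather than a single date.
events["duration_days"] = (events["end"] - events["start"]).dt.days

# Number of events per pixel; the counts differ between pixels.
events_per_pixel = events.groupby(["pixel_row", "pixel_col"]).size().to_dict()
is_ragged = len(set(events_per_pixel.values())) > 1
```

In the long table form each event is simply one more row, which is why a dataframe holds this kind of result without any padding.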

> I suggest you try avoiding writing things into meta_info feature. Because your dataframes seem to be huge it would be too inefficient to serialize them as jsons.

I agree writing it to meta_info is not efficient, but I believe it could still be written as a dataframe feature. Additionally, these are not the time series dataframes, but the dataframes which are the result of an algorithm, such as mowing, meaning that they are much smaller in size, like ~15 MB for the whole EOPatch, while the whole EOPatch is ~6 GB.

mlubej avatar Apr 07 '22 13:04 mlubej

I'd say this is not so relevant anymore. Closing.

mlubej avatar May 26 '23 12:05 mlubej