rmapshaper icon indicating copy to clipboard operation
rmapshaper copied to clipboard

Handle special column classes more elegantly

Open mdsumner opened this issue 6 years ago • 5 comments

(just some notes for now, @ateucher please assign to me)

  • can we rejig to do the data frame traverse first, otherwise long-running geom-task fails
  • use as.Date/as.POSIXct with explicit format string because internal logic is quirky

as.Date(c("", "2001-01-01")) ## is an error but as.Date(c("2001-01-01", "")) ## gives an NA in 2nd place

mdsumner avatar Jan 19 '18 04:01 mdsumner

Oh your second point is very interesting!

I'm not quite sure what you mean by doing data frame traversal first... right now it collects the information about the columns in col_classes(), and uses that information to put it back right with restore_classes() after it has gone through the mapshaper routine. It clearly needs to deal with Date/POSIXct columns better. And probably in restore_classes() find a way to catch conversion errors - make some sort of safe_as() function that traps errors and emits a warning when conversion fails rather than erroring out?

ateucher avatar Jan 19 '18 05:01 ateucher

The scenario in mind is when the geometric conversion takes some time, and presumably succeeds - but then we fail on some data column - so could we fail on the data column first, before the geom operation?

I'm also guessing that a loop for the "{ }" check would be better than a "[<-" replacement, but I haven't checked yet.

It's probably not important if we get the restoration right :) but it's caught me out many times.

mdsumner avatar Jan 19 '18 06:01 mdsumner

The scenario in mind is when the geometric conversion takes some time, and presumably succeeds - but then we fail on some data column - so could we fail on the data column first, before the geom operation?

It's very possible I'm missing something, but the data column parsing has to happen after the geometric conversion because the sequence of events is: sf/sp -> geojson -> into v8 context -> mapshaper magic -> out of v8 context -> geojson -> sf/sp. It's this last geojson -> sf/sp where we are restoring column classes and that's where the failure happens.

Unless.... what if we remove all of the attributes except a row id before passing the object into the v8 context, and rejoin the attributes to the processed spatial object when done? Then we could avoid all of that messiness of storing and restoring column classes, and it would have the added benefit of making the geojson object that gets sent into the v8 context smaller (if there were lots of attributes).

ateucher avatar Jan 19 '18 18:01 ateucher

Oh true, I forget it's not a one-to-one output, you're quite right! Your last proposal sounds good, and should work.

mdsumner avatar Jan 19 '18 22:01 mdsumner

I think it's going to require a bit more thought around dealing with objects that end up being aggregated/disaggregated (thinking especially ms_dissolve(), ms_explode(), ms_simplify(explode = TRUE), but there may be more...

ateucher avatar Jan 25 '18 05:01 ateucher