rmapshaper
Better facilitate piping with `sp` inputs
For any of the methods for `sp` objects, the function converts the input to json (character) before sending it into the V8 context along with commands for processing, and a json character object is returned from the V8 context. This is then converted back to an `sp` object before being returned to the user.
If the function is being used in a magrittr pipe chain with other rmapshaper functions, it should not perform the final conversion to `sp`; rather, it should return a json object to be passed to the next rmapshaper function in the chain, so that the expensive `sp -> json -> sp` round trip doesn't need to happen at each step. Only in the last step in the chain should the resulting object be converted to `sp` and returned to the user.
Current behaviour

```r
SpatialPolygonsDataFrame_object %>%
  ms_simplify() %>%       # sp obj is converted to json, simplified, then converted to sp
  ms_filter_islands() %>% # sp obj converted to json, islands are filtered, converted to sp
  ms_lines()              # sp obj is converted to json, converted to lines, converted to sp and returned
```
Preferred behaviour

```r
SpatialPolygonsDataFrame_object %>%
  ms_simplify() %>%       # sp obj is converted to json, simplified, returned as json
  ms_filter_islands() %>% # islands are filtered, returned as json
  ms_lines()              # gets converted to lines, then converted to sp and returned
```
This could be achieved by:

- The function detects that it is being piped into another rmapshaper function and so doesn't convert the return value to `sp`, but rather returns a json character string to be piped into the next function. The last function in the chain knows that it is last and thus converts and returns an `sp` object.
- Have a `return_class` argument that defaults to `class(input)`, but can be set to `"json"` or one of the `Spatial` classes depending on where in the chain it is:

  ```r
  SpatialPolygonsDataFrame_object %>%
    ms_simplify(return_class = "json") %>%
    ms_filter_islands() %>%
    ms_lines(return_class = "SpatialLinesDataFrame")
  ```

- Just advise users to convert to json before doing a pipe, as I currently do in the vignette and README:

  ```r
  SpatialPolygonsDataFrame_object %>%
    geojsonio::geojson_json() %>%
    ms_simplify() %>%
    ms_filter_islands() %>%
    ms_lines() %>%
    geojsonio::geojson_sp()
  ```
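The dispatch logic for the second option could be sketched roughly as follows. This is only a sketch, not rmapshaper's actual implementation: `ms_verb` and `run_mapshaper` are hypothetical names, with `run_mapshaper` standing in for the V8/mapshaper processing step, and the geojsonio conversions mirroring those used in the third option.

```r
# A sketch of option 2, not rmapshaper's actual implementation.
run_mapshaper <- function(json) json  # placeholder for the real V8 round trip

ms_verb <- function(input, return_class = NULL) {
  # Default: return the same class that came in
  if (is.null(return_class)) {
    return_class <- if (inherits(input, "Spatial")) class(input)[1] else "json"
  }
  # Convert to json only if we were handed a Spatial object
  json <- if (inherits(input, "Spatial")) geojsonio::geojson_json(input) else input
  out <- run_mapshaper(json)
  if (identical(return_class, "json")) out else geojsonio::geojson_sp(out)
}
```

With `return_class = "json"` in all but the last call, only the final step in the chain pays the json-to-`sp` conversion cost.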
@ateucher I guess this is subjective. Personally, I am still not convinced. One issue is debugging: debugging pipes is not a good experience imo, and if you then write the equivalent pipe-less code, the function will run different code, so the error might not come up. Just one example of how non-pure functions are dangerous.
@ateucher, could you preserve your V8 context with each call, so that the transformed data is available for the next mapshaper call and could be reused if some argument is provided?
Surely @gaborcsardi has a point, and there is sometimes a tradeoff in keeping a "clean" DSL. That said, it is surely possible, as it is almost the same as what jqr does: there the json query is built up by each verb, and only the last one in the chain triggers the actual jq interpreter. I guess you need to weigh the pros and cons and make up your mind.
There is some discussion on daff about how to accomplish preserving the V8 context. Not having to initiate a V8 context each time also eliminates quite a bit of overhead.
Also, how do you know what the next function in the pipeline expects as input? That function may come from another package or wherever. Do you check whether it is coming from your package? I think this will get really tricky to implement and probably also fragile. But I might be missing an easy workaround...
Maybe default to `return_class = auto()`, where `auto()` will determine whether it is within a pipeline and whether it is the last function in it...
Yes, but even if it is not last, it might need to return an `sp` object, if the next function in the chain does not know about this trick. Anyway, I guess you can work around this somehow...
@gaborcsardi another good point about debugging. And yes, it would need to be able to detect whether the next function in the pipe was from rmapshaper. It does seem like a potential house of cards...
@timelyportfolio I think even more efficient than preserving the V8 context would be to pass the original unaltered json all the way through, collecting and assembling mapshaper commands as we go (perhaps in an attribute), then apply the whole thing at once in the last step, similar to what @smbache mentioned wrt jqr.
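That deferred approach could look something like the following sketch. The function names are illustrative, not rmapshaper's API, and the command strings follow mapshaper's CLI syntax (`-simplify`, `-filter-islands`); nothing is sent to V8 until the final step.

```r
# Each lazy verb just appends its mapshaper command to an attribute on the
# unaltered json, rather than running it immediately.
defer_command <- function(json, command) {
  attr(json, "ms_commands") <- c(attr(json, "ms_commands"), command)
  json
}

ms_simplify_lazy <- function(json, keep = 0.05) {
  defer_command(json, sprintf("-simplify %s", keep))
}

ms_filter_islands_lazy <- function(json, min_vertices = 4) {
  defer_command(json, sprintf("-filter-islands min-vertices=%s", min_vertices))
}

ms_apply <- function(json) {
  command <- paste(attr(json, "ms_commands"), collapse = " ")
  # A single V8/mapshaper round trip would happen here with `command`;
  # the assembled command string is returned just to show what was collected.
  command
}
```

For example, `json_obj %>% ms_simplify_lazy() %>% ms_filter_islands_lazy() %>% ms_apply()` assembles `"-simplify 0.05 -filter-islands min-vertices=4"` and would run it in one pass.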
Thanks all for your thoughts! I will think on it some more and poke around in jqr, but am leaning towards @gaborcsardi's advice to try not to be too clever ;)
> Maybe default to `return_class = auto()`, where `auto()` will determine whether it is within a pipeline and whether it is the last function in it...
That seems like a pretty nice way to allow user override on a bit of a black-box process...
I think a better (but still tricky) way of solving these kinds of problems is to pass an object that can behave both as `json` and `sp`.

This might not be possible at this point, but in general, you can have an R6 object that has two active bindings, `json` and `sp`, but internally it might only store one of them. Then if a function requests `json` and only `sp` is stored, it calculates `json` from `sp` on the fly (and potentially stores it). And vice versa.
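The active-binding idea could be sketched with R6 like this. It is only a sketch: the class name is made up, and the geojsonio converters stand in for whatever conversion functions rmapshaper would actually use.

```r
library(R6)

# One object, two views. Only one representation is stored; the other is
# computed (and cached) on first access via active bindings.
GeoData <- R6Class("GeoData",
  private = list(.json = NULL, .sp = NULL),
  public = list(
    initialize = function(json = NULL, sp = NULL) {
      private$.json <- json
      private$.sp <- sp
    }
  ),
  active = list(
    json = function(value) {
      if (!missing(value)) {
        private$.json <- value
        private$.sp <- NULL  # invalidate the stale representation
      } else {
        if (is.null(private$.json)) {
          # sp -> json on the fly, cached for later accesses
          private$.json <- geojsonio::geojson_json(private$.sp)
        }
        private$.json
      }
    },
    sp = function(value) {
      if (!missing(value)) {
        private$.sp <- value
        private$.json <- NULL
      } else {
        if (is.null(private$.sp)) {
          # json -> sp on the fly, cached for later accesses
          private$.sp <- geojsonio::geojson_sp(private$.json)
        }
        private$.sp
      }
    }
  )
)
```

A function that only reads `x$json` never triggers a conversion when json is already stored, and assigning `x$json <- ...` invalidates the cached `sp` so the two views can't drift apart.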
+1 for R6 idea
That is intriguing! I'll probably keep it simple until I see how sfr and geojson play out, but thanks for the ideas!