rmapshaper
Better facilitate piping with `sp` inputs
For any of the methods for `sp` objects, the function converts the input to json (character) before sending it into the V8 context along with commands for processing, and a json character object is returned from the V8 context. This is then converted back to an `sp` object before being returned to the user.
If the function is being used in a magrittr pipe chain with other rmapshaper functions, it should not perform the final conversion to `sp`; rather, it should return a json object to be passed to the next rmapshaper function in the chain, so that the expensive `sp -> json -> sp` round trip doesn't need to happen at each step. Only in the last step in the chain should the resulting object be converted to `sp` and returned to the user.
Current behaviour

```r
SpatialPolygonsDataFrame_object %>%
  ms_simplify() %>%       # sp obj is converted to json, simplified, then converted to sp
  ms_filter_islands() %>% # sp obj converted to json, islands are filtered, converted to sp
  ms_lines()              # sp obj is converted to json, converted to lines, converted to sp and returned
```
Preferred behaviour

```r
SpatialPolygonsDataFrame_object %>%
  ms_simplify() %>%       # sp obj is converted to json, simplified, returned as json
  ms_filter_islands() %>% # islands are filtered, returned as json
  ms_lines()              # gets converted to lines, then converted to sp and returned
```
This could be achieved by:

- The function detects that it is being piped into another rmapshaper function and so doesn't convert the return value to `sp`, but rather returns a json character string to be piped into the next function. The last function in the chain knows that it is last and thus converts and returns an `sp` object.
- Have a `return_class` argument that defaults to `class(input)`, but can be set to `"json"` or one of the `Spatial` classes depending on where in the chain it is:

  ```r
  SpatialPolygonsDataFrame_object %>%
    ms_simplify(return_class = "json") %>%
    ms_filter_islands() %>%
    ms_lines(return_class = "SpatialLinesDataFrame")
  ```

- Just advise users to convert to json before doing a pipe, as I currently do in the vignette and README:

  ```r
  SpatialPolygonsDataFrame_object %>%
    geojsonio::geojson_json() %>%
    ms_simplify() %>%
    ms_filter_islands() %>%
    ms_lines() %>%
    geojsonio::geojson_sp()
  ```
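The dispatch logic for the second option could be sketched roughly as follows. This is only a sketch, not rmapshaper's actual implementation: `ms_verb` and `run_mapshaper` are hypothetical names, with `run_mapshaper` standing in for the V8/mapshaper processing step, and the geojsonio conversions mirroring those used in the third option.

```r
# A sketch of option 2, not rmapshaper's actual implementation.
run_mapshaper <- function(json) json  # placeholder for the real V8 round trip

ms_verb <- function(input, return_class = NULL) {
  # Default: return the same class that came in
  if (is.null(return_class)) {
    return_class <- if (inherits(input, "Spatial")) class(input)[1] else "json"
  }
  # Convert to json only if we were handed a Spatial object
  json <- if (inherits(input, "Spatial")) geojsonio::geojson_json(input) else input
  out <- run_mapshaper(json)
  if (identical(return_class, "json")) out else geojsonio::geojson_sp(out)
}
```

With `return_class = "json"` in all but the last call, only the final step in the chain pays the json-to-`sp` conversion cost.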
@ateucher I guess this is subjective. Personally, I am still not convinced. One issue is debugging: debugging pipes is not a good experience imo, and if you then write the equivalent pipe-less code, the function will run different code, so the error might not come up. Just one example of how non-pure functions are dangerous.
@ateucher, could you preserve your V8 context with each call, so that the transformed data is available for the next mapshaper call and could be reused if some argument is provided?
Surely @gaborcsardi has a point, and there is sometimes a tradeoff in keeping a "clean" DSL. That said, it is surely possible, as it is almost the same as what jqr does: there the json query is built up by each verb, and only the last one in the chain triggers the actual jq interpreter. I guess you need to weigh the pros and cons and make up your mind.
There is some discussion on daff about how to accomplish preserving the V8 context. Not having to initiate a V8 context each time also eliminates quite a bit of overhead.
Also, how do you know what the next function in the pipeline expects as input? That function may come from another package or wherever. Do you check whether it is coming from your package? I think this will get really tricky to implement and probably also fragile. But I might be missing an easy workaround...
Maybe default to `return_class = auto()`, where `auto()` will determine whether it is within a pipeline and whether it is the last function in it...
Yes, but even if it is not last, it might need to return an `sp` object, if the next function in the chain does not know about this trick. Anyway, I guess you can work around this somehow...
@gaborcsardi another good point about debugging. And yes, it would need to be able to detect whether the next function in the pipe was from rmapshaper. It does seem like a potential house of cards...
@timelyportfolio I think even more efficient than preserving the V8 context would be to pass the original unaltered json all the way through, collecting and assembling mapshaper commands as we go (perhaps in an attribute), then apply the whole thing at once in the last step, similar to what @smbache mentioned wrt jqr.
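That deferred approach could look something like the following sketch. The function names are illustrative, not rmapshaper's API, and the command strings follow mapshaper's CLI syntax (`-simplify`, `-filter-islands`); nothing is sent to V8 until the final step.

```r
# Each lazy verb just appends its mapshaper command to an attribute on the
# unaltered json, rather than running it immediately.
defer_command <- function(json, command) {
  attr(json, "ms_commands") <- c(attr(json, "ms_commands"), command)
  json
}

ms_simplify_lazy <- function(json, keep = 0.05) {
  defer_command(json, sprintf("-simplify %s", keep))
}

ms_filter_islands_lazy <- function(json, min_vertices = 4) {
  defer_command(json, sprintf("-filter-islands min-vertices=%s", min_vertices))
}

ms_apply <- function(json) {
  command <- paste(attr(json, "ms_commands"), collapse = " ")
  # A single V8/mapshaper round trip would happen here with `command`;
  # the assembled command string is returned just to show what was collected.
  command
}
```

For example, `json_obj %>% ms_simplify_lazy() %>% ms_filter_islands_lazy() %>% ms_apply()` assembles `"-simplify 0.05 -filter-islands min-vertices=4"` and would run it in one pass.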
Thanks all for your thoughts! I will think on it some more and poke around in jqr, but am leaning towards @gaborcsardi's advice to try not to be too clever ;)
> Maybe default to `return_class = auto()`, where `auto()` will determine whether it is within a pipeline and whether it is the last function in it...
That seems like a pretty nice way to allow user override on a bit of a black-box process...
I think a better (but still tricky) way of solving these kinds of problems is to pass an object that can behave both as `json` and `sp`.

This might not be possible at this point, but in general, you can have an R6 object that has two active bindings, `json` and `sp`, but internally it might only store one of them. Then if a function requests `json` and only `sp` is stored, it calculates `json` from `sp` on the fly (and potentially stores it). And vice versa.
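The active-binding idea could be sketched with R6 like this. It is only a sketch: the class name is made up, and the geojsonio converters stand in for whatever conversion functions rmapshaper would actually use.

```r
library(R6)

# One object, two views. Only one representation is stored; the other is
# computed (and cached) on first access via active bindings.
GeoData <- R6Class("GeoData",
  private = list(.json = NULL, .sp = NULL),
  public = list(
    initialize = function(json = NULL, sp = NULL) {
      private$.json <- json
      private$.sp <- sp
    }
  ),
  active = list(
    json = function(value) {
      if (!missing(value)) {
        private$.json <- value
        private$.sp <- NULL  # invalidate the stale representation
      } else {
        if (is.null(private$.json)) {
          # sp -> json on the fly, cached for later accesses
          private$.json <- geojsonio::geojson_json(private$.sp)
        }
        private$.json
      }
    },
    sp = function(value) {
      if (!missing(value)) {
        private$.sp <- value
        private$.json <- NULL
      } else {
        if (is.null(private$.sp)) {
          # json -> sp on the fly, cached for later accesses
          private$.sp <- geojsonio::geojson_sp(private$.json)
        }
        private$.sp
      }
    }
  )
)
```

A function that only reads `x$json` never triggers a conversion when json is already stored, and assigning `x$json <- ...` invalidates the cached `sp` so the two views can't drift apart.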
+1 for R6 idea
That is intriguing! I'll probably keep it simple until I see how sfr and geojson play out, but thanks for the ideas!