spatsoc

improve speed of build_lines

Open robitalec opened this issue 6 years ago • 2 comments

build_lines is slower than I'd like, and this also slows down group_lines, which depends on it.

The split (split.data.table) is slower than working within a data.table `by`, but we can't return SpatialLines objects in `j`. Any alternatives?
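One possible alternative to the split: data.table's `j` can return arbitrary R objects as long as they are wrapped in `list()`, which produces a list column with one element per group. A minimal sketch, assuming hypothetical columns `id`, `datetime`, `X` and `Y` and made-up data. It only gathers the ordered coordinates per group, and I haven't timed it against `split()`.

```r
library(data.table)

# Hypothetical relocation data: one row per fix, `id` identifies the individual
DT <- data.table(
  id = c("A", "A", "A", "B", "B"),
  datetime = as.POSIXct("2018-11-29", tz = "UTC") + c(7200, 0, 3600, 0, 3600),
  X = c(2, 0, 1, 10, 11),
  Y = c(0, 0, 1, 5, 6)
)

# Sort fixes in time within each individual so each line follows the track
setorder(DT, id, datetime)

# Wrapping the value in list() lets j return one arbitrary object per group;
# here each group yields a two-column coordinate matrix in a list column
coordsByGroup <- DT[, .(coords = list(cbind(X, Y))), by = id]
```

The same `list()` wrapper should work for sp objects too (e.g. `list(sp::Line(cbind(X, Y)))`), so the restriction is on returning a bare S4 object, not on returning one per group.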

The rest of the time is spent in the SpatialLines function. Is there a performance incentive to switch from sp to sf?

robitalec, Nov 29 '18 21:11

According to its website, the sf package should be faster than sp, due to spatial indexing. Regarding split.data.table, do you mean split.data.frame?

FilipeamTeixeira, Dec 05 '18 13:12

Interesting point about spatial indexing. Could you share a link describing it?

I would prefer a full switch from sp to sf rather than depending on both. Have you noticed any challenges installing sf on different operating systems, or any other potential downsides of switching?

Here are all the uses of sp:

./R/build_polys.R:147:    spPts <- sp::SpatialPointsDataFrame(
./R/build_polys.R:149:      proj4string = sp::CRS(projection),
./R/build_lines.R:138:        sp::SpatialLines(list(sp::Lines(sp::Line(
./R/build_lines.R:143:        proj4string = sp::CRS(projection))
./R/build_lines.R:145:      return(do.call(sp::rbind.SpatialLines, l))
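Independently of sp vs sf, the calls above build one SpatialLines object per group and then rbind them. A minimal sketch of building all `Lines` objects first and calling `SpatialLines` once, assuming hypothetical per-group coordinate matrices that are already ordered. The projection is left at sp's default here (build_lines would pass `proj4string = sp::CRS(projection)`), and whether this is measurably faster is untested.

```r
library(sp)

# Hypothetical per-group coordinate matrices, already ordered by time
coordsByGroup <- list(
  A = cbind(X = c(0, 1, 2), Y = c(0, 1, 0)),
  B = cbind(X = c(10, 11), Y = c(5, 6))
)

# Build one Lines object per group, using the group name as its ID
linesList <- Map(function(coords, id) Lines(list(Line(coords)), ID = id),
                 coordsByGroup, names(coordsByGroup))

# Construct a single SpatialLines object, instead of one SpatialLines
# per group followed by do.call(rbind.SpatialLines, ...)
sl <- SpatialLines(unname(linesList))
```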

I do mean split.data.table: since the input DT is a data.table, split() in build_lines dispatches to data.table's internal method (data.table:::split.data.table):

./R/build_lines.R:131

lst <- split(DT[dropRows, on = splitBy][!(dropped)][order(get(sortBy))],
             by = splitBy, sorted = TRUE)

This splitting might be avoidable with a switch to sf, but I'm not sure.
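If we do switch, one way sf might avoid the split is to build a LINESTRING per group directly in `j`, since sf geometries (`sfg` objects) can be returned in a list column and then combined with a single `st_sfc()` call. A minimal sketch, assuming hypothetical columns `id`, `datetime`, `X` and `Y`, and a placeholder EPSG code (32621) standing in for the `projection` argument. Groups with fewer than two fixes would still need to be dropped first, as build_lines does with `dropRows`. I haven't benchmarked this.

```r
library(data.table)
library(sf)

# Hypothetical relocation data: one row per fix, `id` identifies the individual
DT <- data.table(
  id = c("A", "A", "A", "B", "B"),
  datetime = as.POSIXct("2018-11-29", tz = "UTC") + c(0, 3600, 7200, 0, 3600),
  X = c(0, 1, 2, 10, 11),
  Y = c(0, 1, 0, 5, 6)
)

# Order fixes in time within each individual
setorder(DT, id, datetime)

# Build one LINESTRING per group directly in j; the list() wrapper makes
# the result a list column, so no split() is needed
lines <- DT[, .(geometry = list(st_linestring(cbind(X, Y)))), by = id]

# Combine into a single sf object with one sfc call
# (EPSG 32621 is a placeholder for the user's projection)
linesSf <- st_sf(id = lines$id, geometry = st_sfc(lines$geometry, crs = 32621))
```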

robitalec, Dec 07 '18 20:12