spatsoc
improve speed of build_lines
build_lines is slower than I'd like, and that also slows down group_lines, which depends on it.
The split (split.data.table) is slower than grouping inside a data.table with by, but we can't return SpatialLines objects from j. Any alternatives?
The rest of the time is spent in the SpatialLines function. Is there a performance incentive to switch from sp to sf?
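To make the two approaches concrete, here is a minimal sketch on toy data, not the real build_lines input. The column names (id, X, Y) are hypothetical, and a plain coordinate matrix stands in for the geometry. It shows that wrapping the per-group result in list() lets j return a non-atomic object as a list column; whether that holds up for SpatialLines objects, and whether it is actually faster, is untested here.

```r
library(data.table)

# Toy relocations; id, X and Y are hypothetical column names
DT <- data.table(
  id = rep(c("A", "B"), each = 3),
  X = c(0, 1, 2, 5, 6, 7),
  Y = c(0, 1, 0, 5, 5, 6)
)

# Approach 1: split into a list of data.tables, then loop over it
lst <- split(DT, by = "id", sorted = TRUE)
coords_split <- lapply(lst, function(d) cbind(X = d$X, Y = d$Y))

# Approach 2: stay inside data.table; list() wraps each group's
# result so j can return it as one element of a list column
coords_by <- DT[, .(coords = list(cbind(X = X, Y = Y))), keyby = id]
```

Both approaches give one coordinate matrix per id; the second keeps the id alongside each matrix in a single table.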
According to its website, sf should be faster than sp because of spatial indexing.
About split.data.table: do you mean split.data.frame?
Interesting about the SpatialIndexing. Could you share a link describing it?
I would want to do a full switch from sp to sf rather than depend on both. Have you noticed any challenges installing sf on different operating systems, or other potential downsides of a switch?
Here are all the uses of sp:
./R/build_polys.R:147: spPts <- sp::SpatialPointsDataFrame(
./R/build_polys.R:149: proj4string = sp::CRS(projection),
./R/build_lines.R:138: sp::SpatialLines(list(sp::Lines(sp::Line(
./R/build_lines.R:143: proj4string = sp::CRS(projection))
./R/build_lines.R:145: return(do.call(sp::rbind.SpatialLines, l))
I do mean split.data.table. Since the input DT is a data.table, build_lines uses the internal data.table method (data.table:::split.data.table):
./R/build_lines.R:131
lst <- split(DT[dropRows, on = splitBy][!(dropped)][order(get(sortBy))],
by = splitBy, sorted = TRUE)
A switch to sf might make this splitting unnecessary, but I'm not sure.
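A rough sketch of what building lines without split() could look like with sf, assuming toy data with hypothetical columns id, X and Y and an arbitrary example EPSG code. It is not the spatsoc implementation and has not been benchmarked against the current sp code.

```r
library(data.table)
library(sf)

# Toy relocations; id, X and Y are hypothetical column names
DT <- data.table(
  id = rep(c("A", "B"), each = 3),
  X = c(0, 1, 2, 5, 6, 7),
  Y = c(0, 1, 0, 5, 5, 6)
)

# Build one LINESTRING per id inside j, with no split() step.
# list() wraps each sfg object so it can sit in a list column.
lines_dt <- DT[, .(geometry = list(st_linestring(cbind(X, Y)))), keyby = id]

# Promote the list column to a geometry column with a CRS
# (EPSG 32736 is only an example projection)
lines_sf <- st_sf(
  id = lines_dt$id,
  geometry = st_sfc(lines_dt$geometry, crs = 32736)
)
```

Sorting within each id (the order(get(sortBy)) step in the current code) would still need to happen before the grouping.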