[WIP] Add benchmarks for GeometryOps.jl (a Julia package)

Open asinghvi17 opened this issue 1 year ago • 6 comments

https://github.com/asinghvi17/GeometryOps.jl is a Julia package for (mostly) vector geometry operations. It's still pretty early stage, but I realized it could do half the operations in this benchmark, so wanted to get a foundation of code going.

This PR adds Julia capability to the run_benchmarks.sh, and a folder geometryops which contains:

  • Benchmark files for GeometryOps.jl.
  • A Julia Project.toml which defines a list of packages which must be installed, that Julia can be pointed to.
  • (Optional) A Julia plotting file in geometryops/plots.jl.

Here are the comparisons with GeometryOps: download-5 (edited from the original). This PR is still WIP, but is in a runnable state now.

asinghvi17 avatar Apr 06 '24 03:04 asinghvi17

Thanks, great idea! I will be happy to add this as several people have asked to include Julia in benchmarks. I don't know Julia personally, but I think I will test it during the holidays.

@evetion, what do you think?

kadyb avatar Apr 07 '24 09:04 kadyb

This stuff is great, finally having generic native Julia code for things we would otherwise use GDAL/GEOS for (as do most libraries). In that sense, would be good to benchmark that as well (LibGEOS.jl).

I think we discussed benchmarking at least some file loading in Julia last summer, would be good to eventually include that here as well, but that's not the point of GeometryOps.jl.

PS. Where has GEOS gone in the graph?

evetion avatar Apr 07 '24 15:04 evetion

Good point @evetion - I think I hadn't installed the R GEOS package on my machine then, so it didn't run. Posting updated benchmarks here (plus GeometryOps calling out to GEOS's buffer): download-12


download-10

asinghvi17 avatar Apr 07 '24 17:04 asinghvi17

BTW: Have you seen this year's edition of Spatial Data Science across Languages organized by Martin in Prague? Maybe you will be interested as Julia programmers.

kadyb avatar Apr 08 '24 10:04 kadyb

I've shared the invitation with @evetion I believe (and Martijn and Fabian) but happy to extend it! My knowledge of Julia-land is limited, so feel free to throw names at me.

martinfleis avatar Apr 08 '24 11:04 martinfleis

I'll most likely be there; I'm also an author of GeometryOps.jl and based in the EU.

rafaqz avatar Apr 11 '24 09:04 rafaqz

We have just released a new version of GeometryOps with support for buffer - this PR should be ready to merge after that!

asinghvi17 avatar Jun 11 '24 13:06 asinghvi17

I've just run and updated the PR with the latest changes to GeoDataFrames.jl, which uses GDAL's chunked writes to get some more speedup.

@kadyb this should now run with no additional setup, so the PR is good to merge from my end!

comparison

asinghvi17 avatar Jul 11 '24 10:07 asinghvi17

Thank you very much! I haven't had time to sit down with it yet, but I will look into it during the holidays. (There is one problem: I no longer have access to the machine on which I tested this, but I will ask someone to help.) The second issue is that we also need to update geopandas, because it now uses a new, faster engine (pyogrio) to load and save data.

So overall, based on the new results, Julia outperforms the R and Python packages and the GEOS binding. It would also be interesting to see what the performance of a binding to georust looks like (rsgeo).

And one more question that I am curious about. Will the geometryops binding from R/Python be the fastest of all the packages tested? If so, maybe Julia will eventually replace Rust and C++ in the future?

kadyb avatar Jul 12 '24 18:07 kadyb

> Thank you very much! I haven't had time to sit down with it yet, but I will look into it during the holidays. (There is one problem: I no longer have access to the machine on which I tested this, but I will ask someone to help.) The second issue is that we also need to update geopandas, because it now uses a new, faster engine (pyogrio) to load and save data.

I expect that pyogrio will get the read/write times to at least the same level as Julia. In the end, they all should be pretty similar (and limited by IO).

> So overall, based on the new results, Julia outperforms the R and Python packages and the GEOS binding. It would also be interesting to see what the performance of a binding to georust looks like (rsgeo).

Yeah, we should test it. Like pyogrio, I expect georust to be on par with Julia.

> And one more question that I am curious about. Will the geometryops binding from R/Python be the fastest of all the packages tested? If so, maybe Julia will eventually replace Rust and C++ in the future?

Not sure what you mean with the sentence. Julia is not a generic replacement for Rust and C++ (Rust might be for C++ though), but it certainly is easy to implement new algorithms in, probably for a wider audience than if you would do it in Rust or C++ (none of the linked authors is proficient in those languages).

evetion avatar Jul 12 '24 21:07 evetion

> Not sure what you mean with the sentence.

I saw some benchmarks and Julia demonstrated the same speed as low-level languages. If Julia has easier syntax and a lower entry barrier, then I think it could be a very good choice for writing geoprocessing algorithms compared to C++ or Rust. Moreover, we can see that geometryops is faster than R binding to GEOS (probably the same is true for pygeos). Hence, I am also curious what the overhead of calling Julia from R looks like.

> Julia is not a generic replacement for Rust and C++

What are the limitations? Or why would Rust / C++ be better?

kadyb avatar Jul 12 '24 23:07 kadyb

> I saw some benchmarks and Julia demonstrated the same speed as low-level languages. If Julia has easier syntax and a lower entry barrier, then I think it could be a very good choice for writing geoprocessing algorithms compared to C++ or Rust. Moreover, we can see that geometryops is faster than R binding to GEOS (probably the same is true for pygeos). Hence, I am also curious what the overhead of calling Julia from R looks like.

Agreed! Calling other languages will always cause overhead, and I'm not sure what that will be from R/Python to Julia. Much also has to do with the geometry types used. Seems like a good experiment for SDSL.
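
As a rough illustration of what per-call overhead across a language boundary looks like, here is a minimal Python sketch. It is not an R-to-Julia measurement: it uses `ctypes` to call the C library's `labs` once per element as a stand-in for "calling out to another language", and compares that with staying in Python. The helper names `native_abs_sum` and `foreign_abs_sum` are made up for this example, and loading libc this way assumes a POSIX system.

```python
import ctypes
import ctypes.util
import timeit

# Load the C standard library (POSIX assumption; falls back to the
# symbols already loaded into the current process if lookup fails).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.labs.argtypes = [ctypes.c_long]
libc.labs.restype = ctypes.c_long


def native_abs_sum(values):
    """Stay inside the host language for every element."""
    return sum(abs(v) for v in values)


def foreign_abs_sum(values):
    """Cross the language boundary once per element (hypothetical stand-in)."""
    return sum(libc.labs(v) for v in values)


values = list(range(-5000, 5000))

# Both paths must agree before the timings mean anything.
assert native_abs_sum(values) == foreign_abs_sum(values)

n_runs = 20
t_native = timeit.timeit(lambda: native_abs_sum(values), number=n_runs)
t_foreign = timeit.timeit(lambda: foreign_abs_sum(values), number=n_runs)
print(f"native : {t_native / n_runs * 1e3:.3f} ms per pass")
print(f"foreign: {t_foreign / n_runs * 1e3:.3f} ms per pass")
```

The absolute numbers depend on the machine and say nothing about Julia itself. The point is only that crossing the boundary per element adds argument conversion on every call, so passing a whole geometry collection across in one call is usually the fairer setup for a binding benchmark.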

> What are the limitations? Or why would Rust / C++ be better?

Julia is dynamically typed (like Python/R), whereas Rust/C++ are statically typed. Julia has a garbage collector (like Python/R), whereas the other languages do not. So that makes Julia very easy and similar to Python and R, but we can't (yet) make small executables/libraries, or guarantee that the memory footprint is known beforehand and small enough for embedded systems.

evetion avatar Jul 13 '24 06:07 evetion

Julia should have small compiled binaries soonish (currently they're too big).

We will experiment with calling GeometryOps.jl from R and python. If R/Python packages are wrapping GEOS we may be able to just rewrap the same C objects as Julia LibGEOS.jl objects, as GeometryOps.jl already accepts them without conversion (as a short term experiment with minimal changes).

Most of GeometryOps.jl isn't actually dynamically typed; the algorithms are statically known (hence this performance). So in theory we will be able to compile good static binaries. But practically not yet.

rafaqz avatar Jul 13 '24 07:07 rafaqz

FWIW: https://github.com/r-spatial/sf/issues/2472 ;)

BTW: Shouldn't GeometryOps.jl be listed on https://juliageo.org/?

kadyb avatar Nov 07 '24 10:11 kadyb