Improve management of the Arrow dependency in C++ SDK

Open eliemichel opened this issue 11 months ago • 1 comments

Describe the annoyance

Because of the dependency on Arrow and the way it is fetched, the first build takes a long time, and adding Rerun to a project increases the build directory size by more than 1GB (Windows, MSVC). Furthermore, whenever I start my app in Visual Studio, I see arrow_cpp checking for updates and reinstalling itself. As the log below shows, it takes more than 13s to start my program even when I have not changed a single line of code:

Build started at 18:35...
1>------ Build started: Project: ZERO_CHECK, Configuration: Release x64 ------
1>Checking File Globs
2>------ Build started: Project: arrow_cpp (ExternalProjectTargets\arrow_cpp\arrow_cpp), Configuration: Release x64 ------
2>1>Performing update step for 'arrow_cpp'
2>-- Already at requested tag: apache-arrow-18.0.0
2>No patch step for 'arrow_cpp'
2>Performing configure step for 'arrow_cpp'
2>-- arrow_cpp configure command succeeded.  See also C:/Users/emichel/SourceCode/TinyColmap/build/arrow/src/arrow_cpp-stamp/arrow_cpp-configure-*.log
2>Performing build step for 'arrow_cpp'
2>-- arrow_cpp build command succeeded.  See also C:/Users/emichel/SourceCode/TinyColmap/build/arrow/src/arrow_cpp-stamp/arrow_cpp-build-*.log
2>Performing install step for 'arrow_cpp'
2>-- arrow_cpp install command succeeded.  See also C:/Users/emichel/SourceCode/TinyColmap/build/arrow/src/arrow_cpp-stamp/arrow_cpp-install-*.log
2>Completed 'arrow_cpp'
========== Build: 2 succeeded, 0 failed, 16 up-to-date, 0 skipped ==========
========== Build completed at 18:35 and took 13.670 seconds ==========

It is possible to install Arrow manually instead, but on https://arrow.apache.org/install/ it's really not obvious what to choose to "just install" the C++ SDK (on Windows, for Visual Studio), so I ended up installing from source.
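For reference, here is a minimal sketch of how a manually installed Arrow could be used instead of the bundled build. It assumes the SDK's CMake script exposes an option named `RERUN_DOWNLOAD_AND_BUILD_ARROW` and falls back to `find_package(Arrow)` when that option is off; I have not verified this against every SDK version. The project name `my_app` and the source file `main.cpp` are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_app LANGUAGES CXX)  # hypothetical project name

include(FetchContent)

# Assumption: this option exists in the Rerun C++ SDK and, when OFF,
# makes the SDK look for an already installed Arrow via find_package().
set(RERUN_DOWNLOAD_AND_BUILD_ARROW OFF CACHE BOOL "" FORCE)

FetchContent_Declare(rerun_sdk URL
    https://github.com/rerun-io/rerun/releases/latest/download/rerun_cpp_sdk.zip)
FetchContent_MakeAvailable(rerun_sdk)

add_executable(my_app main.cpp)  # hypothetical source file
target_link_libraries(my_app PRIVATE rerun_sdk)
```

With this setup, CMake needs to be told where the manual Arrow install lives, typically by passing `-DCMAKE_PREFIX_PATH=<arrow install dir>` at configure time.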

To Reproduce

Steps to reproduce the behavior:

  1. Create a new C++/CMake project
  2. Add Rerun to the project as suggested on the home page (through FetchContent):

       include(FetchContent)
       FetchContent_Declare(rerun_sdk URL
           https://github.com/rerun-io/rerun/releases/latest/download/rerun_cpp_sdk.zip)
       FetchContent_MakeAvailable(rerun_sdk)

  3. Build and wait.

Expected behavior

  • Ideally Arrow should be shallow cloned to save ~100MB (NB: CMake's GIT_SHALLOW option of FetchContent_Declare is notoriously broken, here is how I typically work around it -- another possibility is to fetch a zip instead of a git repo, which might be even better in this case)
  • Arrow should be in build/_deps like any other fetched content (rather than directly inside the build dir)
  • Arrow should not be locally installed by the build process, as this causes both an increase in build size (binaries are duplicated) and probably also causes the issue of long startup
  • There could be an option to install Arrow globally while installing Rerun, so that only one global install is needed.
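To illustrate the first two points, here is a rough sketch of fetching Arrow as a release archive rather than a git clone, placed under `build/_deps`. This is not how the SDK currently does it. The archive URL is an assumption built from GitHub's usual tag-archive pattern and the `apache-arrow-18.0.0` tag shown in the log above, and the Arrow options listed are only a plausible subset. It does not address the install step from the third point.

```cmake
include(ExternalProject)

ExternalProject_Add(arrow_cpp
    # Assumed URL: GitHub tag archive for the tag seen in the build log.
    # An archive download has no git history and no git update step.
    URL https://github.com/apache/arrow/archive/refs/tags/apache-arrow-18.0.0.tar.gz
    # Keep everything under _deps, next to other fetched content.
    PREFIX ${CMAKE_BINARY_DIR}/_deps/arrow
    # Arrow's C++ CMake project lives in the cpp/ subdirectory.
    SOURCE_SUBDIR cpp
    CMAKE_ARGS
        -DARROW_BUILD_STATIC=ON
        -DARROW_BUILD_SHARED=OFF
        -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>
)
```

Since there is no repository to poll, this should also avoid the "Performing update step" phase that currently runs on every build, though I have not measured whether that alone fixes the startup delay.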

Your goals

I was adding Rerun to an existing research prototype.

Desktop (please complete the following information):

  • OS: Windows 10

eliemichel, Nov 26 '24 10:11