
Chunks iterators

Open magnusuMET opened this issue 5 years ago • 8 comments

When reading values from a variable, it should be possible to get a lazy-loading iterator over chunks.

Implementation details:

New method: `fn values_chunked(start, chunklength, &mut buffer) -> ChunkIterator { }`

`struct ChunkIterator { start: (), buflen: (), buffer: () }`
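To make the proposal concrete, here is a minimal sketch of what such a lazy chunk reader could look like. It is not the crate's API: the `read` closure is a hypothetical stand-in for a variable read, the element type is fixed to `f64`, and `next_chunk` is an assumed name. Because each chunk borrows the caller's buffer, the sketch uses a lending-style method rather than `std::iter::Iterator`, whose items cannot borrow from the iterator itself.

```rust
/// Lazy reader that fills a caller-supplied buffer one chunk at a time.
/// `read(start, out)` is a hypothetical stand-in for "read `out.len()`
/// values beginning at index `start`"; it is not part of the netcdf crate.
pub struct ChunkIterator<'b, F> {
    read: F,
    start: usize, // index of the next value to read
    end: usize,   // one past the last value to read
    buffer: &'b mut [f64],
}

impl<'b, F> ChunkIterator<'b, F>
where
    F: FnMut(usize, &mut [f64]),
{
    pub fn new(read: F, start: usize, end: usize, buffer: &'b mut [f64]) -> Self {
        assert!(!buffer.is_empty(), "buffer must hold at least one value");
        Self { read, start, end, buffer }
    }

    /// Load the next chunk into the buffer and return a view of it.
    /// The final chunk may be shorter than the buffer.
    pub fn next_chunk(&mut self) -> Option<&[f64]> {
        if self.start >= self.end {
            return None;
        }
        let len = self.buffer.len().min(self.end - self.start);
        let chunk = &mut self.buffer[..len];
        (self.read)(self.start, chunk);
        self.start += len;
        Some(&self.buffer[..len])
    }
}
```

A caller would loop with `while let Some(chunk) = it.next_chunk() { ... }`, so only one buffer's worth of values is in memory at any time.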

magnusuMET avatar Dec 18 '19 10:12 magnusuMET

I am willing to help with that. How to get started?

krestomantsi avatar Oct 18 '23 20:10 krestomantsi

There haven't been any requests for this, but we can always make some API for reading a variable along a dimension. I think this needs some ideas for how best to create the iterator, and what would be the most useful for the end user. Maybe you have some insight?

One idea is to allow chunking along a dimension (e.g. time) and let each call to `next` read everything in that chunk, with the index along that dimension increasing. Although with the `Extents` type this might not be necessary?
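As a rough illustration of chunking along one dimension, the sketch below yields a start/count pair per step and leaves the actual read to the caller. The names `Slab` and `DimChunks` are hypothetical, and nothing here touches the crate's `Extents` type; it only shows the index arithmetic.

```rust
/// One hyperslab: per-dimension start indices and counts.
#[derive(Debug, Clone, PartialEq)]
pub struct Slab {
    pub start: Vec<usize>,
    pub count: Vec<usize>,
}

/// Iterator over slabs that walk dimension `dim` in steps of `step`
/// while covering every other dimension in full.
pub struct DimChunks {
    shape: Vec<usize>,
    dim: usize,
    step: usize,
    pos: usize, // next index along `dim`
}

impl DimChunks {
    pub fn new(shape: &[usize], dim: usize, step: usize) -> Self {
        assert!(dim < shape.len(), "dimension index out of range");
        assert!(step > 0, "step must be positive");
        Self { shape: shape.to_vec(), dim, step, pos: 0 }
    }
}

impl Iterator for DimChunks {
    type Item = Slab;

    fn next(&mut self) -> Option<Slab> {
        let len = self.shape[self.dim];
        if self.pos >= len {
            return None;
        }
        let mut start = vec![0; self.shape.len()];
        let mut count = self.shape.clone();
        start[self.dim] = self.pos;
        // The last slab is truncated if `step` does not divide the length.
        count[self.dim] = self.step.min(len - self.pos);
        self.pos += count[self.dim];
        Some(Slab { start, count })
    }
}
```

Since each `Slab` is an owned value, this version can implement the standard `Iterator` trait; whether the read itself should happen inside `next` is exactly the open design question.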

magnusuMET avatar Oct 19 '23 07:10 magnusuMET

Having a lazy loader both in space and time is IMO mandatory. In a lot of my own use cases the files are larger than RAM, while I only need values that are very local in the spacetime sense. Julia's NCDatasets already does this (so I know where to "steal" ideas). I want to migrate to Rust, so I will try to implement a lazy loader for myself anyway.

krestomantsi avatar Oct 19 '23 07:10 krestomantsi

@krestomantsi and I have been looking at this issue. How do I build the netcdf package locally? Can you also clarify the type annotation written in this issue?

joshniemela avatar Nov 14 '23 16:11 joshniemela

It should be as simple as cloning this repository and running `cargo test`. If you specify `cargo build --features static` you don't need to install netcdf-c. The type annotation can be disregarded as the API is not yet known. There is some room for a creative thinker here!

magnusuMET avatar Nov 15 '23 08:11 magnusuMET

warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
   Compiling hdf5-sys v0.8.1
   Compiling netcdf-src v0.3.2 (/home/josh/netcdf/netcdf-src)
error: failed to run custom build command for `netcdf-src v0.3.2 (/home/josh/netcdf/netcdf-src)`

Caused by:
  process didn't exit successfully: `/home/josh/netcdf/target/debug/build/netcdf-src-cf974f2b78ad12dc/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  CMAKE_TOOLCHAIN_FILE_x86_64-unknown-linux-gnu = None
  CMAKE_TOOLCHAIN_FILE_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_TOOLCHAIN_FILE = None
  CMAKE_TOOLCHAIN_FILE = None
  CMAKE_GENERATOR_x86_64-unknown-linux-gnu = None
  CMAKE_GENERATOR_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_GENERATOR = None
  CMAKE_GENERATOR = None
  running: "cc" "--version"
  exit status: 0
  running: "c++" "--version"
  exit status: 0
  running: "cc" "--version"
  exit status: 0
  CMAKE_PREFIX_PATH_x86_64-unknown-linux-gnu = None
  CMAKE_PREFIX_PATH_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_PREFIX_PATH = None
  CMAKE_PREFIX_PATH = None
  CMAKE_x86_64-unknown-linux-gnu = None
  CMAKE_x86_64_unknown_linux_gnu = None
  HOST_CMAKE = None
  CMAKE = None
  running: cd "/home/josh/netcdf/target/debug/build/netcdf-src-d6c4cf1c0f0b9757/out/build" && CMAKE_PREFIX_PATH="" "cmake" "/home/josh/netcdf/netcdf-src/source" "-DBUILD_SHARED_LIBS=OFF" "-DNC_FIND_SHARED_LIBS=OFF" "-DBUILD_UTILITIES=OFF" "-DENABLE_EXAMPLES=OFF" "-DENABLE_DAP_REMOTE_TESTS=OFF" "-DENABLE_TESTS=OFF" "-DENABLE_EXTREME_NUMBERS=OFF" "-DENABLE_PARALLEL_TESTS=OFF" "-DENABLE_FILTER_TESTING=OFF" "-DENABLE_BASH_SCRIPT_TESTING=OFF" "-DENABLE_PLUGINS=OFF" "-DPLUGIN_INSTALL_DIR=OFF" "-DHDF5_VERSION=1.10.7" "-DHDF5_C_LIBRARY=hdf5_debug" "-DHDF5_HL_LIBRARY=hdf5_hl_debug" "-DHDF5_INCLUDE_DIR=/home/josh/netcdf/target/debug/build/hdf5-src-38fdb0c7c7ff9024/out/include" "-DENABLE_NCZARR=OFF" "-DENABLE_DAP=OFF" "-DENABLE_BYTERANGE=OFF" "-DENABLE_DAP_REMOTE_TESTS=OFF" "-DZLIB_ROOT=/home/josh/netcdf/target/debug/build/libz-sys-2ca0cd9f87eb13cc/out/include/.." "-DCMAKE_INSTALL_PREFIX=/home/josh/netcdf/target/debug/build/netcdf-src-d6c4cf1c0f0b9757/out" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_C_COMPILER=/etc/profiles/per-user/josh/bin/cc" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_COMPILER=/etc/profiles/per-user/josh/bin/c++" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_ASM_COMPILER=/etc/profiles/per-user/josh/bin/cc" "-DCMAKE_BUILD_TYPE=RelWithDebInfo"

  --- stderr
  CMake Warning:
    Ignoring extra path from command line:

     "/home/josh/netcdf/netcdf-src/source"


  CMake Error: The source directory "/home/josh/netcdf/netcdf-src/source" does not appear to contain CMakeLists.txt.
  Specify --help for usage, or press the help button on the CMake GUI.
  thread 'main' panicked at '
  command did not execute successfully, got: exit status: 1

  build script failed, must exit now', /home/josh/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cmake-0.1.50/src/lib.rs:1098:5
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...

I get this error.

joshniemela avatar Nov 15 '23 16:11 joshniemela

Regarding the API, would you basically want it to be like a slicing function, so that you specify lat X => Y and long X => Y and get back the matrix containing those values as a view? Or is it supposed to copy all the values into a matrix of the correct shape?
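For the slicing reading of the question, a sketch of the coordinate-to-index step might look like the following. It assumes the lat and long coordinate arrays are sorted ascending and contain no NaN; `index_range` and `window` are hypothetical helper names, and the resulting start/count pair would still need to be passed to whatever read call the final API exposes.

```rust
use std::ops::Range;

/// Index range of the entries of `coords` lying in `[lo, hi]`.
/// Assumes `coords` is sorted ascending and contains no NaN.
pub fn index_range(coords: &[f64], lo: f64, hi: f64) -> Range<usize> {
    let start = coords.partition_point(|&c| c < lo);
    let end = coords.partition_point(|&c| c <= hi);
    // An inverted interval (lo > hi) gives an empty range.
    start..end.max(start)
}

/// Start indices and counts of the 2-D window covering the given
/// latitude and longitude intervals, in (lat, lon) order.
pub fn window(
    lat: &[f64],
    lon: &[f64],
    lat_range: (f64, f64),
    lon_range: (f64, f64),
) -> ([usize; 2], [usize; 2]) {
    let i = index_range(lat, lat_range.0, lat_range.1);
    let j = index_range(lon, lon_range.0, lon_range.1);
    ([i.start, j.start], [i.len(), j.len()])
}
```

Only the small coordinate arrays need to be in memory for this step; the data variable itself is read afterwards for just the selected window.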

joshniemela avatar Nov 15 '23 16:11 joshniemela

You will also need to pull the submodule with `git submodule update --init --recursive`.

magnusuMET avatar Nov 16 '23 08:11 magnusuMET