hdmf icon indicating copy to clipboard operation
hdmf copied to clipboard

Fix conversion of Zarr string dataset to HDF5

Open oruebel opened this issue 1 year ago • 2 comments

Motivation

Fix https://github.com/hdmf-dev/hdmf-zarr/issues/211

Zarr does not natively support variable length strings, instead the data is stored as objects with an encoder. When converting from Zarr to HDF5 the ObjectMapper tries to change the dtype in some cases for text datasets to unicode, which in turn can cause reading the data from Zarr to fail. This PR changes this behavior to: 1) not change the dtype for Zarr string datasets, and 2) detect the correct dtype to use in HDF5 in the backend.

Checklist

  • [x] Did you update CHANGELOG.md with your changes?
  • [X] Does the PR clearly describe the problem and the solution?
  • [X] Have you reviewed our Contributing Guide?
  • [X] Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

oruebel avatar Aug 15 '24 08:08 oruebel

Codecov Report

:x: Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 92.29%. Comparing base (704e99d) to head (a8b807c). :warning: Report is 1 commits behind head on dev.

Files with missing lines Patch % Lines
src/hdmf/backends/hdf5/h5tools.py 33.33% 1 Missing and 1 partial :warning:
src/hdmf/build/objectmapper.py 85.71% 1 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1171      +/-   ##
==========================================
- Coverage   92.29%   92.29%   -0.01%     
==========================================
  Files          42       42              
  Lines        9685     9703      +18     
  Branches     1962     1968       +6     
==========================================
+ Hits         8939     8955      +16     
- Misses        470      471       +1     
- Partials      276      277       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Aug 15 '24 08:08 codecov[bot]

@rly could you please take a look to see if this is right way to address this issue

oruebel avatar Aug 15 '24 08:08 oruebel

This PR also resolves this issue: https://github.com/hdmf-dev/hdmf-zarr/issues/301

@oruebel could you please review?

rly avatar Nov 01 '25 00:11 rly

@rly thanks for updating the PR. This looks good to me.

oruebel avatar Nov 05 '25 19:11 oruebel