h5cpp icon indicating copy to clipboard operation
h5cpp copied to clipboard

writing an armadillo matrix produces a transposed matrix in h5

Open scontini76 opened this issue 2 years ago • 2 comments

Consider this code:

int main()
{
    arma::mat M = arma::linspace(1, 10, 10);
    M.reshape(2, 5);

    std::cout << "my M shape is:"
              << "\n";

    M.brief_print();
    auto h = h5::create("test.h5", H5F_ACC_TRUNC);
    h5::write(h, "/test", M);

    std::cout << "reading back..."
              << "\n";

    M = h5::read<arma::mat>("test.h5", "/test");
    std::cout << "my M shape is:"
              << "\n";

    M.brief_print();

    return 0;
}

That give this output:

my M shape is:
[matrix size: 2x5]
    1.0000    3.0000    5.0000    7.0000    9.0000
    2.0000    4.0000    6.0000    8.0000   10.0000
reading back...
my M shape is:
[matrix size: 5x2]
    1.0000    6.0000
    2.0000    7.0000
    3.0000    8.0000
    4.0000    9.0000
    5.0000   10.0000

That looks wrong in h5dump:

HDF5 "test.h5" {
GROUP "/" {
   DATASET "test" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 5, 2 ) / ( 5, 2 ) }
      DATA {
      (0,0): 1, 2,
      (1,0): 3, 4,
      (2,0): 5, 6,
      (3,0): 7, 8,
      (4,0): 9, 10
      }
   }
}
}

What I expected is:

my M shape is:
[matrix size: 2x5]
    1.0000    3.0000    5.0000    7.0000    9.0000
    2.0000    4.0000    6.0000    8.0000   10.0000
reading back...
my M shape is:
[matrix size: 2x5]
    1.0000    3.0000    5.0000    7.0000    9.0000
    2.0000    4.0000    6.0000    8.0000   10.0000

Which is fine in h5dump too:

HDF5 "test.h5" {
GROUP "/" {
   DATASET "test" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 2, 5 ) / ( 2, 5 ) }
      DATA {
      (0,0): 1, 2, 3, 4, 5,
      (1,0): 6, 7, 8, 9, 10
      }
   }
}
}

To have what I expected I must revert the commit "arma matrix rows/cols swapped to correct C/Fortran order mismatch" (be0a1918156433457ded8e33206ba74438e107e4)

Can you confirm that something is wrong in the library or I'm doing something stupid? :)

scontini76 avatar Jan 10 '22 17:01 scontini76

Great find! Hmmm there is more to this, get to it in a sec... The top one shouldn't be happening: write - read == 0 whereas the h5dump case is not quite straightforward.

The first case is a clear bug. Thanks for letting me know! The second one is complicated: Linear algebra libraries mark the layout with transpose flag but HDF5 doesn't have this feature, leading to:

  1. give up zero copy (performance) and copy each dataset from col major to row major. That is lot's of copying! Considering that the majority of scientific platforms are colmajor. In cases where not enough memory is left then it will fail
  2. as a feature request: convince the HDFGroup to add a flag marking transpose

Will take a closer look when I can, until I am adding @gheber to the conversation. best: steve

steven-varga avatar Jan 11 '22 19:01 steven-varga

Ok, let me know if you need some tests from me. Then I wish HDFGroup consider to add a transpose flag! In the meanwhile I'll transpose the data before writing them, or I'll do the trick of reverting be0a191

scontini76 avatar Jan 12 '22 14:01 scontini76