Vaex cannot read an HDF5 file from Azure Blob storage

Open nagarajmmu opened this issue 2 years ago • 1 comment

Hi

I am using HSDS to create an HDF5 file in Azure Blob storage, as shown below.

fHSDS = h5pyd.File(HSDS_PATH + FILE_NAME, "w")
dset_hsds = fHSDS.create_dataset(
    DATASET_NAME,
    (NUM_ROWS, NUM_COLS),
    dtype='float64',
    maxshape=(None, NUM_COLS),
    chunks=(CHUNK_SIZE[0], CHUNK_SIZE[1]),
)
for iRow in range(0, NUM_ROWS, CHUNK_SIZE[0]):
    dset_hsds[iRow:iRow+CHUNK_SIZE[0]-1, :] = randomData[iRow:iRow+CHUNK_SIZE[0]-1, :]
fHSDS.close()
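A side note on the write loop above, unrelated to the FileNotFoundError: Python slice ends are exclusive, so a slice ending at iRow+CHUNK_SIZE[0]-1 stops one row short of each chunk boundary. The sketch below illustrates this with plain Python lists standing in for the dataset; the sizes and the names source, chunked_copy, and unwritten_rows are hypothetical and chosen only for illustration.

```python
# Hypothetical sizes, chosen only for illustration.
NUM_ROWS, CHUNK_ROWS = 10, 4

# Stand-in for the rows of randomData; every value is nonzero,
# so a 0 left in the destination marks a row that was never written.
source = list(range(1, NUM_ROWS + 1))


def chunked_copy(end_offset):
    """Copy source into a zero-filled list chunk by chunk.

    end_offset is added to the end of each slice (-1 mimics the loop above).
    """
    dest = [0] * NUM_ROWS
    for start in range(0, NUM_ROWS, CHUNK_ROWS):
        stop = start + CHUNK_ROWS + end_offset
        dest[start:stop] = source[start:stop]
    return dest


with_minus_one = chunked_copy(-1)   # slice end as written in the post
without_offset = chunked_copy(0)    # slice end without the -1

# Row indices that were never written when the -1 is present.
unwritten_rows = [i for i, v in enumerate(with_minus_one) if v == 0]
```

With these sizes, the last row of each full chunk is left at its initial value when the -1 is present, while dropping the -1 copies every row.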

Whenever I try to read the same HDF5 file from the blob with Vaex, using the code below, I get "FileNotFoundError: /blob_name/home/testFile_fromPython.h5"

df = vaex.open("/blob_name/home/testFile_fromPython.h5", fs=fs)
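One way to narrow down an error like this is to ask the filesystem object directly whether it sees anything at that path. The sketch below assumes that fs follows the fsspec filesystem interface (its exists and ls methods); the helper name diagnose_missing_path and the shape of the returned report are hypothetical, not part of Vaex.

```python
import posixpath


def diagnose_missing_path(fs, path):
    """Report whether `path` exists on `fs` and what sits next to it.

    `fs` is assumed to follow the fsspec filesystem interface (exists, ls).
    """
    parent = posixpath.dirname(path)
    report = {
        "path": path,
        "exists": fs.exists(path),
        "parent": parent,
        "siblings": [],
    }
    if fs.exists(parent):
        # detail=False asks for plain path strings instead of info dicts.
        report["siblings"] = sorted(fs.ls(parent, detail=False))
    return report
```

Calling diagnose_missing_path(fs, "/blob_name/home/testFile_fromPython.h5") with the same fs that is passed to vaex.open would show whether the filesystem itself finds an object at that path, and which names it does find in the parent directory. If the path is absent there too, the problem lies in how or where the file was stored rather than in Vaex.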

With the same code, if I read a Parquet or CSV file instead, Vaex reads it as a DataFrame without problems.

The same scenario works locally: when I create the HDF5 file on local disk and read it with Vaex, I get a DataFrame.

Please help me read the HDF5 file from Azure Blob storage.

Thanks in advance.

nagarajmmu, Aug 03 '22 23:08

I don't know if it matters, but can you change the file extension to .hdf5?

Also, can you please format your code? It is very hard to figure out what is happening right now.

JovanVeljanoski, Aug 25 '22 21:08