hdf5
Use of HDF5 1.12
What are you trying to do?
I am trying to read an HDF5 database (version 1.12.1).
The database was populated using Python's h5py library. The data is a pandas dataframe, but I guess that should be no problem, as h5ls and the HDFView app read the data without any issues.
What did you do?
I used the example from this repo for reading a table. I also tried to use DataSet instead. This is the code excerpt:
package main

import (
	"fmt"

	"gonum.org/v1/hdf5"
)

// ohlcv is one row of the table; the tags name the HDF5 columns.
type ohlcv struct {
	Index      int64   `hdf5:"index"`
	Exchange   string  `hdf5:"exchange"`
	Pair       string  `hdf5:"pair"`
	Timestamp  int64   `hdf5:"timestamp"`
	PriceOpen  float64 `hdf5:"price_open"`
	PriceHigh  float64 `hdf5:"price_high"`
	PriceLow   float64 `hdf5:"price_low"`
	PriceClose float64 `hdf5:"price_close"`
	Volume     float64 `hdf5:"volume"`
}

func main() {
	version, _ := hdf5.LibVersion()
	fmt.Printf("HDF5 version: %s\n", version)
	file, _ := hdf5.OpenFile("tickers.h5", hdf5.F_ACC_RDONLY)
	month, _ := file.OpenGroup("M11")
	day, _ := month.OpenGroup("D07")
	table, _ := day.OpenTable("table")
	recs, _ := table.NumPackets()
	// Read the packets one at a time.
	for i := 0; i != recs; i++ {
		p := make([]ohlcv, 1)
		if err := table.Next(&p); err != nil {
			panic(fmt.Errorf("next failed: %s", err))
		}
		fmt.Printf("data[%d]: O:%.2f H:%.2f L:%.2f C:%.2f V:%.2f \n", i, p[0].PriceOpen, p[0].PriceHigh, p[0].PriceLow, p[0].PriceClose, p[0].Volume)
	}
	file.Close()
}
What did you expect to happen?
I expected something like this:
HDF5 version: 1.12.1
data[0]: O:62829.33 H:62858.35 L:62829.32 C:62853.66 V:10.72221
data[1]: O:62853.66 H:62920.04 L:62851.32 C:62896.75 V:10.19546
...
data[1439]: O:63276.08 H:63286.35 L:63250.01 C:63273.59 V:43.11052
What actually happened?
What I get is:
HDF5 version: 1.12.1
data[0]: O:24533265083020748587221761909950877822199906846513430683666835688641707196344354649178734577047675756970784403964996179506865859538714624.00 H:153999479823021862704498665709509248968354775291789269717488570675195022731875416084608859555430794393831940365058635304153349319996889497485119259215244127082639950809210292371944342687481593856.00 L:0.00 C:0.00 V:0.00
data[1]: O:-0.00 H:11485591669347015527702671166617436553216.00 L:0.00 C:0.00 V:0.00
data[2]: O:16786184717166469080015018654342952761822206471285346540228583460481189475663073161248768.00 H:116860917747596761471525066204868691258239771993742452785440191
...
data[1433]: O:-9500707167603260.00 H:59636916704940832875429063464307788500085761805873313238334329889516158976.00 L:0.00 C:0.00 V:0.00
data[1434]: O:5019141222517546172965875509332335194542251462150973230580526953620246797924906675853593091481301449281850864438229980645461301058991845935258992265931878573396108562393519846894098059723698237228341624032758439241216420110000323300523402260850195612038020143058501041685903495909081088.00 H:-14749955137625195020933306096366472509413755436838140703864665704181336175168284990050108620591141868165148740370361311351055003224895967047494566278255701287614003245498688726479296978595350966497450734598780051966490510138076888153184354240036864.00 L:0.00 C:0.00 V:0.00
...
data[1438]: O:-0.00 H:40804893379413961024208896.00 L:0.00 C:0.00 V:0.00
data[1439]: O:11485478191699172345758915201790495424512.00 H:-0.00 L:0.00 C:0.00 V:0.00
Also, when trying to access the Exchange or Pair fields, I get the following error:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e pc=0x405fbc9]
What version of Go, Gonum, Gonum/netlib and libhdf5 are you using?
go version go1.17 darwin/amd64
gonum.org/v1/hdf5
h5cc -showconfig (excerpt)
General Information:
-------------------
HDF5 Version: 1.12.1
Configured on: Mon Jul 12 08:05:03 BST 2021
Configured by: brew@BigSur
Host system: x86_64-apple-darwin20.4.0
Uname information: Darwin BigSur 20.4.0 Darwin Kernel Version 20.4.0: Thu Apr 22 21:46:47 PDT 2021; root:xnu-7195.101.2~1/RELEASE_X86_64 x86_64
Byte sex: little-endian
Installation point: /usr/local/Cellar/hdf5/1.12.1
Does this issue reproduce with the current master?
Yes, it does!
I surmise this:
p := make([]ohlcv, 1)
needs to read instead:
var p ohlcv
and thus:
fmt.Printf("data[%d]: O:%.2f H:%.2f L:%.2f C:%.2f V:%.2f \n", i, p.PriceOpen, p.PriceHigh, p.PriceLow, p.PriceClose, p.Volume)
Thanks for a quick response.
Took that code from the example but also tried what you are suggesting. Unfortunately that leads to this error:
panic: unsupported kind (struct), need slice or array
which makes sense, as the function definition is:
func (*hdf5.Table).Next(data interface{}) error
(hdf5.Table).Next on pkg.go.dev
Next reads packets from a packet table starting at the current index into the value pointed at by data. i.e. data is a pointer to an array or a slice.
Can it be that you are using some newer version?
EDIT: Just checked the implementations of the Next function in h5pt_table.go and they are identical.
EDIT 2: BTW, I also tried ReadPackets instead of Next, with the same results.
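To illustrate why passing a pointer to a struct is rejected while a pointer to a slice is accepted, here is a minimal sketch of that kind of check using reflect. The names checkTarget and record are hypothetical, and this is not the library's actual code; it only mimics the error message quoted above.

```go
package main

import (
	"fmt"
	"reflect"
)

// record is a hypothetical one-field row type, used only for this sketch.
type record struct {
	Index int64
}

// checkTarget is a hypothetical stand-in for the validation that yields
// "need slice or array". It accepts only a non-nil pointer to a slice or array.
func checkTarget(data interface{}) error {
	v := reflect.ValueOf(data)
	if v.Kind() != reflect.Ptr || v.IsNil() {
		return fmt.Errorf("need a non-nil pointer, got %T", data)
	}
	switch k := v.Elem().Kind(); k {
	case reflect.Slice, reflect.Array:
		return nil
	default:
		return fmt.Errorf("unsupported kind (%s), need slice or array", k)
	}
}

func main() {
	var one record
	many := make([]record, 1)
	fmt.Println(checkTarget(&one))  // rejected: pointer to a struct
	fmt.Println(checkTarget(&many)) // accepted: pointer to a slice
}
```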
Did some more research!
If I replace the printing line with
fmt.Printf("data[%d]: %v\n", i, p)
Then there are two possible results, depending on the definition of the struct:
- The full definition that includes the strings fails with this error coming from fmt.Printf():
panic: runtime error: growslice: cap out of range
- If the strings are commented out, the result is this:
data[1437]: [{1625183880000000000 7090182514096892258 1.814982667395619e-306 -3.79181233146521e-284 -4.643804396672689e-134 -9.500616071346912e+15 3.5854690526542615e+184}]
data[1438]: [{1625183940000000000 7090182514096892258 1.814982667395619e-306 5.9896317349078915e+183 3.434212986107372e+237 3.58550892285317e+184 -9.919075148868785e-38}]
data[1439]: [{1625184000000000000 7090182514096892258 1.814982667395619e-306 -3.3900115496356115e+111 -2.0293221659741413e+112 -3.177424435398634e-182 1.1485478191699172e+40}]
where the first column (Index) is perfectly correct but the rest is just messed up.
This leads me to think that the reading ignores the `hdf5:"column_name"` tags and reads the values in sequence, which scrambles the data completely.
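The constant value in the second column of the output above gives a way to check this. A minimal sketch, assuming the bytes were copied as-is on a little-endian machine (which is what h5cc reports for this host): reinterpret the integer as its eight raw bytes and print them as text. The helper name asBytes is made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// asBytes returns the 8 little-endian bytes of v, interpreted as text.
func asBytes(v uint64) string {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	return string(buf[:])
}

func main() {
	// The value shown in the second column of every row above.
	fmt.Printf("%q\n", asBytes(7090182514096892258)) // prints "binanceb"
}
```

The bytes spell "binanceb", which looks like an exchange name followed by the first character of the next field. That suggests, without proving it, that the strings sit inline in each record as fixed-length bytes and that the record is copied into the struct byte for byte, in field order.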
This hypothesis is somewhat undermined by the fact that even with the full struct definition (including the strings), Next succeeds; as long as I do not attempt to print the string values (Exchange / Pair), the values are displayed, but they are wrong. That is the original output.
I am totally lost.
But I have a simple question: how does handling strings in structs work when reading from HDF5?
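For comparison, this is how fixed-length strings inside a packed record can be decoded by hand in plain Go, without the hdf5 package. It is only a sketch: the packedRow type, its field widths, and the encode/decode/trim helpers are assumptions for illustration and are not taken from the real file or from the library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// packedRow is a hypothetical packed layout. The 7-byte string widths
// are assumptions for this sketch, not the real file's layout.
type packedRow struct {
	Index    int64
	Exchange [7]byte // fixed-length string stored inline
	Pair     [7]byte
	Open     float64
}

// trim drops the NUL padding of a fixed-length string field.
func trim(b []byte) string {
	return string(bytes.TrimRight(b, "\x00"))
}

// encode writes one record as packed little-endian bytes.
func encode(r packedRow) ([]byte, error) {
	var buf bytes.Buffer
	err := binary.Write(&buf, binary.LittleEndian, r)
	return buf.Bytes(), err
}

// decode reads one packed little-endian record from raw bytes.
func decode(raw []byte) (packedRow, error) {
	var r packedRow
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &r)
	return r, err
}

func main() {
	// Build a sample record first so the example is self-contained.
	in := packedRow{Index: 1, Open: 62829.33}
	copy(in.Exchange[:], "binance")
	copy(in.Pair[:], "btcusd")
	raw, err := encode(in)
	if err != nil {
		panic(err)
	}
	out, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw), out.Index, trim(out.Exchange[:]), trim(out.Pair[:]), out.Open)
}
```

The point of the fixed-size byte arrays is that the Go type then has the same width as the bytes on disk, so every later field is read from the right offset.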
I have noticed that the master/cmd/test-go-table-01-readback/main.go file contains this struct definition:
type particle struct {
	// name string `hdf5:"Name"` // FIXME(sbinet)
	Lati        int32   `hdf5:"Latitude"`
	Longi       int64   `hdf5:"Longitude"`
	Pressure    float32 `hdf5:"Pressure"`
	Temperature float64 `hdf5:"Temperature"`
	// isthep []int // FIXME(sbinet)
	// jmohep [2][2]int64 // FIXME(sbinet)
}
That suggests that strings can be an issue.
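One reason strings are harder than numbers: in Go, a string field is a small header (a pointer and a length), not the characters themselves. The sketch below prints the in-memory layout of a struct with a string field; withString, layout, and stringHeaderWords are hypothetical names used only here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// withString is a hypothetical record with a string between two numbers.
type withString struct {
	Index int64
	Name  string // in memory: a pointer and a length, not the characters
	Value float64
}

// stringHeaderWords reports how many machine words a string value occupies.
func stringHeaderWords() uintptr {
	return unsafe.Sizeof("") / unsafe.Sizeof(uintptr(0))
}

// layout reports the in-memory offset of Value and the struct's total size.
func layout() (valueOffset, size uintptr) {
	var w withString
	return unsafe.Offsetof(w.Value), unsafe.Sizeof(w)
}

func main() {
	off, size := layout()
	fmt.Println("words per string header:", stringHeaderWords())
	fmt.Println("offset of Value:", off, "struct size:", size)
}
```

If raw bytes from the file were to land in the pointer and length slots of such a header, any later use of the string would follow a bogus pointer or length. That would be consistent with the SIGSEGV and the growslice panic reported above, though this is only a plausible reading, not a confirmed diagnosis.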