dataframe-go
Error reading Parquet file with latest parquet-go
- Create a file with Python pandas:
import pandas
dataframe = pandas.DataFrame({
    "A": ["a", "b", "c", "d"],
    "B": [2, 3, 4, 1],
    "C": [10, 20, None, None]
})
dataframe.to_parquet("1.parquet")
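- Read the file with dataframe-go:
package main

import (
	"context"
	"fmt"
	"github.com/rocketlaunchr/dataframe-go/imports"
	"github.com/xitongsys/parquet-go-source/local"
)

func main() {
	ctx := context.Background()
	fr, _ := local.NewLocalFileReader("1.parquet")
	df, err := imports.LoadFromParquet(ctx, fr)
	if err != nil {
		panic(err)
	}
	fmt.Println(df)
}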
- It panics with a "names of series must be unique" error:
panic: names of series must be unique:
goroutine 1 [running]:
github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
.../rocketlaunchr/dataframe-go@<version>/dataframe.go:41 +0x33c
github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
.../go/pkg/mod/github.com/rocketlaunchr/dataframe-go@<version>/imports/parquet.go:110 +0x8ae
main.main()
.../main.go:13 +0x78
- Following the stack trace, I found some useful information:
  - All series created in imports.LoadFromParquet have empty names.
  - Every key in the goFieldNameToActual map has the prefix "Scheme", but goName doesn't. That may be why the name can't be found in this map.
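The chain from the observation above to the panic can be illustrated with a small, self-contained sketch. It is hypothetical: resolveNames and checkUnique are illustrative names, not dataframe-go functions, and the "." separator in the map keys is an assumption (the report only says the keys carry a "Scheme" prefix). The point it shows is that a Go map lookup with a missing key returns the zero value "", so every series ends up with the same empty name and a uniqueness check fails.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveNames is a hypothetical stand-in for the lookup described above:
// each unprefixed Go field name is looked up in a map whose keys carry a prefix.
// A missing key yields the zero value for string, i.e. "".
func resolveNames(goNames []string, fieldNameToActual map[string]string) []string {
	names := make([]string, 0, len(goNames))
	for _, g := range goNames {
		names = append(names, fieldNameToActual[g]) // "" when g has no matching key
	}
	return names
}

// checkUnique mimics the kind of check that fails when the DataFrame is built:
// every series name must be distinct.
func checkUnique(names []string) error {
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		if seen[n] {
			return errors.New("names of series must be unique: " + n)
		}
		seen[n] = true
	}
	return nil
}

func main() {
	// Hypothetical prefixed keys, as observed in the report.
	m := map[string]string{"Scheme.A": "A", "Scheme.B": "B", "Scheme.C": "C"}
	names := resolveNames([]string{"A", "B", "C"}, m)
	fmt.Printf("%q\n", names)
	fmt.Println(checkUnique(names))
}
```

With three columns, all three names come back empty, so the second empty name already trips the uniqueness check, which matches the empty name after the colon in the panic message.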

This is the first time I've used Go to read Parquet files. Is this error caused by breaking changes in parquet-go, or by something else?
Can you send me the file?
1.parquet.zip
Can you create the DataFrame with this package, export it to Parquet, and then try importing it back?
I tried that at first, but it seemed to produce a broken Parquet file whose only content was "PAR1":
func main() {
df := dataframe.NewDataFrame(dataframe.NewSeriesString("A", nil, []string{"1", "2", "3"}))
file, _ := os.Create("1.parquet")
_ = exports.ExportToParquet(context.Background(), file, df)
}
A Parquet file is not text-based. Can you try importing the file back?
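Parquet is a binary format: a file starts with the 4-byte marker "PAR1" and ends with the footer metadata, a 4-byte little-endian footer length, and "PAR1" again, so a text editor typically shows little besides those markers. A minimal sketch, using only the standard library, that checks this envelope is below. The tool name parquetcheck and the function looksLikeParquet are hypothetical and not part of dataframe-go or parquet-go; passing the check does not guarantee the file is readable, it only catches truncated or non-Parquet files.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"os"
)

// parquetMagic is the 4-byte marker at the start and end of every Parquet file.
var parquetMagic = []byte("PAR1")

// looksLikeParquet performs a cheap structural check on raw file bytes.
// Layout: "PAR1" | data | footer | footer length (4 bytes, little-endian) | "PAR1".
func looksLikeParquet(data []byte) bool {
	// Two markers plus the length field need at least 12 bytes.
	if len(data) < 12 {
		return false
	}
	if !bytes.Equal(data[:4], parquetMagic) || !bytes.Equal(data[len(data)-4:], parquetMagic) {
		return false
	}
	// The footer must fit between the leading marker and the length field.
	footerLen := binary.LittleEndian.Uint32(data[len(data)-8 : len(data)-4])
	return int64(footerLen) <= int64(len(data)-12)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: parquetcheck FILE")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Printf("%s: %d bytes, looks like Parquet: %v\n", os.Args[1], len(data), looksLikeParquet(data))
}
```

Running it on the exported 1.parquet distinguishes a file that merely looks odd in an editor from one that really contains nothing but a 4-byte "PAR1".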
df := dataframe.NewDataFrame(dataframe.NewSeriesString("A", nil, []string{"1", "2", "3"}))
file, _ := os.Create("1.parquet")
_ = exports.ExportToParquet(context.Background(), file, df)
fr, _ := local.NewLocalFileReader("1.parquet")
df, err := imports.LoadFromParquet(context.Background(), fr)
if err != nil {
panic(err)
}
fmt.Println(df)
panic: seek 1.parquet: invalid argument
goroutine 1 [running]:
main.main()
.../main.go:21 +0x465
Exiting.
The error is raised in imports/parquet.go, line 40: pr, err := reader.NewParquetReader(src, nil, int64(runtime.NumCPU()))
My parquet-go version is v1.6.2: github.com/xitongsys/parquet-go v1.6.2
I tried opening your file and it worked:
package main
import (
	"context"
	"fmt"
	"github.com/rocketlaunchr/dataframe-go/imports"
	"github.com/xitongsys/parquet-go-source/local"
)
var ctx = context.Background()
func main() {
fr, _ := local.NewLocalFileReader("1.parquet")
defer fr.Close()
df, err := imports.LoadFromParquet(ctx, fr)
if err != nil {
panic(err)
}
fmt.Println(df)
}
OUTPUT:
+-----+--------+-------+---------+
| | A | B | C |
+-----+--------+-------+---------+
| 0: | a | 2 | 10 |
| 1: | b | 3 | 20 |
| 2: | c | 4 | NaN |
| 3: | d | 1 | NaN |
+-----+--------+-------+---------+
| 4X3 | STRING | INT64 | FLOAT64 |
+-----+--------+-------+---------+
Can you tell me your parquet-go version?
module main
go 1.18
require (
github.com/rocketlaunchr/dataframe-go v0.0.0-00010101000000-000000000000
github.com/xitongsys/parquet-go-source v0.0.0-20200509081216-8db33acb0acf
)
require (
github.com/apache/thrift v0.0.0-20181112125854-24918abba929 // indirect
github.com/goccy/go-json v0.7.6 // indirect
github.com/golang/snappy v0.0.0-20180518054509-2e65f85255db // indirect
github.com/google/go-cmp v0.4.0 // indirect
github.com/guptarohit/asciigraph v0.5.1 // indirect
github.com/juju/clock v0.0.0-20190205081909-9c5c9712527c // indirect
github.com/juju/errors v0.0.0-20200330140219-3fe23663418f // indirect
github.com/juju/loggo v0.0.0-20200526014432-9ce3a2e09b5e // indirect
github.com/juju/utils/v2 v2.0.0-20200923005554-4646bfea2ef1 // indirect
github.com/klauspost/compress v1.9.7 // indirect
github.com/mattn/go-runewidth v0.0.7 // indirect
github.com/olekukonko/tablewriter v0.0.4 // indirect
github.com/rocketlaunchr/mysql-go v1.1.3 // indirect
github.com/xitongsys/parquet-go v1.5.2 // indirect
golang.org/x/crypto v0.0.0-20200820211705-5c72a883971a // indirect
golang.org/x/exp v0.0.0-20200331195152-e8c3332aa8e5 // indirect
golang.org/x/net v0.0.0-20200904194848-62affa334b73 // indirect
golang.org/x/sync v0.0.0-20200317015054-43a5402ce75a // indirect
gopkg.in/yaml.v2 v2.3.0 // indirect
)
I'm using github.com/apache/thrift v0.0.0-20181112125854-24918abba929 and github.com/xitongsys/parquet-go v1.5.2, and it works.
In the release notes:
[v1.6.0](https://github.com/xitongsys/parquet-go/releases/tag/v1.6.0)
Big changes in the type. Not compatiable with before.
I may need to update the package to use 1.6+ instead of 1.5.
I have no idea why it isn't using v1.5 for you, since that version is registered in the go.mod file.
v1.5 works fine. Maybe I installed parquet-go before installing dataframe-go; I'm not sure.
It seems the problem is solved, so I'll close this issue.
Maybe you directly imported "github.com/rocketlaunchr/dataframe-go/imports" without importing "github.com/rocketlaunchr/dataframe-go". Since there is no go.mod file inside the github.com/rocketlaunchr/dataframe-go/imports directory, Go just downloaded and used the latest version of parquet-go.
Here is my shell history:
➜ go get -u github.com/rocketlaunchr/dataframe-go
go: downloading github.com/rocketlaunchr/dataframe-go v0.0.0-20211025052708-a1030444159b
go: downloading golang.org/x/exp v0.0.0-20200331195152-e8c3332aa8e5
go: downloading github.com/google/go-cmp v0.4.0
go: downloading github.com/guptarohit/asciigraph v0.5.1
go: downloading github.com/olekukonko/tablewriter v0.0.4
go: downloading golang.org/x/sync v0.0.0-20200317015054-43a5402ce75a
go: downloading github.com/olekukonko/tablewriter v0.0.5
go: downloading github.com/google/go-cmp v0.5.7
go: downloading github.com/mattn/go-runewidth v0.0.7
go: downloading github.com/mattn/go-runewidth v0.0.13
go: downloading golang.org/x/exp v0.0.0-20220328175248-053ad81199eb
go: downloading github.com/guptarohit/asciigraph v0.5.3
go: downloading github.com/rivo/uniseg v0.2.0
go: added github.com/google/go-cmp v0.5.7
go: added github.com/guptarohit/asciigraph v0.5.3
go: added github.com/mattn/go-runewidth v0.0.13
go: added github.com/olekukonko/tablewriter v0.0.5
go: added github.com/rivo/uniseg v0.2.0
go: added github.com/rocketlaunchr/dataframe-go v0.0.0-20211025052708-a1030444159b
go: added golang.org/x/exp v0.0.0-20220328175248-053ad81199eb
go: added golang.org/x/sync v0.0.0-20210220032951-036812b2e83c
➜ go get -u github.com/xitongsys/parquet-go/parquet
go: downloading github.com/apache/thrift v0.16.0
go: upgraded github.com/apache/thrift v0.0.0-20181112125854-24918abba929 => v0.16.0
go: upgraded github.com/xitongsys/parquet-go v1.5.2 => v1.6.2
➜ go get -u github.com/xitongsys/parquet-go-source
go: downloading github.com/xitongsys/parquet-go-source v0.0.0-20220315005136-aec0fe3e777c
go: upgraded github.com/xitongsys/parquet-go-source v0.0.0-20200817004010-026bad9b25d0 => v0.0.0-20220315005136-aec0fe3e777c
You shouldn't have run the last two go get commands. Since those packages don't have a go.mod file, Go just assumed the latest version, hence: go: upgraded github.com/xitongsys/parquet-go v1.5.2 => v1.6.2
From Go's point of view, when you do that, it's an unrelated package.
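One way to see why that upgrade then wins over the v1.5.2 pinned by dataframe-go: Go resolves dependencies with minimal version selection, where each module is built at the highest version required by any go.mod in the build graph. Once go get -u records parquet-go v1.6.2 in the main module's go.mod, that requirement outranks dataframe-go's v1.5.2. The sketch below is deliberately simplified and hypothetical (Requirement and selectVersions are not cmd/go internals): it works on a flat requirement list rather than walking the graph, and it only understands plain vMAJOR.MINOR.PATCH versions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Requirement says that module Path needs at least Version.
type Requirement struct {
	Path, Version string
}

// parse splits "vMAJOR.MINOR.PATCH" into numbers. Simplification: no
// pre-release or build suffixes; unparsable parts count as 0.
func parse(v string) [3]int {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	for i, p := range parts {
		n, _ := strconv.Atoi(p)
		out[i] = n
	}
	return out
}

// less reports whether version a is lower than version b.
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := 0; i < 3; i++ {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// selectVersions keeps, for every module path, the highest version any
// requirement asks for: the core rule of minimal version selection.
func selectVersions(reqs []Requirement) map[string]string {
	chosen := make(map[string]string)
	for _, r := range reqs {
		if cur, ok := chosen[r.Path]; !ok || less(cur, r.Version) {
			chosen[r.Path] = r.Version
		}
	}
	return chosen
}

func main() {
	reqs := []Requirement{
		// Required by dataframe-go's go.mod.
		{"github.com/xitongsys/parquet-go", "v1.5.2"},
		// Added to the main module by `go get -u github.com/xitongsys/parquet-go/parquet`.
		{"github.com/xitongsys/parquet-go", "v1.6.2"},
	}
	fmt.Println(selectVersions(reqs)["github.com/xitongsys/parquet-go"]) // v1.6.2
}
```

Because the rule only ever takes the maximum, dataframe-go's lower requirement cannot pull the build back down to v1.5.2 while the main module asks for v1.6.2.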
Got it, thanks a lot!
Hi, when is this library going to be upgraded to use parquet-go v1.6.2 or later, please? Having to pin v1.5.4 just broke all the tagging I was using, which assumed v1.6.2. :-(
There is a backward-incompatible change in v1.6.2, so I will need to explore it more deeply.
This package's go.mod is set to github.com/xitongsys/parquet-go v1.5.2, so it should work for you provided you don't independently go get the "github.com/rocketlaunchr/dataframe-go/imports" package.
Let the main package dictate the dependencies for the sub-packages.