dataframe-go
- Address issue: https://github.com/rocketlaunchr/dataframe-go/issues…
…/62 (Chinese characters and python BOM prefix)
The file now reads without problems when it is encoded as UTF-8-BOM.
Exporting to parquet also reports no errors, but the resulting parquet file can NOT be read back.
func TestUTF8CSV(t *testing.T) {
	// Load the UTF-8 (with BOM) CSV into a dataframe.
	fr, err := os.Open("export.csv")
	if err != nil {
		panic(err)
	}
	defer fr.Close()
	df, err := imports.LoadFromCSV(context.Background(), fr)
	if err != nil {
		panic(err)
	}

	// Export the dataframe to parquet; this step reports no error.
	out, err := os.Create("export.parquet")
	if err != nil {
		panic(err)
	}
	err = exports.ExportToParquet(context.Background(), out, df)
	if err != nil {
		panic(err)
	}
	out.Close()

	// Read the parquet file back; LoadFromParquet is where the panic occurs.
	source, err := local.NewLocalFileReader("export.parquet")
	if err != nil {
		panic(err)
	}
	df, err = imports.LoadFromParquet(context.Background(), source)
	if err != nil {
		panic(err)
	}
	fmt.Println(df)
}
=== RUN TestUTF8CSV
--- FAIL: TestUTF8CSV (0.02s)
panic: [NextRowGroup] Column not found: Parquet_go_root.P_231188150229143183 [recovered]
panic: [NextRowGroup] Column not found: Parquet_go_root.P_231188150229143183
goroutine 14 [running]:
testing.tRunner.func1.2({0x13b8d60, 0xc0006af1c0})
/usr/local/opt/go/libexec/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
/usr/local/opt/go/libexec/src/testing/testing.go:1392 +0x39f
panic({0x13b8d60, 0xc0006af1c0})
/usr/local/opt/go/libexec/src/runtime/panic.go:838 +0x207
github.com/rocketlaunchr/dataframe-go/aa.TestUTF8CSV(0x0?)
.../dataframe-go/aa/utf8_csv_test.go:43 +0x1d7
testing.tRunner(0xc0005c9d40, 0x1444b28)
/usr/local/opt/go/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
/usr/local/opt/go/libexec/src/testing/testing.go:1486 +0x35f
Can you read it back in python to check if the output file is valid?
I don't think so, because the IDEA plugin Big Data Tools shows "Nothing to show" for the file,
and here is the output of my Python script:
编号 年龄 性别 地区 身高cm 体重kg ... 吃零食情况 跑步情况 玩电脑游戏情况 逛街情况 散步情况 夜宵情况
0 None None None None None None ... None None None None None None
1 None None None None None None ... None None None None None None
2 None None None None None None ... None None None None None None
3 None None None None None None ... None None None None None None
4 None None None None None None ... None None None None None None
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
446 None None None None None None ... None None None None None None
447 None None None None None None ... None None None None None None
448 None None None None None None ... None None None None None None
449 None None None None None None ... None None None None None None
450 None None None None None None ... None None None None None None
[451 rows x 21 columns]
I wonder whether, when you used the pull-request branch, it pulled in the latest (incompatible) version of the parquet parsing package?
I am sure I am using github.com/xitongsys/parquet-go v1.5.2
and github.com/xitongsys/parquet-go-source v0.0.0-20200509081216-8db33acb0acf
When you tried s.Rename("X" + strings.Trim(s.Name(), "\xEF\xBB\xBF")), could you read the exported parquet file back in Python?