dataframe-go - Address issue: https://github.com/rocketlaunchr/dataframe-go/issues…

…/62 (Chinese characters and python BOM prefix)

Apr 02 '22 10:04 pjebs

Reading the file with UTF-8-BOM encoding works, and exporting to parquet produces no errors, but the exported parquet file can NOT be read back.

func TestUTF8CSV(t *testing.T) {
    fr, err := os.Open("export.csv")
    if err != nil {
        panic(err)
    }

    df, err := imports.LoadFromCSV(context.Background(), fr)
    if err != nil {
        panic(err)
    }

    out, err := os.Create("export.parquet")
    if err != nil {
        panic(err)
    }

    err = exports.ExportToParquet(context.Background(), out, df)
    if err != nil {
        panic(err)
    }

    out.Close()

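    // Read the exported file back through the parquet-go local file reader.
    // (The earlier extra os.Open call was unused and its error was overwritten.)
    source, err := local.NewLocalFileReader("export.parquet")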
    if err != nil {
        panic(err)
    }
    df, err = imports.LoadFromParquet(context.Background(), source)
    if err != nil {
        panic(err)
    }
    fmt.Println(df)

}

=== RUN   TestUTF8CSV
--- FAIL: TestUTF8CSV (0.02s)
panic: [NextRowGroup] Column not found: Parquet_go_root.P_231188150229143183 [recovered]
	panic: [NextRowGroup] Column not found: Parquet_go_root.P_231188150229143183

goroutine 14 [running]:
testing.tRunner.func1.2({0x13b8d60, 0xc0006af1c0})
	/usr/local/opt/go/libexec/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
	/usr/local/opt/go/libexec/src/testing/testing.go:1392 +0x39f
panic({0x13b8d60, 0xc0006af1c0})
	/usr/local/opt/go/libexec/src/runtime/panic.go:838 +0x207
github.com/rocketlaunchr/dataframe-go/aa.TestUTF8CSV(0x0?)
	.../dataframe-go/aa/utf8_csv_test.go:43 +0x1d7
testing.tRunner(0xc0005c9d40, 0x1444b28)
	/usr/local/opt/go/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
	/usr/local/opt/go/libexec/src/testing/testing.go:1486 +0x35f

Apr 03 '22 06:04 tanyaofei

export.parquet.zip

Apr 03 '22 06:04 tanyaofei

Can you read it back in python to check if the output file is valid?

Apr 03 '22 06:04 pjebs

Can you read it back in python?

I don't think so, because the IDEA plugin Big Data Tools displays "Nothing to show".

And here is the output of my Python script:

       编号    年龄    性别    地区  身高cm  体重kg  ... 吃零食情况  跑步情况 玩电脑游戏情况  逛街情况  散步情况  夜宵情况
0    None  None  None  None  None  None  ...  None  None    None  None  None  None
1    None  None  None  None  None  None  ...  None  None    None  None  None  None
2    None  None  None  None  None  None  ...  None  None    None  None  None  None
3    None  None  None  None  None  None  ...  None  None    None  None  None  None
4    None  None  None  None  None  None  ...  None  None    None  None  None  None
..    ...   ...   ...   ...   ...   ...  ...   ...   ...     ...   ...   ...   ...
446  None  None  None  None  None  None  ...  None  None    None  None  None  None
447  None  None  None  None  None  None  ...  None  None    None  None  None  None
448  None  None  None  None  None  None  ...  None  None    None  None  None  None
449  None  None  None  None  None  None  ...  None  None    None  None  None  None
450  None  None  None  None  None  None  ...  None  None    None  None  None  None

[451 rows x 21 columns]

Apr 03 '22 06:04 tanyaofei

I wonder whether, when you used the pull-request branch, it was using the latest (incompatible) version of the parquet parsing package?

Apr 03 '22 07:04 pjebs

I wonder whether, when you used the pull-request branch, it was using the latest (incompatible) version of the parquet parsing package?

I am sure I am using github.com/xitongsys/parquet-go v1.5.2 and github.com/xitongsys/parquet-go-source v0.0.0-20200509081216-8db33acb0acf

Apr 03 '22 07:04 tanyaofei

When you tried s.Rename("X" + strings.Trim(s.Name(), "\xEF\xBB\xBF")), could you read the exported parquet file back in python?
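
For reference, a small sketch of what that expression does to a column name, using plain strings instead of a Series. cleanName is a hypothetical stand-in for the argument passed to s.Rename. Note that the second argument of strings.Trim is a set of characters rather than a prefix; here the set holds the single code point U+FEFF, so the BOM is removed from either end of the name.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanName mirrors the argument of the s.Rename call quoted above:
// it strips a UTF-8 BOM (U+FEFF) from both ends and prepends "X".
// Hypothetical helper for illustration.
func cleanName(name string) string {
	return "X" + strings.Trim(name, "\xEF\xBB\xBF")
}

func main() {
	// First header as it would appear when the file starts with a BOM.
	fmt.Printf("%q\n", cleanName("\xEF\xBB\xBF编号"))
	// A header without a BOM only gains the "X" prefix.
	fmt.Printf("%q\n", cleanName("年龄"))
}
```

If only a leading BOM should be removed, strings.TrimPrefix(name, "\xEF\xBB\xBF") expresses that more directly.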

Apr 03 '22 23:04 pjebs
