parquet-go
Data inconsistency when parsing bit string exported from Aurora
Hi, when I tried parsing a Parquet file exported from Aurora with parquet-go, I ran into an unexpected data inconsistency. The affected columns in my table are binary types, such as BINARY, BLOB, and LONGBLOB. The complete table schema is as follows:
CREATE TABLE jxtest(
`id` char(36) NOT NULL,
`a` bigint unsigned NOT NULL,
`aa` bigint signed NOT NULL,
`b` int(11) unsigned NOT NULL,
`bb` int(11) signed NOT NULL,
`c` smallint signed NOT NULL,
`cc` smallint unsigned NOT NULL,
`d` tinyint signed NOT NULL,
`dd` tinyint unsigned NOT NULL,
`e` float unsigned NOT NULL,
`ee` float signed NOT NULL,
`f` VARCHAR(30) NOT NULL,
`ff` TEXT NOT NULL,
`h` MEDIUMTEXT NOT NULL,
`hh` LONGTEXT NOT NULL,
`ii` TINYTEXT NOT NULL,
`j` DECIMAL NOT NULL,
`jj` DECIMAL(8,0) NOT NULL,
`k` DECIMAL(8,8) NOT NULL,
`kk` DECIMAL(20,0) NOT NULL,
`l` DECIMAL(20,8) NOT NULL,
`ll` DECIMAL(36,0) NOT NULL,
`m` DECIMAL(36,8) NOT NULL,
`mm` DATE NOT NULL,
`n` TIME NOT NULL,
`nn` YEAR NOT NULL,
`o` DATETIME NOT NULL,
`oo` BINARY NOT NULL,
`p` BLOB NOT NULL,
`pp` LONGBLOB NOT NULL,
`q` MEDIUMBLOB NOT NULL,
`qq` TINYBLOB NOT NULL,
`rr` BIT NOT NULL,
`s` BOOLEAN NOT NULL,
`ss` DOUBLE signed NOT NULL,
`t` DOUBLE unsigned NOT NULL,
PRIMARY KEY ( `id` ),
KEY `index_a` (`a`) );
When I parsed the parquet file using PyArrow, the result is:
There, the values are bit strings. But when I parsed them with parquet-go, the output, together with the schema element of each column, is as follows:
schema element: SchemaElement({Type:BYTE_ARRAY TypeLength:<nil> RepetitionType:OPTIONAL Name:P NumChildren:<nil> ConvertedType:<nil> Scale:<nil> Precision:<nil> FieldID:<nil> LogicalType:<nil>}), string: 111111111
schema element: SchemaElement({Type:BYTE_ARRAY TypeLength:<nil> RepetitionType:OPTIONAL Name:Pp NumChildren:<nil> ConvertedType:<nil> Scale:<nil> Precision:<nil> FieldID:<nil> LogicalType:<nil>}), string: 1111111111
schema element: SchemaElement({Type:BYTE_ARRAY TypeLength:<nil> RepetitionType:OPTIONAL Name:Q NumChildren:<nil> ConvertedType:<nil> Scale:<nil> Precision:<nil> FieldID:<nil> LogicalType:<nil>}), string: 111111111
I expected the values to be 0x7F or something like that, but I got the plain text string 111111111, which is definitely not equal to b'111111111'.
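To make the comparison concrete, the raw bytes behind the printed string can be dumped in hex and compared against an expected byte sequence. This is only a minimal sketch: `rawHex` and `sameBytes` are hypothetical helpers written for this issue, not part of parquet-go, and it assumes the BYTE_ARRAY value arrives as a Go `string`, as in the output above.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// rawHex returns the hexadecimal encoding of the bytes backing a Go string.
// (Hypothetical helper, not a parquet-go API.)
func rawHex(s string) string {
	return hex.EncodeToString([]byte(s))
}

// sameBytes reports whether a Go string and a byte slice hold identical bytes.
// (Hypothetical helper, not a parquet-go API.)
func sameBytes(s string, b []byte) bool {
	return bytes.Equal([]byte(s), b)
}

func main() {
	// Value printed for column P in the output above.
	value := "111111111"

	// Hex dump of the underlying bytes, independent of how the value is printed.
	fmt.Println(rawHex(value))

	// Compare against the single byte 0x7F mentioned above as the expected value.
	fmt.Println(sameBytes(value, []byte{0x7F}))
}
```

Running the same kind of dump on the PyArrow side (for example with `bytes.hex()`) would show whether the two readers really return different bytes or only print them differently.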
Could you please explain this? Thank you all in advance.