connector-x icon indicating copy to clipboard operation
connector-x copied to clipboard

MySQL TINYINT gets read as boolean into Arrow Dataframe

Open jshridha opened this issue 1 year ago • 0 comments

What language are you using?

Python

What version are you using?

The bug exists in the 0.3.1 and the current 0.3.2 dev version

What database are you using?

MySQL

What dataframe are you using?

Arrow

Can you describe your bug?

Columns using a TINYINT(4) type are cast as a boolean when read into an Arrow dataframe in version 0.3.1 and onwards. I don't know if this is happening across any other types. I have seen columns with large integers sometimes get cast as floats, though I don't have an example of this available.

CREATE TABLE `cx_test` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `test_column` tinyint(4) DEFAULT NULL,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=latin1;

INSERT INTO `cx_test` (test_column)
  VALUES (0);
INSERT INTO `cx_test` (test_column)
  VALUES (1);
INSERT INTO `cx_test` (test_column)
  VALUES (2);
INSERT INTO `cx_test` (test_column)
  VALUES (3);

With Version 0.3.0, everything works correctly:

cx.read_sql(conn_str, "select * from cx_test", return_type="arrow")
pyarrow.Table
id: int64
test_column: int64
----
id: [[1,2,3,4]]
test_column: [[0,1,2,3]]

With Version 0.3.1 and onwards, the data is converted into bools:

cx.read_sql(conn_str, "select * from cx_test", return_type="arrow")
pyarrow.Table
id: int64
test_column: bool
----
id: [[1,2,3,4]]
test_column: [[false,true,true,true]]

jshridha avatar May 04 '23 02:05 jshridha