pgloader icon indicating copy to clipboard operation
pgloader copied to clipboard

DEFAULT value clause not being preserved for bit(1) / boolean columns during MySQL to Postgres migration

Open 9numbernine9 opened this issue 3 years ago • 5 comments

  • [X] pgloader --version
pgloader version "3.6.3047c9a"
compiled with SBCL 2.0.1.debian
  • [X] did you test a fresh compile from the source tree?

Yes.

  • [X] did you search for other similar issues?

Yes.

  • [X] how can I reproduce the bug?

PGLoader configuration:

LOAD DATABASE
FROM mysql://mysql:mysql@localhost:3306/testdb
INTO postgresql://postgres:postgres@localhost:5432/testdb

BEFORE LOAD DO
  $$ CREATE SCHEMA IF NOT EXISTS testschema; $$;

mysqldump of testdb from the MySQL instance:

-- MySQL dump 10.13  Distrib 8.0.23, for osx10.16 (x86_64)
--
-- Host: 127.0.0.1    Database: testdb
-- ------------------------------------------------------
-- Server version	5.5.5-10.1.38-MariaDB-1~bionic

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!50503 SET NAMES utf8mb4 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `testtable`
--

DROP TABLE IF EXISTS `testtable`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `testtable` (
  `boolwithoutdefault` bit(1) DEFAULT NULL,
  `boolwithfalsedefault` bit(1) DEFAULT b'0',
  `boolwithtruedefault` bit(1) DEFAULT b'1'
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `testtable`
--

LOCK TABLES `testtable` WRITE;
/*!40000 ALTER TABLE `testtable` DISABLE KEYS */;
/*!40000 ALTER TABLE `testtable` ENABLE KEYS */;
UNLOCK TABLES;
mysqldump: Couldn't execute 'SELECT COLUMN_NAME,                       JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"')                FROM information_schema.COLUMN_STATISTICS                WHERE SCHEMA_NAME = 'testdb' AND TABLE_NAME = 'testtable';': Unknown table 'COLUMN_STATISTICS' in information_schema (1109)
  • [X] pgloader output you obtain
pgloader version 3.6.3047c9a
compiled with SBCL 2.0.1.debian
sb-impl::*default-external-format* :ANSI_X3.4-1968
tmpdir: #P"/tmp/pgloader/"
2021-05-03T13:41:45.008000Z NOTICE Starting pgloader, log system is ready.
2021-05-03T13:41:45.018000Z INFO Starting monitor
2021-05-03T13:41:45.021000Z LOG pgloader version "3.6.3047c9a"
2021-05-03T13:41:45.052000Z INFO Parsed command:

LOAD DATABASE
FROM mysql://root:mysqlroot@vanaheim:3306/testdb
INTO postgresql://postgres:postgrespassword@vanaheim:5432/testdb

BEFORE LOAD DO
  $$ CREATE SCHEMA IF NOT EXISTS testschema; $$;


2021-05-03T13:41:45.107000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@vanaheim:5432/testdb {1007B4A273}>
2021-05-03T13:41:45.108000Z DEBUG SET client_encoding TO 'utf8'
2021-05-03T13:41:45.111000Z DEBUG SET application_name TO 'pgloader'
2021-05-03T13:41:45.131000Z NOTICE Executing SQL block for before load
2021-05-03T13:41:45.139000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@vanaheim:5432/testdb {1007B4A273}>
2021-05-03T13:41:45.140000Z DEBUG SET client_encoding TO 'utf8'
2021-05-03T13:41:45.144000Z DEBUG SET application_name TO 'pgloader'
2021-05-03T13:41:45.156000Z DEBUG BEGIN
2021-05-03T13:41:45.159000Z SQL CREATE SCHEMA IF NOT EXISTS testschema;
2021-05-03T13:41:45.175000Z LOG Migrating from #<MYSQL-CONNECTION mysql://root@vanaheim:3306/testdb {1007B48AC3}>
2021-05-03T13:41:45.175000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres@vanaheim:5432/testdb {1007B4A273}>
2021-05-03T13:41:45.224000Z DEBUG CONNECTED TO #<MYSQL-CONNECTION mysql://root@vanaheim:3306/testdb {1007B48AC3}>
2021-05-03T13:41:45.232000Z SQL MySQL: sending query: -- params: db-name
--         table-type-name
--         only-tables
--         only-tables
--         including
--         filter-list-to-where-clause incuding
--         excluding
--         filter-list-to-where-clause excluding
  select c.table_name, t.table_comment,
         c.column_name, c.column_comment,
         c.data_type, c.column_type, c.column_default,
         c.is_nullable, c.extra
    from information_schema.columns c
         join information_schema.tables t using(table_schema, table_name)
   where c.table_schema = 'testdb' and t.table_type = 'BASE TABLE'


order by table_name, ordinal_position;
2021-05-03T13:41:45.258000Z SQL MySQL: sending query: -- params: db-name
--         including
--         filter-list-to-where-clause incuding
--         excluding
--         filter-list-to-where-clause excluding
    SELECT table_name,
           coalesce(cast(data_length/avg_row_length as unsigned), 0)
      FROM information_schema.tables
    WHERE     table_schema = 'testdb'
          and table_type = 'BASE TABLE'

         ;
2021-05-03T13:41:45.264000Z SQL MySQL: sending query: -- params: db-name
--         table-type-name
--         only-tables
--         only-tables
--         including
--         filter-list-to-where-clause incuding
--         excluding
--         filter-list-to-where-clause excluding
SELECT s.table_name, s.constraint_name, s.ft, s.cols, s.fcols,
       rc.update_rule, rc.delete_rule

FROM
 (
  SELECT tc.table_schema, tc.table_name,
         tc.constraint_name, k.referenced_table_name ft,

             group_concat(         k.column_name
                          order by k.ordinal_position) as cols,

             group_concat(         k.referenced_column_name
                          order by k.position_in_unique_constraint) as fcols

        FROM information_schema.table_constraints tc

        LEFT JOIN information_schema.key_column_usage k
               ON k.table_schema = tc.table_schema
              AND k.table_name = tc.table_name
              AND k.constraint_name = tc.constraint_name

      WHERE     tc.table_schema = 'testdb'
            AND k.referenced_table_schema = 'testdb'
            AND tc.constraint_type = 'FOREIGN KEY'



   GROUP BY tc.table_schema, tc.table_name, tc.constraint_name, ft
 ) s
             JOIN information_schema.referential_constraints rc
               ON rc.constraint_schema = s.table_schema
              AND rc.constraint_name = s.constraint_name
              AND rc.table_name = s.table_name;
2021-05-03T13:41:45.274000Z SQL MySQL: sending query: -- params: db-name
--         table-type-name
--         only-tables
--         only-tables
--         including
--         filter-list-to-where-clause incuding
--         excluding
--         filter-list-to-where-clause excluding
  SELECT table_name, index_name, index_type,
         sum(non_unique),
         cast(GROUP_CONCAT(column_name order by seq_in_index) as char)
    FROM information_schema.statistics
   WHERE table_schema = 'testdb'


GROUP BY table_name, index_name, index_type;
2021-05-03T13:41:45.297000Z INFO Processing source catalogs
2021-05-03T13:41:45.310000Z NOTICE Prepare PostgreSQL database.
2021-05-03T13:41:45.318000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@vanaheim:5432/testdb {1007B4A273}>
2021-05-03T13:41:45.318000Z DEBUG SET client_encoding TO 'utf8'
2021-05-03T13:41:45.321000Z DEBUG SET application_name TO 'pgloader'
2021-05-03T13:41:45.330000Z DEBUG BEGIN
2021-05-03T13:41:45.379000Z SQL DROP TABLE IF EXISTS testdb.testtable CASCADE;
2021-05-03T13:41:45.391000Z SQL CREATE TABLE testdb.testtable
(
  boolwithoutdefault   boolean,
  boolwithfalsedefault boolean default NULL,
  boolwithtruedefault  boolean default NULL
);
2021-05-03T13:41:45.406000Z SQL -- params: table-names
select n, n::regclass::oid
  from (values ('testdb.testtable')) as t(n);

2021-05-03T13:41:45.435000Z NOTICE COPY testdb.testtable with 0 rows estimated [0/4]
2021-05-03T13:41:45.435000Z DEBUG Reader started for testdb.testtable
2021-05-03T13:41:45.452000Z DEBUG start testdb.testtable   440
2021-05-03T13:41:45.452000Z INFO COPY ON ERROR STOP
2021-05-03T13:41:45.456000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@vanaheim:5432/testdb {100B7D9823}>
2021-05-03T13:41:45.456000Z DEBUG SET client_encoding TO 'utf8'
2021-05-03T13:41:45.461000Z DEBUG SET application_name TO 'pgloader'
2021-05-03T13:41:45.468000Z DEBUG CONNECTED TO #<MYSQL-CONNECTION mysql://root@vanaheim:3306/testdb {100B4C7CF3}>
2021-05-03T13:41:45.468000Z SQL MySQL: sending query: SELECT `boolwithoutdefault`, `boolwithfalsedefault`, `boolwithtruedefault` FROM `testtable`
2021-05-03T13:41:45.470000Z SQL SET search_path TO testdb;
2021-05-03T13:41:45.471000Z DEBUG Reader for testdb.testtable is done in 0.037000s
2021-05-03T13:41:45.472000Z DEBUG Finished processing READER for "testdb.testtable"  0.037000s
2021-05-03T13:41:45.475000Z INFO pgsql:copy-rows-from-queue[2]: testdb.testtable (boolwithoutdefault
                                                 boolwithfalsedefault
                                                 boolwithtruedefault)
2021-05-03T13:41:45.489000Z DEBUG  stop testdb.testtable   | 440 .. 494 = 0.054d0
2021-05-03T13:41:45.490000Z DEBUG Writer[2] for testdb.testtable is done in 0.000000s
2021-05-03T13:41:45.490000Z DEBUG Finished processing WRITER for "testdb.testtable"  0.000000s
2021-05-03T13:41:45.490000Z DEBUG writers-counts[testdb.testtable] = 0
2021-05-03T13:41:45.493000Z NOTICE DONE copying testdb.testtable in 0.000s
2021-05-03T13:41:45.494000Z INFO Done with COPYing data, waiting for indexes
2021-05-03T13:41:45.494000Z INFO Done waiting for indexes
2021-05-03T13:41:45.508000Z NOTICE Completing PostgreSQL database.
2021-05-03T13:41:45.508000Z NOTICE Reset sequences
2021-05-03T13:41:45.519000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@vanaheim:5432/testdb {100C0058C3}>
2021-05-03T13:41:45.519000Z DEBUG SET client_encoding TO 'utf8'
2021-05-03T13:41:45.524000Z DEBUG SET application_name TO 'pgloader'
2021-05-03T13:41:45.531000Z DEBUG SET client_encoding TO 'utf8'
2021-05-03T13:41:45.534000Z DEBUG SET application_name TO 'pgloader'
2021-05-03T13:41:45.570000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@vanaheim:5432/testdb {100C011E43}>
2021-05-03T13:41:45.570000Z DEBUG SET client_encoding TO 'utf8'
2021-05-03T13:41:45.576000Z DEBUG SET application_name TO 'pgloader'
2021-05-03T13:41:45.595000Z DEBUG BEGIN
2021-05-03T13:41:45.614000Z LOG report summary reset
             table name     errors       read   imported      bytes      total time       read      write
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
            before load          0          1          1                     0.038s
        fetch meta data          0          1          1                     0.106s
         Create Schemas          0          0          0                     0.004s
       Create SQL Types          0          0          0                     0.011s
          Create tables          0          2          2                     0.041s
         Set Table OIDs          0          1          1                     0.013s
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
       testdb.testtable          0          0          0                     0.054s     0.037s
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
COPY Threads Completion          0          4          4                     0.058s
 Index Build Completion          0          0          0                     0.000s
        Reset Sequences          0          0          0                     0.046s
           Primary Keys          0          0          0                     0.000s
    Create Foreign Keys          0          0          0                     0.000s
        Create Triggers          0          0          0                     0.006s
       Install Comments          0          0          0                     0.000s
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
      Total import time          ?          0          0                     0.110s
2021-05-03T13:41:45.649000Z INFO Stopping monitor
  • [X] data that is being loaded, if relevant

pg_dump of testdb database in Postgres created by pgloader:

--
-- PostgreSQL database dump
--

-- Dumped from database version 11.11 (Debian 11.11-1.pgdg90+1)
-- Dumped by pg_dump version 13.2

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

--
-- Name: testdb; Type: SCHEMA; Schema: -; Owner: postgres
--

CREATE SCHEMA testdb;


ALTER SCHEMA testdb OWNER TO postgres;

--
-- Name: testschema; Type: SCHEMA; Schema: -; Owner: postgres
--

CREATE SCHEMA testschema;


ALTER SCHEMA testschema OWNER TO postgres;

SET default_tablespace = '';

--
-- Name: testtable; Type: TABLE; Schema: testdb; Owner: postgres
--

CREATE TABLE testdb.testtable (
    boolwithoutdefault boolean,
    boolwithfalsedefault boolean,
    boolwithtruedefault boolean
);


ALTER TABLE testdb.testtable OWNER TO postgres;

--
-- Data for Name: testtable; Type: TABLE DATA; Schema: testdb; Owner: postgres
--

COPY testdb.testtable (boolwithoutdefault, boolwithfalsedefault, boolwithtruedefault) FROM stdin;
\.


--
-- PostgreSQL database dump complete
--

Hello!

I'm running into an issue when using pgloader to migrate a MariaDB 10.1.38 database to a Postgres 11 database. In the source MySQL database, there are several tables that have columns defined as bit(1) type, so these are becoming boolean columns when migrated to Postgres. This works fine, but if the source columns also have a DEFAULT value clause specified, that information appears to get lost in the target Postgres database.

The sample databases shown above are the simplest examples of this behaviour that I could come up with. The mysqldump output shows that I've created a very silly table:

CREATE TABLE `testtable` (
  `boolwithoutdefault` bit(1) DEFAULT NULL,
  `boolwithfalsedefault` bit(1) DEFAULT b'0',
  `boolwithtruedefault` bit(1) DEFAULT b'1'
)

But the pgloader output seems to show that it knows that there are DEFAULT values specified, but they're not being propagated forward to the Postgres database:

2021-05-03T13:41:45.391000Z SQL CREATE TABLE testdb.testtable
(
  boolwithoutdefault   boolean,
  boolwithfalsedefault boolean default NULL,
  boolwithtruedefault  boolean default NULL
);

I'm not sure if there's an easy workaround for this or not? There might be a way to add CAST rules that workaround this, but I'm not sure. In the actual database that I'm migrating, there are a large number of tables with bit(1) columns, and only some of them have DEFAULT values specified, and some default to true and some default to false - ideally I'd like to convert them all correctly and easily at once. 😄

9numbernine9 avatar May 03 '21 14:05 9numbernine9

I hit the same problem, and prepared a GitHub project to reproduce it, before I found this already existing issue.

Any advice on how to configure pgloader (or a fix to it), so that MySQL bit colums' default values are migrated to Postgres boolean columns would be highly appreciated 🙂

froderystad-digdir avatar Nov 01 '21 09:11 froderystad-digdir

I got the same issue as you, my column in MySQL is in bit(1), and converting to postgres using pgloader creates a column as boolean not null default NULL.

Here is what I just did to put a default value in postgres. In your .load file, add this at the end:

...
AFTER CREATE SCHEMA DO
$$ alter table <schema>.<table_name> alter column <column_name> set default false; $$
...

My column in postgres now has a default of false. Let me know if this works for you as it did for me.

mstephano avatar Nov 20 '22 23:11 mstephano

Here is how to fix it properly when mysqldump has default of b'0' or b'1' for bit(1) columns:

  1. copy the transforms.lisp from the repo to your folder where you have your .load file
  2. add these lines to the transforms.lisp file as follow:
...
(defun bits-to-boolean2 (bit-vector)
  "When using MySQL, strange things will happen, like encoding booleans into
   bit(1). Of course PostgreSQL wants 'f' and 't'."
  (when (and bit-vector (= 4 (length bit-vector)))
    (if (string= "b'0'" bit-vector) "f" "t")))
...
  1. add this line to your .load file:
CAST type bit when (= 1 precision) to boolean drop typemod using bits-to-boolean2
  1. run command pgloader --load-lisp-file transforms.lisp <your_file>.load

Note: sorry I don't have much experience with lisp language. If someone finds a better way to include with bits-to-boolean, please let us know. Then we would not need to add the above line in the .load file.

(cc: @dimitri - Thank you so much for your work on this great tool)

mstephano avatar Nov 22 '22 07:11 mstephano

add to @mstephano 's answer

if you have another bit type error you should add bits-to-boolean rule to bits-to-boolean2 rule

this work fine to me

(defun bits-to-boolean2 (bit-vector)
  "When using MySQL, strange things will happen, like encoding booleans into
   bit(1). Of course PostgreSQL wants 'f' and 't'."
  (cond ((and bit-vector (= 1 (length bit-vector)))
         (let ((bit (aref bit-vector 0)))
           (etypecase bit
             (fixnum    (if (= 0 bit) "f" "t"))
             (character (if (= 0 (char-code bit)) "f" "t")))))
        ((and bit-vector (= 4 (length bit-vector)))
         (if (string= "b'0'" bit-vector) "f" "t"))
        (t nil)))

Tocktock avatar Jan 18 '23 02:01 Tocktock

This is what worked for me by using the --load-lisp-file option:

(in-package :pgloader.transforms)

(defun bits-to-boolean (bit-vector)
  "Converts MySQL bit(1) to PostgreSQL boolean 'f' or 't'."
  (cond
    ;; numeric 0 or 1
    ((and (numberp bit-vector) (member bit-vector '(0 1)))
      (if (zerop bit-vector) "f" "t"))

    ;; single-character '0' or '1'
    ((and (stringp bit-vector) (= (length bit-vector) 1))
      (case (aref bit-vector 0)
        (#\0 "f")
        (#\1 "t")))

    ;; binary string representation "b'0'" or "b'1'"
    ((string= "b'0'" bit-vector) "f")
    ((string= "b'1'" bit-vector) "t")

    ;; no match
    (t nil)))

I do get a warning like WARNING: redefining PGLOADER.TRANSFORMS::BITS-TO-BOOLEAN in DEFUN but otherwise it works well.

dkubb avatar Jan 15 '24 18:01 dkubb