
Migration from MS SQL BLOB to PGSQL BYTEA only copies 4 KB

Open · fabianfroehlich opened this issue 3 years ago

  • [ ] pgloader --version
        pgloader version "3.6.3~devel"
        compiled with SBCL 2.1.11.debian
  • [x] did you search for other similar issues?
  • [ ] how can I reproduce the bug?

I'm trying to migrate an MSSQL DB to PostgreSQL. The MSSQL DB has several columns containing BLOBs larger than 50 MB. The target column has the BYTEA type, which is the default and correct one. After running pgloader with my configuration, the target columns only contain 4 KB, so the files are not migrated completely. There are no errors shown in the pgloader output. What am I missing?

load database
     from mssql://XXXX
     into postgresql://YYYY

alter schema 'dbo' rename to 'public';

fabianfroehlich avatar Jul 19 '22 16:07 fabianfroehlich

Also running into the same issue, bumping for visibility

rajdtta avatar Sep 17 '22 21:09 rajdtta

Made issue #1432 detailing my scenario

rajdtta avatar Sep 17 '22 21:09 rajdtta

@rajdtta as a "quick" fix, I solved this problem as follows:

  1. create a new schema and boot your application without any tables in it. You might need to enable automatic table creation during application startup. If there is no automatic creation, create the tables according to the installation manual of the application.
  2. stop the application and empty (truncate) all tables
  3. create a database dump of the schema into a file called pre-migration.sql (see the example invocation after this list). You might need to change some column types for the migration to work, e.g. OID to bytea, or some fields to TEXT, depending on your schema.
  4. edit your migration.cmd like:
load database
     from XXXX
     into postgresql://USER:PWD@HOST:PORT/SCHEMA
with data only,
     create no schemas,
     batch size=100MB

before load execute /Path/to/pre-migration.sql

alter schema 'dbo' rename to 'public';
  5. run the migration with the new config (see the example invocation after this list)
  6. create a Python 3 script like the following, depending on your needs:
import pyodbc
import psycopg2

# This is the connection to your old database, in this case Microsoft SQL Server
conn_mssql = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};TrustServerCertificate=yes;SERVER=SERVER;DATABASE=DATABASE;UID=USER;PWD=PASSWORD')
cursor_mssql = conn_mssql.cursor()

# This is the connection to the postgres database
conn_pgsql = psycopg2.connect(
    host="HOST",
    database="DATABASE",
    user="USER",
    password="PASSWORD")

cursor_pgsql = conn_pgsql.cursor()

# Enter all text field columns (table, column, and its primary key) like:
textfields = [
    {"table": "TABLE", "column": "COLUMN", "key": "PRIMARY_KEY"},
]

for textfield in textfields:
    q = f"SELECT {textfield['key']}, {textfield['column']} FROM {textfield['table']};"
    cursor_mssql.execute(q)
    datasets = cursor_mssql.fetchall()
    for dataset in datasets:
        if dataset[1] is not None:
            # Pass both the value and the key as query parameters so
            # psycopg2 handles quoting and binary data correctly.
            u = f"UPDATE {textfield['table']} SET {textfield['column']} = %s WHERE {textfield['key']} = %s"
            cursor_pgsql.execute(u, (dataset[1], dataset[0]))
            conn_pgsql.commit()

# Enter all OID columns like:
oidfields = [
    {"table": "TABLE", "column": "COLUMN", "key": "PRIMARY_KEY"},
]

for oidfield in oidfields:
    q = f"SELECT {oidfield['key']}, {oidfield['column']} FROM {oidfield['table']};"
    cursor_mssql.execute(q)
    datasets = cursor_mssql.fetchall()
    for dataset in datasets:
        if dataset[1] is not None:
            u = f"UPDATE {oidfield['table']} SET {oidfield['column']} = %s WHERE {oidfield['key']} = %s"
            cursor_pgsql.execute(u, (dataset[1], dataset[0]))
            conn_pgsql.commit()

    # Convert the freshly copied bytea data into large objects and swap
    # the temporary OID column in under the original column name.
    cursor_pgsql.execute(f"""
alter table {oidfield['table']} add column {oidfield['column']}_tmp oid;
update {oidfield['table']} set {oidfield['column']}_tmp = lo_from_bytea(0, {oidfield['column']});
alter table {oidfield['table']} rename column {oidfield['column']} to {oidfield['column']}_bytea;
alter table {oidfield['table']} rename column {oidfield['column']}_tmp to {oidfield['column']};
alter table {oidfield['table']} drop column {oidfield['column']}_bytea;
""")
    conn_pgsql.commit()

cursor_mssql.close()
conn_mssql.close()
cursor_pgsql.close()
conn_pgsql.close()
  7. call the script (this can take some time)
  8. your data should now be migrated successfully. If some columns were missed, add them to the Python script, drop all tables in the database, and rerun the migration (pgloader and the Python script).
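
For steps 3 and 5 above, the invocations could look roughly like the following (a sketch; HOST, USER, and DATABASE are placeholders, and the pg_dump flags may need adjusting for your setup):

# Step 3: dump only the schema (no data) into pre-migration.sql
pg_dump --schema-only --host HOST --username USER DATABASE > pre-migration.sql

# Step 5: run pgloader with the edited command file
pgloader migration.cmd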

I hope this helps.

fabianfroehlich avatar Sep 19 '22 09:09 fabianfroehlich

@fabianfroehlich thank you so much for the detailed walk-through.

Using pgloader has worked to transfer pretty much all of this old legacy database (schema + data) except for the image/file uploads. I think what I'll do is use a modified version of your script that only ports over the columns that did not get transferred fully, though I've only been running tests on a small subsample of data and have not tried it with the full DB just yet.

Hopefully some sort of fix is built directly into pgloader, but at least now there's a public solution for anyone else facing a similar issue.

Edit: Turns out pgloader was generating the correct schema but not transferring the data completely. I'm just going to write a script similar to the one you've outlined, so thanks once again.

rajdtta avatar Sep 19 '22 17:09 rajdtta

Any ideas when this might be fixed? Same issue here. All images smaller than 4 KB are fine, but everything larger is cut off after 4096 bytes.

pgloader version "3.6.7~devel" compiled with SBCL 2.0.1.debian

mredeker avatar Nov 22 '22 09:11 mredeker

The setting (set mssql parameters textsize to '104857600') pointed out in https://github.com/dimitri/pgloader/issues/603#issuecomment-322008112 works for me. Just set it to the desired maximum text size and the full values will show up in the Postgres database. A sketch of a load file using it is below.
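
A minimal load file using that clause could look like this (a sketch based on the linked comment; the connection strings are placeholders, the value is the maximum size in bytes, here 100 MB, and the exact clause placement may vary with your pgloader version):

load database
     from mssql://USER:PWD@HOST/DATABASE
     into postgresql://USER:PWD@HOST:PORT/DATABASE

set mssql parameters textsize to '104857600'

alter schema 'dbo' rename to 'public';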

jceb avatar Nov 22 '22 10:11 jceb