discogs-xml2db

Put Postgres import and sql commands into a single script

Open · ijabz opened this issue 5 years ago · 3 comments

Put the following into a script, since you need to run all of these steps every time to properly import discogs data into the database. I run importcsv.py with nohup; it takes some time, so when I find it has completed I don't want to then have to create the keys/indexes etcetera and wait some more time.

The script would need to check that the previous step has worked okay before moving on to the next step (see the sketch after the commands below).

  python3 postgresql/psql.py < postgresql/sql/CreateTables.sql
  python3 postgresql/importcsv.py /csvdir/*
  python3 postgresql/psql.py < postgresql/sql/CreatePrimaryKeys.sql
  python3 postgresql/psql.py < postgresql/sql/CreateFKConstraints.sql
  python3 postgresql/psql.py < postgresql/sql/CreateIndexes.sql
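Something along these lines would do it. A minimal sketch only: the paths mirror the commands above, /csvdir is a placeholder for wherever the csv files live, and check=True aborts the run as soon as a step fails, which covers the check mentioned above:

  import glob
  import subprocess
  import sys

  def run_sql(script):
      # Mirror `python3 postgresql/psql.py < script`.
      with open(script) as f:
          subprocess.run(["python3", "postgresql/psql.py"],
                         stdin=f, check=True)

  def main():
      run_sql("postgresql/sql/CreateTables.sql")
      # check=True makes a failed step raise, so the later steps
      # only run if the previous one worked.
      subprocess.run(
          ["python3", "postgresql/importcsv.py"] + glob.glob("/csvdir/*"),
          check=True,
      )
      for script in ("CreatePrimaryKeys.sql", "CreateFKConstraints.sql",
                     "CreateIndexes.sql"):
          run_sql("postgresql/sql/" + script)

  if __name__ == "__main__":
      try:
          main()
      except subprocess.CalledProcessError as exc:
          sys.exit("step failed: {}".format(exc.cmd))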

ijabz avatar Sep 06 '20 11:09 ijabz

I have some concerns as to how having the indexes and keys present in the database before the import affects the speed of the import.

In most databases it tends to be slower, even in bulk load, because:

  • every row has to be PK-checked and FK-checked before insert; a single failed FK check would cause the entire import to fail;
  • having the FKs in place means the files need to be loaded in a precise order, with the files providing the FK targets first and the files referencing them last;
  • after every row (or batch of rows) inserted, the indexes may have to be updated, slowing the whole process down, whereas building them at the end is a single table scan.

By contrast, the current solution leaves a DB full of data at the end of the csv import; optimization (PK, FK, indexes) is then left to the discretion of the users.
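As an illustration of that load-then-optimize pattern, a minimal sketch using psycopg2; the demo table, csv file, and connection string are hypothetical and not part of this project:

  import psycopg2

  conn = psycopg2.connect("dbname=discogs")  # assumption: local db name
  cur = conn.cursor()

  # 1. Bare table, no PK/FK/indexes: COPY streams rows in without
  #    per-row constraint checks or incremental index maintenance.
  cur.execute("CREATE TABLE IF NOT EXISTS demo (id integer, name text)")
  with open("demo.csv") as f:
      cur.copy_expert("COPY demo FROM STDIN WITH CSV", f)

  # 2. Constraints and indexes afterwards: each is a single pass over
  #    the loaded data rather than upkeep on every insert.
  cur.execute("ALTER TABLE demo ADD PRIMARY KEY (id)")
  cur.execute("CREATE INDEX demo_name_idx ON demo (name)")
  conn.commit()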

All that aside, could you try the following two scenarios and see if the outcome is acceptable?

In a script:

Scenario A

  1. Run CreateTables.sql
  2. Import csv
  3. Run the rest of the sql scripts

Scenario B

  1. Run all sql scripts
  2. Import csv

Run each of the scripts above with the time command, and if the results from scenario B are positive and faster than A, I'd be happy to start looking into it.
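For the timing, something like the sketch below would also work, as an alternative to wrapping each command in time. Paths as in the commands earlier in the thread, with /csvdir again a placeholder:

  import glob
  import subprocess
  import time

  def timed(label, argv, sql=None):
      # Rough per-step timer, standing in for the shell `time` command.
      stdin = open(sql) if sql else None
      start = time.perf_counter()
      try:
          subprocess.run(argv, stdin=stdin, check=True)
      finally:
          if stdin:
              stdin.close()
      print("{}: {:.1f}s".format(label, time.perf_counter() - start))

  # Scenario A order; for Scenario B, run the three key/index steps
  # before the import instead and compare the totals.
  timed("CreateTables", ["python3", "postgresql/psql.py"],
        sql="postgresql/sql/CreateTables.sql")
  timed("importcsv",
        ["python3", "postgresql/importcsv.py"] + glob.glob("/csvdir/*"))
  for s in ("CreatePrimaryKeys", "CreateFKConstraints", "CreateIndexes"):
      timed(s, ["python3", "postgresql/psql.py"],
            sql="postgresql/sql/{}.sql".format(s))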

philipmat avatar Sep 06 '20 13:09 philipmat

Hi, if you look at my scripts you'll see indexes etc. are created AFTER the import. I'm just suggesting the user should be able to run one command that creates the db tables, imports the data, and then creates the indexes, keys and constraints, rather than having to run 5 different commands.

ijabz avatar Sep 06 '20 15:09 ijabz

> is then left to the discretion of the users.

There may be cases where they want to change this, but for most I cannot see why they would not want to add keys and indexes after the db import; otherwise the db will be almost unusable.

ijabz avatar Sep 06 '20 15:09 ijabz