CommonDataModel icon indicating copy to clipboard operation
CommonDataModel copied to clipboard

Instructions for using sqlite DDLs for non-R users

Open barabo opened this issue 1 year ago โ€ข 4 comments

Hi there!

I've been playing around trying to get the DDLs loaded for sqlite3, version 5.4 using the sqlite3 command line and I've been having a pretty rough go of it using the provided definitions.

I didn't find any documentation here or elsewhere about the @cdmDatabaseSchema notation in the ddl files, so I went ahead and removed them. I assume they are something used by the sqlrender utility - but I'm not an R user - I'm just trying to create a working OMOPCDM database in sqlite.

I also discovered that sqlite does not allow you to apply primary or foreign keys to tables using the ALTER TABLE commands provided. So, I wrote a brittle little script to apply the PK and FK definitions to the main _ddl.sql file.

This allows me to use the patched _ddl.sql file to define tables with PKs and FKs. The _indices.sql file seems fine as-is.

I did notice that the COHORT table is missing a PRIMARY KEY, so the foreign key check fails in a clean install for just that table. It almost seems like COHORT should have cohort_id integer PRIMARY KEY and COHORT_DEFINITION should have cohort_definition_id integer PRIMARY KEY - and then the COHORT table should have a FK on COHORT_DEFINITION.cohort_definition_id (instead, it seems backwards). But as it is now, if I make COHORT.cohort_definition_id the PK of the COHORT table - it gets me past the failing foreign key checks. Anyway, I expect some of that will change in future versions - my script includes a hack for this.

For posterity, I'm providing my script here. However, are there instructions for people who don't want to use R but just want to create a working DB out there somewhere?

Click to see my script
#!/bin/bash
#
# File: update-ddl.sh
#
# Converts the OMOP-provided DDL database definition scripts into a format
# that can be loaded by sqlite3.
#
# This has been tested against the provided sqlite 5.4 DDL files.
#
# To run this, place copies of the provided .sql files into this directory
# and run the script.  It will modify the sql files and create an empty
# database called cdm.db
#

set -e
set -o pipefail
set -u


# There should be only one DDL file in the local directory.
DDL=$( ls OMOPCDM*_ddl.sql )
[ ! -e ${DDL} ] && echo "Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1


# Awk scriptlet to add a PRIMARY KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
#   'table' is padded with spaces
#   'column' begins with a tab and ends with a space
#
AWK_PK='
/^CREATE TABLE / {
  if ($0 ~ table) { target=1 } else { target=0 }
}

{
  if (target && $0 ~ column && (!($0 ~ / PRIMARY KEY,/))) {
    sub(/,/, " PRIMARY KEY,");
  }
  print $0
}
'

# Awk scriptlet to add a FOREIGN KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
#   'table' is padded with spaces
#   'column' begins with a tab and ends with a space
#   'ref' is multiple words long
#
AWK_FK='
/^CREATE TABLE / {
  if ($0 ~ table) { target=1 } else { target=0 }
}

{
  if (target && $0 ~ column && (!($0 ~ / REFERENCES /))) {
    if ($0 ~ /,$/) {
      sub(/,$/, " REFERENCES " ref ",");
    } else {
      sub(/ );$/, " REFERENCES " ref ");");
    }
  }
  print $0
}
'

##
# Gets the table names and PK columns from constraints.
#
function get_pks() {
  grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \
    | sed -e 's:.* \(.*\) ADD CONSTRAINT .* PRIMARY KEY .\(.*\).;:\1 \2:'
}


##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_pk() {

  # Apply the change to the ddl file content, saving into a temp file.
  cat ${DDL} | awk -v table=" ${1} " -v column="        ${2} " "${AWK_PK}" \
    > ${DDL}.tmp

  # diff the files to assert diff count is 1 changed line only.
  (( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1

  # Apply the temp file to the DDL file.
  mv ${DDL}.tmp ${DDL}
}


function get_fks() {
  grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \
    | sed -e 's:ALTER TABLE \(.*\) ADD CONSTRAINT .* FOREIGN KEY .\([^ ]*\). REFERENCES:\1 \2:' \
    | sed -e 's:;::'
}


##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_fk() {

  # Apply the change to the ddl file content, saving into a temp file.
  cat ${DDL} \
    | awk -v table=" ${1} " -v column=" ${2} " -v ref="${3}" "${AWK_FK}" \
    > ${DDL}.tmp

  # diff the files to assert diff count is 1 changed line only.
  (( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1

  # Apply the temp file to the DDL file.
  mv ${DDL}.tmp ${DDL}
}


# Remove the template schema reference, which isn't supported in sqlite.
sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e

# Update the table definitions to insert the PRIMARY KEY declarations.
echo "Adding PRIMARY KEYS to DDL"
get_pks | while read table column; do
  mark_pk ${table} ${column}
done

# Insert the FK constraints into the table.
echo "Adding FOREIGN KEYS to DDL"
get_fks | while read table column reference; do
  mark_fk ${table} ${column} "${reference}"
done

# HACK: no PK in the cohort table, so this is needed to satisfy FK checks.
mark_pk cohort cohort_definition_id

# Populate the DB tables and indices.
echo "Creating empty cdm.db database"
rm -f cdm.db
touch cdm.db
sqlite3 cdm.db < ${DDL}
sqlite3 cdm.db < OMOPCDM*_indices.sql
sqlite3 cdm.db "pragma foreign_key_check"

# Success!
echo "DONE!"

This updates the _ddl.sql file to look like this:

CREATE TABLE person (
                        person_id integer NOT NULL PRIMARY KEY,
                        gender_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID),
                        year_of_birth integer NOT NULL,
                        month_of_birth integer NULL,
                        day_of_birth integer NULL,
                        birth_datetime REAL NULL,
                        race_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID),
                        ethnicity_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID),
                        location_id integer NULL REFERENCES LOCATION (LOCATION_ID),
                        provider_id integer NULL REFERENCES PROVIDER (PROVIDER_ID),
                        care_site_id integer NULL REFERENCES CARE_SITE (CARE_SITE_ID),
                        person_source_value TEXT NULL,
                        gender_source_value TEXT NULL,
                        gender_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID),
                        race_source_value TEXT NULL,
                        race_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID),
                        ethnicity_source_value TEXT NULL,
                        ethnicity_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID));

When the script runs correctly, you see output like this:

(base) anderson.carl3@home test % ./update-ddl.sh
Adding PRIMARY KEYS to DDL
Adding FOREIGN KEYS to DDL
Creating empty cdm.db database
DONE!

barabo avatar May 28 '24 03:05 barabo

Thanks for your script, saved me a lot of effort. I extended your script to replace tabs with spaces:

#!/bin/bash
#
# File: update-ddl.sh
#
# Converts the OMOP-provided DDL database definition scripts into a format
# that can be loaded by sqlite3.
#
# This has been tested against the provided sqlite 5.4 DDL files.
#
# To run this, place copies of the provided .sql files into this directory
# and run the script.  It will modify the sql files and create an empty
# database called cdm.db
#

set -e
set -o pipefail
set -u


# @thisismexp: replace \t with spaces
d="$(mktemp -d)"
for filename in *.sql; do
    expand -t 4 "$filename" > "$d"/"$filename"
done
mv "$d"/* .
rmdir "$d"


# There should be only one DDL file in the local directory.
DDL=$( ls OMOPCDM*_ddl.sql )
[ ! -e ${DDL} ] && echo "Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1


# Awk scriptlet to add a PRIMARY KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
#   'table' is padded with spaces
#   'column' begins with a tab and ends with a space
#
AWK_PK='
/^CREATE TABLE / {
  if ($0 ~ table) { target=1 } else { target=0 }
}

{
  if (target && $0 ~ column && (!($0 ~ / PRIMARY KEY,/))) {
    sub(/,/, " PRIMARY KEY,");
  }
  print $0
}
'

# Awk scriptlet to add a FOREIGN KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
#   'table' is padded with spaces
#   'column' begins with a tab and ends with a space
#   'ref' is multiple words long
#
AWK_FK='
/^CREATE TABLE / {
  if ($0 ~ table) { target=1 } else { target=0 }
}

{
  if (target && $0 ~ column && (!($0 ~ / REFERENCES /))) {
    if ($0 ~ /,$/) {
      sub(/,$/, " REFERENCES " ref ",");
    } else {
      sub(/ );$/, " REFERENCES " ref ");");
    }
  }
  print $0
}
'

##
# Gets the table names and PK columns from constraints.
#
function get_pks() {
  grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \
    | sed -e 's:.* \(.*\) ADD CONSTRAINT .* PRIMARY KEY .\(.*\).;:\1 \2:'
}


##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_pk() {

  # Apply the change to the ddl file content, saving into a temp file.
  cat ${DDL} | awk -v table=" ${1} " -v column="        ${2} " "${AWK_PK}" \
    > ${DDL}.tmp

  # diff the files to assert diff count is 1 changed line only.
  (( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1

  # Apply the temp file to the DDL file.
  mv ${DDL}.tmp ${DDL}
}


function get_fks() {
  grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \
    | sed -e 's:ALTER TABLE \(.*\) ADD CONSTRAINT .* FOREIGN KEY .\([^ ]*\). REFERENCES:\1 \2:' \
    | sed -e 's:;::'
}


##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_fk() {

  # Apply the change to the ddl file content, saving into a temp file.
  cat ${DDL} \
    | awk -v table=" ${1} " -v column=" ${2} " -v ref="${3}" "${AWK_FK}" \
    > ${DDL}.tmp

  # diff the files to assert diff count is 1 changed line only.
  (( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1

  # Apply the temp file to the DDL file.
  mv ${DDL}.tmp ${DDL}
}


# Remove the template schema reference, which isn't supported in sqlite.
sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e

# Update the table definitions to insert the PRIMARY KEY declarations.
echo "Adding PRIMARY KEYS to DDL"
get_pks | while read table column; do
  mark_pk ${table} ${column}
done

# Insert the FK constraints into the table.
echo "Adding FOREIGN KEYS to DDL"
get_fks | while read table column reference; do
  mark_fk ${table} ${column} "${reference}"
done

# HACK: no PK in the cohort table, so this is needed to satisfy FK checks.
mark_pk cohort cohort_definition_id

# Populate the DB tables and indices.
echo "Creating empty cdm.db database"
rm -f cdm.db
touch cdm.db
sqlite3 cdm.db < ${DDL}
sqlite3 cdm.db < OMOPCDM*_indices.sql
sqlite3 cdm.db "pragma foreign_key_check"

# Success!
echo "DONE!"```

</details>

thisismexp avatar Jun 07 '24 11:06 thisismexp

This is at an early stage of development- but would be grateful for any feedback - but this might help efforts with SQLite. See https://github.com/wardle/iort - it can now dynamically generate and execute DDL and handle the vocab import and is working for SQLite and PostgreSQL. I need to add customisation to choose different databases in a similar way to schema in PostgreSQL for example, but might help?

wardle avatar Jun 09 '24 07:06 wardle

For those on this thread, or anyone who lands here. I'm working with a sqlite cdm database for a demo, which uses an experimental new jq module library to quickly map FHIR resources into cdm records.

Demo: https://github.com/barabo/fhir-to-omop-demo fhir-jq: https://github.com/barabo/fhir-jq

Both are still WIP, but this is what motivated me to update the DDL like this.

barabo avatar Jun 09 '24 21:06 barabo

Thanks to @barabo and working with ChatGPT, I updated the DDL script here to be a bit more robust:

#!/bin/bash
#
# File: update-ddl.sh
#
# Converts the OMOP-provided DDL database definition scripts into a format
# that can be loaded by sqlite3.
#
# This has been tested against the provided sqlite 5.4 DDL files.
#
# To run this, place copies of the provided .sql files into this directory
# and run the script. It will modify the sql files and create an empty
# database called cdm.db
#

set -euo pipefail

# Find the DDL file
DDL=$(ls OMOPCDM*_ddl.sql 2>/dev/null || true)
[ -z "${DDL}" ] && echo "โŒ Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1

##
# Get the table and primary key columns from *_primary_keys.sql
##
function get_pks() {
  grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \
    | sed -E 's:.*ALTER TABLE ([^ ]+) ADD CONSTRAINT [^ ]+ PRIMARY KEY \(([^)]+)\);:\1 \2:'
}

##
# Get the table, column, and reference from *_constraints.sql
##
function get_fks() {
  grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \
    | sed -E 's:ALTER TABLE ([^ ]+) ADD CONSTRAINT [^ ]+ FOREIGN KEY \(([^)]+)\) REFERENCES ([^;]+);:\1 \2 \3:'
}

##
# Insert PRIMARY KEY modifier into the column definition
##
function mark_pk() {
  echo "๐Ÿ”ง Marking PRIMARY KEY on $1.$2"
  awk -v table="$1" -v column="$2" '
    BEGIN { target = 0 }
    /^[[:space:]]*CREATE TABLE[[:space:]]+/ {
      if ($0 ~ "CREATE TABLE[[:space:]]*" table "[[:space:]]*\\(") {
        target = 1
      } else {
        target = 0
      }
    }
    {
      if (target && $1 == column && $0 !~ /PRIMARY KEY/) {
        sub(/,/, " PRIMARY KEY,")
      }
      print
    }
  ' "${DDL}" > "${DDL}.tmp"

  if ! cmp -s "${DDL}" "${DDL}.tmp"; then
    mv "${DDL}.tmp" "${DDL}"
  else
    echo "โš ๏ธ Skipping $1.$2 โ€” already has PRIMARY KEY or column not found."
    rm -f "${DDL}.tmp"
  fi
}

##
# Insert FOREIGN KEY constraint into the column definition
##
function mark_fk() {
  echo "๐Ÿ”ง Marking FOREIGN KEY on $1.$2 โ†’ $3"
  awk -v table="$1" -v column="$2" -v ref="$3" '
    BEGIN { target = 0 }
    /^[[:space:]]*CREATE TABLE[[:space:]]+/ {
      if ($0 ~ "CREATE TABLE[[:space:]]*" table "[[:space:]]*\\(") {
        target = 1
      } else {
        target = 0
      }
    }
    {
      if (target && $1 == column && $0 !~ /REFERENCES/) {
        if ($0 ~ /,$/) {
          sub(/,$/, " REFERENCES " ref ",")
        } else {
          sub(/[[:space:]]*\);$/, " REFERENCES " ref ");")
        }
      }
      print
    }
  ' "${DDL}" > "${DDL}.tmp"

  if ! cmp -s "${DDL}" "${DDL}.tmp"; then
    mv "${DDL}.tmp" "${DDL}"
  else
    echo "โš ๏ธ Skipping $1.$2 โ€” already has FOREIGN KEY or column not found."
    rm -f "${DDL}.tmp"
  fi
}

# Cleanup the schema placeholder
echo "๐Ÿงน Removing schema references"
sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e

# Insert PRIMARY KEYS
echo "๐Ÿ”‘ Adding PRIMARY KEYS to DDL"
get_pks | while read -r table column; do
  mark_pk "$table" "$column"
done

# Insert one known missing PK for sqlite foreign key enforcement
echo "๐Ÿ”ง Manually adding missing PK for 'cohort.cohort_definition_id'"
mark_pk cohort cohort_definition_id

# Insert FOREIGN KEYS
echo "๐Ÿ”— Adding FOREIGN KEYS to DDL"
get_fks | while read -r table column reference; do
  mark_fk "$table" "$column" "$reference"
done

# Create database
echo "๐Ÿ“ฆ Creating empty cdm.db database"
rm -f cdm.db
touch cdm.db
sqlite3 cdm.db < "${DDL}"
sqlite3 cdm.db < OMOPCDM_sqlite_5.4_indices.sql

# Check foreign key constraints
echo "๐Ÿ” Checking foreign key integrity"
sqlite3 cdm.db "PRAGMA foreign_key_check"

echo "โœ… DONE!"

This adds a bit more logging information as well as creates a SQLite database with enforced Primary and Foreign Key relations. Thanks folks!

TheCedarPrince avatar May 16 '25 17:05 TheCedarPrince