Instructions for using sqlite DDLs for non-R users
Hi there!
I've been playing around trying to get the DDLs loaded for sqlite3, version 5.4 using the sqlite3 command line and I've been having a pretty rough go of it using the provided definitions.
I didn't find any documentation here or elsewhere about the @cdmDatabaseSchema notation in the ddl files, so I went ahead and removed them. I assume they are something used by the sqlrender utility - but I'm not an R user - I'm just trying to create a working OMOPCDM database in sqlite.
I also discovered that sqlite does not allow you to apply primary or foreign keys to tables using the ALTER TABLE commands provided. So, I wrote a brittle little script to apply the PK and FK definitions to the main _ddl.sql file.
This allows me to use the patched _ddl.sql file to define tables with PKs and FKs. The _indices.sql file seems fine as-is.
I did notice that the COHORT table is missing a PRIMARY KEY, so the foreign key check fails in a clean install for just that table. It almost seems like COHORT should have cohort_id integer PRIMARY KEY and COHORT_DEFINITION should have cohort_definition_id integer PRIMARY KEY - and then the COHORT table should have a FK on COHORT_DEFINITION.cohort_definition_id (instead, it seems backwards). But as it is now, if I make COHORT.cohort_definition_id the PK of the COHORT table - it gets me past the failing foreign key checks. Anyway, I expect some of that will change in future versions - my script includes a hack for this.
For posterity, I'm providing my script here. However, are there instructions for people who don't want to use R but just want to create a working DB out there somewhere?
Click to see my script
#!/bin/bash
#
# File: update-ddl.sh
#
# Converts the OMOP-provided DDL database definition scripts into a format
# that can be loaded by sqlite3.
#
# This has been tested against the provided sqlite 5.4 DDL files.
#
# To run this, place copies of the provided .sql files into this directory
# and run the script. It will modify the sql files and create an empty
# database called cdm.db
#
set -e
set -o pipefail
set -u
# There should be only one DDL file in the local directory.
DDL=$( ls OMOPCDM*_ddl.sql )
[ ! -e ${DDL} ] && echo "Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1
# Awk scriptlet to add a PRIMARY KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
# 'table' is padded with spaces
# 'column' begins with a tab and ends with a space
#
AWK_PK='
/^CREATE TABLE / {
if ($0 ~ table) { target=1 } else { target=0 }
}
{
if (target && $0 ~ column && (!($0 ~ / PRIMARY KEY,/))) {
sub(/,/, " PRIMARY KEY,");
}
print $0
}
'
# Awk scriptlet to add a FOREIGN KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
# 'table' is padded with spaces
# 'column' begins with a tab and ends with a space
# 'ref' is multiple words long
#
AWK_FK='
/^CREATE TABLE / {
if ($0 ~ table) { target=1 } else { target=0 }
}
{
if (target && $0 ~ column && (!($0 ~ / REFERENCES /))) {
if ($0 ~ /,$/) {
sub(/,$/, " REFERENCES " ref ",");
} else {
sub(/ );$/, " REFERENCES " ref ");");
}
}
print $0
}
'
##
# Gets the table names and PK columns from constraints.
#
function get_pks() {
grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \
| sed -e 's:.* \(.*\) ADD CONSTRAINT .* PRIMARY KEY .\(.*\).;:\1 \2:'
}
##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_pk() {
# Apply the change to the ddl file content, saving into a temp file.
cat ${DDL} | awk -v table=" ${1} " -v column=" ${2} " "${AWK_PK}" \
> ${DDL}.tmp
# diff the files to assert diff count is 1 changed line only.
(( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1
# Apply the temp file to the DDL file.
mv ${DDL}.tmp ${DDL}
}
function get_fks() {
grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \
| sed -e 's:ALTER TABLE \(.*\) ADD CONSTRAINT .* FOREIGN KEY .\([^ ]*\). REFERENCES:\1 \2:' \
| sed -e 's:;::'
}
##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_fk() {
# Apply the change to the ddl file content, saving into a temp file.
cat ${DDL} \
| awk -v table=" ${1} " -v column=" ${2} " -v ref="${3}" "${AWK_FK}" \
> ${DDL}.tmp
# diff the files to assert diff count is 1 changed line only.
(( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1
# Apply the temp file to the DDL file.
mv ${DDL}.tmp ${DDL}
}
# Remove the template schema reference, which isn't supported in sqlite.
sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e
# Update the table definitions to insert the PRIMARY KEY declarations.
echo "Adding PRIMARY KEYS to DDL"
get_pks | while read table column; do
mark_pk ${table} ${column}
done
# Insert the FK constraints into the table.
echo "Adding FOREIGN KEYS to DDL"
get_fks | while read table column reference; do
mark_fk ${table} ${column} "${reference}"
done
# HACK: no PK in the cohort table, so this is needed to satisfy FK checks.
mark_pk cohort cohort_definition_id
# Populate the DB tables and indices.
echo "Creating empty cdm.db database"
rm -f cdm.db
touch cdm.db
sqlite3 cdm.db < ${DDL}
sqlite3 cdm.db < OMOPCDM*_indices.sql
sqlite3 cdm.db "pragma foreign_key_check"
# Success!
echo "DONE!"
This updates the _ddl.sql file to look like this:
CREATE TABLE person (
person_id integer NOT NULL PRIMARY KEY,
gender_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID),
year_of_birth integer NOT NULL,
month_of_birth integer NULL,
day_of_birth integer NULL,
birth_datetime REAL NULL,
race_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID),
ethnicity_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID),
location_id integer NULL REFERENCES LOCATION (LOCATION_ID),
provider_id integer NULL REFERENCES PROVIDER (PROVIDER_ID),
care_site_id integer NULL REFERENCES CARE_SITE (CARE_SITE_ID),
person_source_value TEXT NULL,
gender_source_value TEXT NULL,
gender_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID),
race_source_value TEXT NULL,
race_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID),
ethnicity_source_value TEXT NULL,
ethnicity_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID));
When the script runs correctly, you see output like this:
(base) anderson.carl3@home test % ./update-ddl.sh
Adding PRIMARY KEYS to DDL
Adding FOREIGN KEYS to DDL
Creating empty cdm.db database
DONE!
Thanks for your script, saved me a lot of effort. I extended your script to replace tabs with spaces:
#!/bin/bash
#
# File: update-ddl.sh
#
# Converts the OMOP-provided DDL database definition scripts into a format
# that can be loaded by sqlite3.
#
# This has been tested against the provided sqlite 5.4 DDL files.
#
# To run this, place copies of the provided .sql files into this directory
# and run the script. It will modify the sql files and create an empty
# database called cdm.db
#
set -e
set -o pipefail
set -u
# @thisismexp: replace \t with spaces
d="$(mktemp -d)"
for filename in *.sql; do
expand -t 4 "$filename" > "$d"/"$filename"
done
mv "$d"/* .
rmdir "$d"
# There should be only one DDL file in the local directory.
DDL=$( ls OMOPCDM*_ddl.sql )
[ ! -e ${DDL} ] && echo "Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1
# Awk scriptlet to add a PRIMARY KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
# 'table' is padded with spaces
# 'column' begins with a tab and ends with a space
#
AWK_PK='
/^CREATE TABLE / {
if ($0 ~ table) { target=1 } else { target=0 }
}
{
if (target && $0 ~ column && (!($0 ~ / PRIMARY KEY,/))) {
sub(/,/, " PRIMARY KEY,");
}
print $0
}
'
# Awk scriptlet to add a FOREIGN KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
# 'table' is padded with spaces
# 'column' begins with a tab and ends with a space
# 'ref' is multiple words long
#
AWK_FK='
/^CREATE TABLE / {
if ($0 ~ table) { target=1 } else { target=0 }
}
{
if (target && $0 ~ column && (!($0 ~ / REFERENCES /))) {
if ($0 ~ /,$/) {
sub(/,$/, " REFERENCES " ref ",");
} else {
sub(/ );$/, " REFERENCES " ref ");");
}
}
print $0
}
'
##
# Gets the table names and PK columns from constraints.
#
function get_pks() {
grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \
| sed -e 's:.* \(.*\) ADD CONSTRAINT .* PRIMARY KEY .\(.*\).;:\1 \2:'
}
##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_pk() {
# Apply the change to the ddl file content, saving into a temp file.
cat ${DDL} | awk -v table=" ${1} " -v column=" ${2} " "${AWK_PK}" \
> ${DDL}.tmp
# diff the files to assert diff count is 1 changed line only.
(( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1
# Apply the temp file to the DDL file.
mv ${DDL}.tmp ${DDL}
}
function get_fks() {
grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \
| sed -e 's:ALTER TABLE \(.*\) ADD CONSTRAINT .* FOREIGN KEY .\([^ ]*\). REFERENCES:\1 \2:' \
| sed -e 's:;::'
}
##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_fk() {
# Apply the change to the ddl file content, saving into a temp file.
cat ${DDL} \
| awk -v table=" ${1} " -v column=" ${2} " -v ref="${3}" "${AWK_FK}" \
> ${DDL}.tmp
# diff the files to assert diff count is 1 changed line only.
(( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1
# Apply the temp file to the DDL file.
mv ${DDL}.tmp ${DDL}
}
# Remove the template schema reference, which isn't supported in sqlite.
sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e
# Update the table definitions to insert the PRIMARY KEY declarations.
echo "Adding PRIMARY KEYS to DDL"
get_pks | while read table column; do
mark_pk ${table} ${column}
done
# Insert the FK constraints into the table.
echo "Adding FOREIGN KEYS to DDL"
get_fks | while read table column reference; do
mark_fk ${table} ${column} "${reference}"
done
# HACK: no PK in the cohort table, so this is needed to satisfy FK checks.
mark_pk cohort cohort_definition_id
# Populate the DB tables and indices.
echo "Creating empty cdm.db database"
rm -f cdm.db
touch cdm.db
sqlite3 cdm.db < ${DDL}
sqlite3 cdm.db < OMOPCDM*_indices.sql
sqlite3 cdm.db "pragma foreign_key_check"
# Success!
echo "DONE!"```
</details>
This is at an early stage of development- but would be grateful for any feedback - but this might help efforts with SQLite. See https://github.com/wardle/iort - it can now dynamically generate and execute DDL and handle the vocab import and is working for SQLite and PostgreSQL. I need to add customisation to choose different databases in a similar way to schema in PostgreSQL for example, but might help?
For those on this thread, or anyone who lands here. I'm working with a sqlite cdm database for a demo, which uses an experimental new jq module library to quickly map FHIR resources into cdm records.
Demo: https://github.com/barabo/fhir-to-omop-demo
fhir-jq: https://github.com/barabo/fhir-jq
Both are still WIP, but this is what motivated me to update the DDL like this.
Thanks to @barabo and working with ChatGPT, I updated the DDL script here to be a bit more robust:
#!/bin/bash
#
# File: update-ddl.sh
#
# Converts the OMOP-provided DDL database definition scripts into a format
# that can be loaded by sqlite3.
#
# This has been tested against the provided sqlite 5.4 DDL files.
#
# To run this, place copies of the provided .sql files into this directory
# and run the script. It will modify the sql files and create an empty
# database called cdm.db
#
set -euo pipefail
# Find the DDL file
DDL=$(ls OMOPCDM*_ddl.sql 2>/dev/null || true)
[ -z "${DDL}" ] && echo "โ Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1
##
# Get the table and primary key columns from *_primary_keys.sql
##
function get_pks() {
grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \
| sed -E 's:.*ALTER TABLE ([^ ]+) ADD CONSTRAINT [^ ]+ PRIMARY KEY \(([^)]+)\);:\1 \2:'
}
##
# Get the table, column, and reference from *_constraints.sql
##
function get_fks() {
grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \
| sed -E 's:ALTER TABLE ([^ ]+) ADD CONSTRAINT [^ ]+ FOREIGN KEY \(([^)]+)\) REFERENCES ([^;]+);:\1 \2 \3:'
}
##
# Insert PRIMARY KEY modifier into the column definition
##
function mark_pk() {
echo "๐ง Marking PRIMARY KEY on $1.$2"
awk -v table="$1" -v column="$2" '
BEGIN { target = 0 }
/^[[:space:]]*CREATE TABLE[[:space:]]+/ {
if ($0 ~ "CREATE TABLE[[:space:]]*" table "[[:space:]]*\\(") {
target = 1
} else {
target = 0
}
}
{
if (target && $1 == column && $0 !~ /PRIMARY KEY/) {
sub(/,/, " PRIMARY KEY,")
}
print
}
' "${DDL}" > "${DDL}.tmp"
if ! cmp -s "${DDL}" "${DDL}.tmp"; then
mv "${DDL}.tmp" "${DDL}"
else
echo "โ ๏ธ Skipping $1.$2 โ already has PRIMARY KEY or column not found."
rm -f "${DDL}.tmp"
fi
}
##
# Insert FOREIGN KEY constraint into the column definition
##
function mark_fk() {
echo "๐ง Marking FOREIGN KEY on $1.$2 โ $3"
awk -v table="$1" -v column="$2" -v ref="$3" '
BEGIN { target = 0 }
/^[[:space:]]*CREATE TABLE[[:space:]]+/ {
if ($0 ~ "CREATE TABLE[[:space:]]*" table "[[:space:]]*\\(") {
target = 1
} else {
target = 0
}
}
{
if (target && $1 == column && $0 !~ /REFERENCES/) {
if ($0 ~ /,$/) {
sub(/,$/, " REFERENCES " ref ",")
} else {
sub(/[[:space:]]*\);$/, " REFERENCES " ref ");")
}
}
print
}
' "${DDL}" > "${DDL}.tmp"
if ! cmp -s "${DDL}" "${DDL}.tmp"; then
mv "${DDL}.tmp" "${DDL}"
else
echo "โ ๏ธ Skipping $1.$2 โ already has FOREIGN KEY or column not found."
rm -f "${DDL}.tmp"
fi
}
# Cleanup the schema placeholder
echo "๐งน Removing schema references"
sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e
# Insert PRIMARY KEYS
echo "๐ Adding PRIMARY KEYS to DDL"
get_pks | while read -r table column; do
mark_pk "$table" "$column"
done
# Insert one known missing PK for sqlite foreign key enforcement
echo "๐ง Manually adding missing PK for 'cohort.cohort_definition_id'"
mark_pk cohort cohort_definition_id
# Insert FOREIGN KEYS
echo "๐ Adding FOREIGN KEYS to DDL"
get_fks | while read -r table column reference; do
mark_fk "$table" "$column" "$reference"
done
# Create database
echo "๐ฆ Creating empty cdm.db database"
rm -f cdm.db
touch cdm.db
sqlite3 cdm.db < "${DDL}"
sqlite3 cdm.db < OMOPCDM_sqlite_5.4_indices.sql
# Check foreign key constraints
echo "๐ Checking foreign key integrity"
sqlite3 cdm.db "PRAGMA foreign_key_check"
echo "โ
DONE!"
This adds a bit more logging information as well as creates a SQLite database with enforced Primary and Foreign Key relations. Thanks folks!