Achilles
Achilles copied to clipboard
Achilles Generating Incorrect Column Name in SQL Scripts - vo.discharge_to_concept_id
Hi - We're setting up Achilles to run on top of our delta lake through Databricks. I was having issues running the analysis through R, so I generated the SQL scripts and copied them over to Databricks. The generated script failed to complete when referencing the discharge_to_concept_id field because it was missing. In actuality, it looks like it changed from discharge to discharged.
https://ohdsi.github.io/CommonDataModel/cdm54.html#visit_occurrence
We're using OMOP CDM 5.4.1. It looks like this issue was reported last year in May, but has been closed and is still present. https://github.com/OHDSI/Achilles/issues/702
Here's the r script we used to generate the queries. I think it's pretty standard....
# Install Packages
if (!require("remotes")) install.packages("remotes")
remotes::install_github("OHDSI/Achilles", force=TRUE)
install.packages("DatabaseConnector")
library("DatabaseConnector")
# Download Drivers
Sys.setenv(DATABASECONNECTOR_JAR_FOLDER = "c:\\temp")
downloadJdbcDrivers("spark")
# Set Connection
connectionDetails <- createConnectionDetails(
dbms="spark",
connectionString="",
user="token",
password="")
options(connectionObserver = NULL)
library(Achilles)
Achilles::achilles(
cdmVersion = "5.4.1",
connectionDetails = connectionDetails,
cdmDatabaseSchema = "REMOVED",
resultsDatabaseSchema = "REMOVED",
dropScratchTables = TRUE,
createIndices = FALSE,
createTable = TRUE,
numThreads = 10,
sqlOnly = TRUE
)
And an example of the incorrect column reference in the generated SQL.
SELECT
1203 AS analysis_id,
CAST(vo.discharge_to_concept_id AS STRING) AS stratum_1,
CAST(NULL AS STRING) AS stratum_2,
CAST(NULL AS STRING) AS stratum_3,
CAST(NULL AS STRING) AS stratum_4,
CAST(NULL AS STRING) AS stratum_5,
COUNT(*) AS count_value
FROM
hive_metastore.omop.visit_occurrence vo
JOIN
hive_metastore.omop.observation_period op
ON
vo.person_id = op.person_id
AND
vo.visit_start_date >= op.observation_period_start_date
AND
vo.visit_start_date <= op.observation_period_end_date
WHERE
vo.discharge_to_concept_id != 0
GROUP BY
vo.discharge_to_concept_id;
OPTIMIZE hive_metastore.omop.bajhxbnws_tmpach_1203
ZORDER BY stratum_1;
It also generated the following incorrect column reference:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name admitting_source_concept_id
cannot be resolved. Did you mean one of the following? [visit_source_concept_id
, admitted_from_concept_id
, visit_type_concept_id
, discharged_to_concept_id
, visit_concept_id
].; line 65 pos 197
Assuming admitting_source_concept_id should actually be admitted_from_concept_id.
Should we create a new issue for each or keep a list here?