CPT4 UMLS API process inserts carriage returns into "CONCEPT.csv"
I currently have to clean the final concept.csv before loading it into a PostgreSQL database. The problem appears after the CPT4 codes are added with the cpt.bat process described in the instructions that come with the Athena vocabulary download.
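The cleaning step can be limited to the line endings, leaving the file's contents untouched. Below is a minimal Python sketch, assuming the only problem is that some rows end in CRLF instead of LF. The function names `to_lf` and `normalize_file` are hypothetical and are not part of the Athena or cpt4.jar tooling.

```python
from typing import Iterable, Iterator


def to_lf(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each line with a trailing CRLF rewritten as a bare LF.

    Lines are handled as raw bytes, so the UTF-8 text is never decoded or
    re-encoded; only the two-byte CRLF ending is changed.
    """
    for line in lines:
        if line.endswith(b"\r\n"):
            yield line[:-2] + b"\n"
        else:
            yield line


def normalize_file(src_path: str, dst_path: str) -> None:
    """Stream src_path into dst_path with CRLF endings converted to LF."""
    # Iterating a file opened in binary mode splits on b"\n" only, so a
    # CRLF line still carries its b"\r" when it reaches to_lf().
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(to_lf(src))
```

For example, `normalize_file("concept.csv", "concept_lf.csv")` writes a cleaned copy, and `\copy` can then point at the new file. The file is streamed line by line, so a large CONCEPT.csv is never loaded into memory in full.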
PostgreSQL command (run in the psql CLI):
\copy omop.concept FROM '\path\to\modified\concept.csv' WITH (FORMAT CSV, DELIMITER E'\t', QUOTE E'\b', ENCODING 'UTF8', HEADER TRUE)
The command fails with:
ERROR: unquoted carriage return found in data
HINT: Use quoted CSV field to represent carriage return.
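As far as I understand PostgreSQL's COPY, it takes the newline style from the first line it reads. If that line ends in LF and a later row ends in CRLF, the stray CR is treated as data and produces this error. With QUOTE E'\b', no field is ever quoted because the backspace character never occurs in the data, so the hint about quoted fields does not apply here. The following minimal sketch checks a file for mixed endings and reports the first line containing a CR. The helper name `scan_line_endings` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def scan_line_endings(lines: Iterable[bytes]) -> Tuple[Counter, Optional[int]]:
    """Count LF vs CRLF endings and find the first 1-based line with a CR.

    A file whose first line ends in LF but which also contains CRLF lines
    has mixed endings, which COPY rejects at the first stray CR.
    """
    counts: Counter = Counter()
    first_cr: Optional[int] = None
    for lineno, line in enumerate(lines, start=1):
        if line.endswith(b"\r\n"):
            counts["CRLF"] += 1
        elif line.endswith(b"\n"):
            counts["LF"] += 1
        else:
            counts["none"] += 1  # final line without a trailing newline
        if first_cr is None and b"\r" in line:
            first_cr = lineno
    return counts, first_cr
```

Usage: `with open("concept.csv", "rb") as f: print(scan_line_endings(f))`. A non-zero count for both LF and CRLF indicates mixed endings.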
System and File information
Included data file with 4 error examples: see attached. Pulling down the UMLS CPT4 codes requires a license, so I have redacted the concept name and concept code in these examples to make sure that sharing this file does not infringe the license. The OMOP information provided by Odysseus, including the concept_ids, is left intact.
OMOP Vocabulary version: v5.0 23-JAN-23
Java info: Version 8 Update 361 (build 1.8.0_361-b09)
Target Database version: PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
System info: Processor: 12th Gen Intel(R) Core(TM) i7-1270P 2.20 GHz; Installed RAM: 32.0 GB (31.4 GB usable); System type: 64-bit operating system, x64-based processor
Windows info: Edition: Windows 10 Enterprise; Version: 21H2; Installed on: 7/20/2022; OS build: 19044.2846; Experience: Windows Feature Experience Pack 120.2212.4190.0
@odikia - Daniel, this is odd... let me double-check whether we have changed anything about cpt4.jar recently.
@odikia - it looks as if we have been doing this for a while now. I can confirm that, after reconstitution, all CPT4 rows in concept.csv appear to end with CRLF instead of LF alone. Have you always updated your vocabularies the same way, and if so, when was the last time you were able to do so without an error?
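To check which vocabularies are affected in a given concept.csv, line endings can be tallied per vocabulary_id. The sketch below is minimal and assumes the standard tab-delimited OMOP CONCEPT layout, with vocabulary_id as the fourth column and a header row. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable

# Assumed 0-based position of vocabulary_id in the OMOP CONCEPT table:
# concept_id, concept_name, domain_id, vocabulary_id, concept_class_id, ...
VOCABULARY_ID_COL = 3


def endings_by_vocabulary(lines: Iterable[bytes]) -> Dict[str, Dict[str, int]]:
    """Tally line-ending styles per vocabulary_id in a tab-delimited CONCEPT file.

    The first line is assumed to be the header and is skipped.
    """
    tally: DefaultDict[str, Counter] = defaultdict(Counter)
    rows = iter(lines)
    next(rows, None)  # skip the header row
    for line in rows:
        if line.endswith(b"\r\n"):
            ending = "CRLF"
        elif line.endswith(b"\n"):
            ending = "LF"
        else:
            ending = "none"  # last line of the file without a newline
        # Strip only CR/LF so an empty trailing field (e.g. invalid_reason) survives.
        fields = line.rstrip(b"\r\n").split(b"\t")
        if len(fields) > VOCABULARY_ID_COL:
            vocab = fields[VOCABULARY_ID_COL].decode("utf-8", errors="replace")
        else:
            vocab = "<short row>"
        tally[vocab][ending] += 1
    return {vocab: dict(counts) for vocab, counts in tally.items()}
```

Usage: `with open("concept.csv", "rb") as f: print(endings_by_vocabulary(f))`. If the behavior described above holds, CPT4 should show only CRLF endings and the other vocabularies only LF.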