Athena CPT4 ULMS API process causing insertion of carriage returns in "CONCEPT.csv"

Open odikia opened this issue 1 year ago • 2 comments

I currently have to clean the final concept.csv before loading it into a PostgreSQL database. The problem appears after the CPT4 codes are added via the cpt.bat process described in the instructions that come with the vocabulary download from Athena.

PostgreSQL (run in the psql CLI):

\copy omop.concept FROM '\path\to\modified\concept.csv' WITH (FORMAT CSV, DELIMITER E'\t', QUOTE E'\b', ENCODING 'UTF8', HEADER TRUE)

Query returns:

ERROR: unquoted carriage return found in data
HINT: Use quoted CSV field to represent carriage return.
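For reference, the manual cleanup mentioned above can be approximated with a short script. This is only a sketch, not part of the Athena tooling: the function name `normalize_line_endings` and the file paths are hypothetical, and it assumes the only problem is that some rows end in CRLF while the rest of the file ends in LF. It works on raw bytes so that Python does not translate newlines on its own.

```python
from pathlib import Path


def normalize_line_endings(src: Path, dst: Path) -> int:
    """Copy src to dst, rewriting CRLF line endings as LF.

    Returns the number of lines that were changed.
    """
    changed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Iterating a binary file splits on b"\n" only, so a CRLF row
        # arrives here with its b"\r" still attached.
        for line in fin:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
                changed += 1
            fout.write(line)
    return changed
```

Calling `normalize_line_endings(Path("CONCEPT.csv"), Path("CONCEPT_clean.csv"))` (both paths are placeholders) writes a copy that can then be passed to `\copy`. A carriage return embedded in the middle of a field would not be touched by this sketch.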

System and File information

Included data file with 4 error examples: see attached. Note that downloading CPT4 codes from UMLS requires a license. I provide 4 error examples with the concept name and concept code redacted, to make sure that sharing this file does not infringe the license. The OMOP information provided by Odysseus, including the concept IDs, remains.

OMOP Vocabulary version: v5.0 23-JAN-23

Java info: Version 8 Update 361 (build 1.8.0_361-b09)

Target Database version: PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit

System info: Processor: 12th Gen Intel(R) Core(TM) i7-1270P 2.20 GHz; Installed RAM: 32.0 GB (31.4 GB usable); System type: 64-bit operating system, x64-based processor

Windows Info: Edition: Windows 10 Enterprise; Version: 21H2; Installed on: 7/20/2022; OS build: 19044.2846; Experience: Windows Feature Experience Pack 120.2212.4190.0

CONCEPT_first_4_cpt4_errors.csv

May 05 '23 17:05 odikia

@odikia - Daniel, this is odd... let me double check if we have changed anything recently about the cpt4.jar.

May 10 '23 08:05 mik-ohdsi

@odikia - it looks as if we have been doing this for a while now. I can confirm that, after reconstitution, all CPT4 rows in concept.csv end with CRLF instead of just LF. Have you always updated your vocabularies in the same way, and if so, when was the last time you were able to do so without an error?
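One way to reproduce this check locally is to count line endings per vocabulary. The sketch below is an illustration only: the function name `line_endings_by_vocabulary` is hypothetical, and it assumes a tab-delimited file whose header row names a `vocabulary_id` column.

```python
from collections import Counter
from pathlib import Path


def line_endings_by_vocabulary(path: Path) -> Counter:
    """Count (vocabulary_id, line ending) pairs in a tab-delimited concept file."""
    counts = Counter()
    with open(path, "rb") as f:
        header = f.readline().rstrip(b"\r\n").split(b"\t")
        # Assumption: the header contains a vocabulary_id column (any letter case).
        idx = [h.lower() for h in header].index(b"vocabulary_id")
        for line in f:
            if line.endswith(b"\r\n"):
                ending = "CRLF"
            elif line.endswith(b"\n"):
                ending = "LF"
            else:
                ending = "none"  # last line without a trailing newline
            fields = line.rstrip(b"\r\n").split(b"\t")
            vocab = fields[idx].decode("utf-8", "replace") if idx < len(fields) else "?"
            counts[(vocab, ending)] += 1
    return counts
```

If the behaviour described above holds, the result should show CPT4 only under `CRLF` and the other vocabularies only under `LF`.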

May 10 '23 15:05 mik-ohdsi
