xport

Add support for CPORT (compressed XPORT format)

Open • selik opened this issue 8 years ago • 14 comments

It seems some archaic FDA submission rules require(d) SAS XPT or CPT-format files. The Aggregate Analysis of ClinicalTrials.gov Database hosts the same data in Oracle "dmp", pipe-delimited text, and SAS CPORT formats. Perhaps we can use these files as a sort of Rosetta stone to infer the specification of the SAS CPT/CPORT format.

selik • Oct 22 '16 10:10
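One way to act on the Rosetta-stone idea is a known-plaintext search: take distinct values from the pipe-delimited text dump of a table and look for their raw bytes in the matching CPORT file. The sketch below is in Python (like xport itself) and rests on stated assumptions: values are stored as plain ASCII somewhere in the binary, and the text file has a header row unless told otherwise. `find_known_values` and `values_from_pipe_delimited` are hypothetical helpers, not part of xport. Compressed values would simply not be found.

```python
def find_known_values(blob: bytes, values: list[str]) -> dict[str, list[int]]:
    """Return every byte offset at which each known value appears verbatim.

    ASSUMPTION: values are stored as plain ASCII in the binary file.
    Values that CPORT compresses or encodes differently will yield [].
    """
    hits: dict[str, list[int]] = {}
    for value in values:
        if not value or not value.isascii():
            continue  # encoding of non-ASCII text in CPORT is unknown; skip it
        needle = value.encode("ascii")
        offsets: list[int] = []
        pos = blob.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = blob.find(needle, pos + 1)  # allow overlapping matches
        hits[value] = offsets
    return hits


def values_from_pipe_delimited(
    text: str, column: int, has_header: bool = True
) -> list[str]:
    """Collect distinct, non-empty values from one column, in first-seen order."""
    lines = text.splitlines()
    if has_header:
        lines = lines[1:]  # ASSUMPTION: first line holds column names
    distinct: dict[str, None] = {}
    for line in lines:
        fields = line.split("|")
        if column < len(fields):
            value = fields[column].strip()
            if value:
                distinct.setdefault(value, None)
    return list(distinct)
```

To use it, read the CPORT file with `Path(...).read_bytes()`, read the matching text file, and pass one column's values to `find_known_values`. Offsets that recur at a regular stride might hint at record boundaries, while values that never turn up would suggest that field is compressed or stored differently; either way this is a heuristic starting point, not a decoder.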

Hi, great contribution. I need some information about CPORT: my client currently needs to read .cpt files in Python, but I can't find a layout specification for CPORT like the one for XPORT (https://support.sas.com/techsup/technote/ts140.pdf). Did you find any information about the CPORT format? Or is any help needed on this?

Thanks in advance.

dhanababum • Sep 15 '17 18:09

@dhanababum I believe it stands for "Compressed export" or something like that. Unfortunately, we'd have to reverse-engineer it.

The binary CPORT format is not openly documented. The data values in files produced by PROC CPORT can be compressed and the files may be password-protected.

https://www.loc.gov/preservation/digital/formats/fdd/fdd000464.shtml#notes

selik • Sep 16 '17 23:09
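Because the file extensions that come up in this thread (.cpt, .XPORT, .TRN) don't by themselves tell you which format a file is in, a practical first step is to sniff the header. A minimal sketch: the XPORT prefix is the start of the 80-byte library header record described in TS-140, while the `**COMPRESSED**` prefix for CPORT is an assumption based on files seen in the wild, not on any published SAS specification, so check it against your own files. `sniff_sas_transport` and `sniff_file` are hypothetical helper names.

```python
from __future__ import annotations

from pathlib import Path

# Start of the 80-byte library header record of a SAS XPORT v5/v6 file,
# as laid out in SAS technical note TS-140.
XPORT_V5_PREFIX = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"

# ASSUMPTION: CPORT files observed in the wild begin with repeated
# "**COMPRESSED**" markers. This does not come from any SAS specification.
CPORT_PREFIX = b"**COMPRESSED**"


def sniff_sas_transport(header: bytes) -> str:
    """Classify a file's leading bytes as 'xport', 'cport', or 'unknown'."""
    if header.startswith(XPORT_V5_PREFIX):
        return "xport"
    if header.startswith(CPORT_PREFIX):
        return "cport"
    return "unknown"


def sniff_file(path: str | Path) -> str:
    """Read only the first 80-byte record of a file and classify it."""
    with open(path, "rb") as f:
        return sniff_sas_transport(f.read(80))
```

Running `sniff_file` on a file such as C2419P1M.XPORT would show which kind of header it actually carries, before any attempt to parse it.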

Hi, I appreciate all the work on this. I'm also running into an issue opening compressed transport files. Any luck reading CPORT files with Python?

smiiil • Jan 11 '21 14:01

@smiiil Sorry, I haven't gotten around to it, and I don't expect to for a while. I'm happy to coach you through it, though.

selik • Jan 11 '21 18:01

Sure, willing to help.

smiiil • Jan 11 '21 18:01

My design idea was that the cport module could extend classes from the v56 module, trying to reuse as much of the logic as possible.

Unfortunately, there seem to be some bugs in the latest version, so maybe it's best to start by fixing those, which would familiarize you with the logic. The decision to extend Pandas made the code much more complex. Hopefully it made the API more pleasant, but I've started to worry that it was a mistake.

selik • Jan 11 '21 18:01

Thanks for all of your work @selik! Following this thread since I'm also running into issues with CPORT files.

cmdugan13 • Jan 18 '22 21:01

@cmdugan13 Is the CPORT file you're trying to read publicly available?

selik • Jan 19 '22 02:01

It is. I can't link the file directly, but it's C2419P1M.XPORT in the attached folder 2021 Midyear-Final-Model Software.zip. It's on CMS's website, if you need the source.

cmdugan13 • Jan 19 '22 02:01

I'll take a look next weekend / late January.

selik • Jan 19 '22 15:01

This is going to be tricky. SAS Universal Viewer doesn't support CPORT files. Apparently the universe is smaller than we thought. https://support.sas.com/kb/42/356.html

selik • Feb 14 '22 19:02

I found some sample datasets that CMS published in 2014 that are available as both TRN and TXT files, if it helps: STDIAG.TRN, STDIAG.TXT

lscott15 • Apr 21 '22 19:04

@lscott15 Thanks for the tip. I'll check it out.

selik • May 12 '22 04:05

Has there been any progress on this? I'm also willing to help.

thekevshow • May 18 '23 02:05