getting header length 73728 != 8192

Open Tagar opened this issue 8 years ago • 6 comments

The following code

from sas7bdat import SAS7BDAT
with SAS7BDAT('10rec.sas7bdat') as f:
    df2 = f.to_data_frame()

produces

[10rec.sas7bdat] header length 73728 != 8192
[10rec.sas7bdat] column count mismatch

This file has 14,461 columns but just 10 rows. We normally work with very wide datasets. Is this a known issue?

Tagar avatar Oct 25 '17 01:10 Tagar

Here's a link to that wide sas7bdat file: https://github.com/Tagar/dropbox/blob/master/10rec.sas7bdat

Tagar avatar Oct 25 '17 01:10 Tagar
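For anyone who wants to see what the file itself declares, the header length in the first message can be read straight from the raw bytes. This is only a minimal sketch, not the package's own code. It assumes the layout described in the sas7bdat format documentation: alignment flags at offsets 32 and 35, an endianness byte at offset 37, and the header length and page size stored as 4-byte integers at offsets 196 and 200, shifted by 4 when the byte at offset 35 is "3". The function name `read_header_fields` is made up for illustration, and the offsets should be checked against the format documentation before relying on them.

```python
import struct


def read_header_fields(header: bytes) -> dict:
    """Decode a few header fields from the start of a sas7bdat file.

    All offsets here are assumptions taken from the published format notes.
    """
    if len(header) < 208:
        raise ValueError("need at least the first 208 bytes of the file")

    u64 = header[32:33] == b"3"                # assumed flag for the 64-bit layout
    a1 = 4 if header[35:36] == b"3" else 0     # assumed alignment shift
    endian = "<" if header[37] == 1 else ">"   # assumed: 0x01 means little-endian

    # Two consecutive 4-byte integers: header length, then page size.
    header_length, page_size = struct.unpack_from(endian + "ii", header, 196 + a1)
    return {"u64": u64, "header_length": header_length, "page_size": page_size}


# Example use (reads only the start of the file):
# with open("10rec.sas7bdat", "rb") as fh:
#     print(read_header_fields(fh.read(288)))
```

If the value printed for `header_length` matches the 73728 in the message, the file really does declare a larger header than the reader expects, which would point at the file layout rather than at a read error.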

I can't say without looking at the file in more detail, but this could be due to a file that is encrypted or compressed (neither of which the sas7bdat package can handle). Also, you could try the Parso package or the R package 'haven', which might work. It looks like you are using a Python wrapper to the sas7bdat package.

BioStatMatt avatar Oct 25 '17 15:10 BioStatMatt

Yes, file is compressed but not encrypted.

Tagar avatar Oct 25 '17 17:10 Tagar

Yes, we tried Parso 2.0.7 and it reads compressed files fine. Are there any plans to add support for compressed files to BioStatMatt/sas7bdat? Thank you.

Tagar avatar Oct 25 '17 18:10 Tagar

I would love to, but it's not a priority for me right now. What I'd prefer to do, which has not been done yet AFAIK, is to complete the file format documentation with respect to compressed files. While Parso does read them, its authors have not published a format document like the one in the (my) sas7bdat package. Someone will need to read through their code and translate it into the documentation...

BioStatMatt avatar Oct 25 '17 20:10 BioStatMatt

Thanks for the prompt response, @BioStatMatt. What you're saying totally makes sense. I just got an email from a colleague of mine:

I was just looking into the 10 record dataset that I sent you… and found out that SAS did NOT apply any internal compression on that dataset. Likely because it is so few records?

So I lied earlier when I said there was internal compression.

So I have to take back my statement that this file was compressed. It is not compressed, and BioStatMatt/sas7bdat still can't read it. Any ideas? Is it because this dataset is so wide (14 thousand columns)?

Tagar avatar Oct 25 '17 21:10 Tagar
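Since the thread went back and forth on whether the file is compressed, a rough check that does not need SAS may help. The sketch below is a heuristic only: it assumes that compressed data sets carry one of the marker strings `SASYZCRL` (RLE) or `SASYZCR2` (RDC) somewhere in their metadata, which is what open-source readers appear to look for. The helper name `guess_compression` is hypothetical, and a match could in principle come from the data itself.

```python
from typing import Optional

# Marker strings assumed to identify the two compression schemes.
SIGNATURES = {
    b"SASYZCRL": "RLE",
    b"SASYZCR2": "RDC",
}


def guess_compression(data: bytes) -> Optional[str]:
    """Return the name of the first compression marker found in the bytes, or None."""
    for signature, name in SIGNATURES.items():
        if signature in data:
            return name
    return None


# Example use (the first megabyte should cover the metadata pages of a small file):
# with open("10rec.sas7bdat", "rb") as fh:
#     print(guess_compression(fh.read(1 << 20)))
```

A result of `None` would be consistent with the colleague's report that SAS applied no internal compression to this file, though it does not prove it.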