getting header length 73728 != 8192
The following code
from sas7bdat import SAS7BDAT

with SAS7BDAT('10rec.sas7bdat') as f:
    df2 = f.to_data_frame()
produces:
[10rec.sas7bdat] header length 73728 != 8192
[10rec.sas7bdat] column count mismatch
This file has 14,461 columns but only 10 rows. We normally work with very wide datasets. Is this a known issue?
Here's link to that wide sas7bdat file: https://github.com/Tagar/dropbox/blob/master/10rec.sas7bdat
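To see where the two numbers in the error above come from, a minimal sketch can read the header-length and page-size fields straight from the file. The offsets (alignment flag at byte 35, endianness flag at byte 37, header length at 196 + a1, page size at 200 + a1) are assumptions taken from the community SAS7BDAT format description and should be checked against it; `read_header_fields` is a hypothetical helper, not part of the sas7bdat package.

```python
import struct
import sys

# Assumed offsets, taken from the community SAS7BDAT format description
# (not an official SAS specification); verify before relying on them.
ALIGN_FLAG_OFFSET = 35      # byte 0x33 ('3') -> a1 = 4, otherwise a1 = 0
ENDIAN_OFFSET = 37          # 0x01 -> little-endian, 0x00 -> big-endian
HEADER_LENGTH_OFFSET = 196  # 4-byte integer stored at 196 + a1
PAGE_SIZE_OFFSET = 200      # 4-byte integer stored at 200 + a1


def read_header_fields(buf: bytes) -> dict:
    """Hypothetical helper: decode header length and page size from the file's first bytes."""
    if len(buf) < PAGE_SIZE_OFFSET + 8:
        raise ValueError("buffer too short to contain the header fields")
    a1 = 4 if buf[ALIGN_FLAG_OFFSET] == 0x33 else 0
    order = "<" if buf[ENDIAN_OFFSET] == 0x01 else ">"
    (header_length,) = struct.unpack_from(order + "i", buf, HEADER_LENGTH_OFFSET + a1)
    (page_size,) = struct.unpack_from(order + "i", buf, PAGE_SIZE_OFFSET + a1)
    return {
        "a1": a1,
        "little_endian": order == "<",
        "header_length": header_length,
        "page_size": page_size,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_header.py 10rec.sas7bdat
    with open(sys.argv[1], "rb") as fh:
        print(read_header_fields(fh.read(512)))
```

For the values in the error, 73728 is exactly 9 × 8192. If the script reports a header length of 73728 and a page size of 8192, the header would span nine pages, which would be consistent with (though not proof of) a very wide file whose column metadata outgrows a single page.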
I can't say without looking at the file in more detail, but this could happen if the file is encrypted or compressed (the sas7bdat package handles neither). You could also try the Parso package or the R package 'haven', which might work. It looks like you are using a Python wrapper for the sas7bdat package.
(Reply to Ruslan Dautkhanov's comment of Tue, Oct 24, 2017, 8:31 PM.)
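One rough way to test the compression hypothesis without a full parser is to scan the file for the compression signatures that other readers look for: `SASYZCRL` (RLE) and `SASYZCR2` (RDC), which pandas' SAS7BDAT reader, for example, uses to detect compressed files. This is a heuristic sketch, not how any particular package does it: `detect_compression` and `detect_compression_in_file` are hypothetical helpers, and a raw byte scan could in principle give a false positive if the same string happened to appear in the data.

```python
from pathlib import Path

# Compression signatures that other readers (for example pandas' SAS7BDAT reader)
# look for; treated here as an assumption, not an official specification.
SIGNATURES = {
    b"SASYZCRL": "RLE compression",
    b"SASYZCR2": "RDC compression",
}


def detect_compression(data: bytes) -> list:
    """Hypothetical heuristic: names of the compression schemes whose signature occurs in data."""
    return [name for sig, name in SIGNATURES.items() if sig in data]


def detect_compression_in_file(path: str, chunk_size: int = 1 << 20) -> list:
    """Scan a file chunk by chunk; a short overlap catches signatures split across chunks."""
    found = set()
    overlap = max(len(sig) for sig in SIGNATURES) - 1
    tail = b""
    with Path(path).open("rb") as fh:
        while chunk := fh.read(chunk_size):
            window = tail + chunk
            found.update(detect_compression(window))
            tail = window[-overlap:]
    return sorted(found)


# Usage: print(detect_compression_in_file("10rec.sas7bdat"))
```

An empty result is consistent with an uncompressed file; a non-empty one would point toward the compression explanation.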
Yes, the file is compressed but not encrypted.
We tried Parso 2.0.7, and it reads compressed files fine. Are there any plans to add support for compressed files to BioStatMatt/sas7bdat? Thank you.
I would love to, but it's not a priority for me right now. What I'd prefer to do, which AFAIK has not been done yet, is to complete the file format documentation for compressed files. While Parso does read them, its authors have not published a format document like the one in (my) sas7bdat package. Someone will need to read through their code and translate it into documentation...
(Reply to Ruslan Dautkhanov's comment of Wed, Oct 25, 2017, 1:53 PM.)
Thanks for the prompt response, @BioStatMatt. What you're saying makes total sense. I just got an email from a colleague of mine:
I was just looking into the 10-record dataset that I sent you… and found out that SAS did NOT apply any internal compression to that dataset. Likely because it has so few records?
So I lied earlier when I said there was internal compression.
So I have to take back what I said about this file being compressed. It is not compressed, and BioStatMatt/sas7bdat still can't read it. Any ideas? Is it because this dataset is so wide (about 14 thousand columns)?