parso
java.io.IOException: No available bytes in the input stream
I'm running into the following exception when attempting to process a file:
java.io.IOException: There are no available bytes in the input stream.
at com.epam.parso.impl.SasFileParser.getBytesFromFile(SasFileParser.java:768)
at com.epam.parso.impl.SasFileParser.readSubheaderSignature(SasFileParser.java:423)
at com.epam.parso.impl.SasFileParser.processPageMetadata(SasFileParser.java:392)
at com.epam.parso.impl.SasFileParser.processNextPage(SasFileParser.java:591)
at com.epam.parso.impl.SasFileParser.readNextPage(SasFileParser.java:561)
at com.epam.parso.impl.SasFileParser.readNext(SasFileParser.java:519)
at com.epam.parso.impl.SasFileReaderImpl.readNext(SasFileReaderImpl.java:168)
... 57 elided
The error occurs after processing approximately 400,000 rows, and the file has several million. My code is below:
import java.io.FileInputStream
import com.epam.parso.impl.SasFileReaderImpl
val sasFileReader = new SasFileReaderImpl(new FileInputStream("test.sas7bdat"))
val numRows = sasFileReader.getSasFileProperties().getRowCount()
var currentRowNum = 0L
while (currentRowNum < numRows) {
  val currentRow = sasFileReader.readNext()
  currentRow.foreach(c => print(c + "|"))
  currentRowNum += 1
}
Environment details: I'm running this on an EMR cluster with Scala 2.11.
Hi @westonsankey, thank you for reporting this. Is there any way you could provide us with the source file? Thanks.
@Yana-Guseva - I am unable to provide the SAS source file.
I created an equivalent program in Java and got the same exception.
Which version of Parso are you using? Do you know whether this file was created using the SAS platform, and that there are definitely no errors in it?
It seems that this file contains an offset value that goes beyond the page boundaries for one of the subheaders.
I've tested using versions 2.0.9, 2.0.10, and 2.0.11. I was able to successfully parse the file using the Python sas7bdat library. Not entirely sure what you mean by the offset value going beyond the page boundaries for one of the subheaders, as I don't have much experience with the SAS format.
In my case, it happens with the SASYZCR2 compression type.
@saurabhvermaabd98 can you please provide a test file with this issue?
@westonsankey @saurabhvermaabd98 -- could you please try with Parso 2.0.12? Your file might have contained deleted rows, and the handling of those was improved in 2.0.12. Also, if you have a test file, it would be very helpful if you could share it with us. In the meantime I will add the "nodataset" label, as it is very hard, if not impossible, for us to fix the issue without a dataset.
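If deleted rows are indeed the cause, a loop driven by the row count from the file header may ask for more rows than can actually be read. Below is a minimal sketch of a more defensive loop that stops when the reader signals that it is exhausted, rather than counting up to the header value. RowSource and DefensiveRead are hypothetical names used for illustration only and are not part of Parso; the sketch assumes the underlying reader returns null once no rows remain, which should be checked against the Parso version in use.

```scala
// Hypothetical stand-in for a row reader; NOT part of Parso.
trait RowSource {
  // Assumption: returns null once there are no more rows to read.
  def readNext(): Array[AnyRef]
}

object DefensiveRead {
  // Reads rows until the source is exhausted, ignoring any row count
  // advertised in the file header. Returns the number of rows handled.
  def readAll(source: RowSource)(handle: Array[AnyRef] => Unit): Long = {
    var rowsRead = 0L
    var row = source.readNext()
    while (row != null) {
      handle(row)
      rowsRead += 1
      row = source.readNext()
    }
    rowsRead
  }
}
```

With a real reader, a thin adapter could implement RowSource by delegating to sasFileReader.readNext(), and the returned count could then be compared with getRowCount() to see whether the two disagree.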