java.io.IOException: No available bytes in the input stream

Open wessankey opened this issue 5 years ago • 7 comments

I'm running into the following exception when attempting to process a file:

java.io.IOException: There are no available bytes in the input stream.
  at com.epam.parso.impl.SasFileParser.getBytesFromFile(SasFileParser.java:768)
  at com.epam.parso.impl.SasFileParser.readSubheaderSignature(SasFileParser.java:423)
  at com.epam.parso.impl.SasFileParser.processPageMetadata(SasFileParser.java:392)
  at com.epam.parso.impl.SasFileParser.processNextPage(SasFileParser.java:591)
  at com.epam.parso.impl.SasFileParser.readNextPage(SasFileParser.java:561)
  at com.epam.parso.impl.SasFileParser.readNext(SasFileParser.java:519)
  at com.epam.parso.impl.SasFileReaderImpl.readNext(SasFileReaderImpl.java:168)
  ... 57 elided

The error occurs after processing approximately 400,000 rows, and the file has several million. My code is below:

import java.io.FileInputStream
import com.epam.parso.impl.SasFileReaderImpl

val sasFileReader = new SasFileReaderImpl(new FileInputStream("test.sas7bdat"))
val numRows = sasFileReader.getSasFileProperties().getRowCount()
var currentRowNum = 0L

while (currentRowNum < numRows) {
    val currentRow = sasFileReader.readNext()
    currentRow.foreach(c => print(c + "|"))
    currentRowNum += 1
}

Environment details: I'm running this on an EMR cluster with Scala 2.11.

wessankey avatar Jul 26 '19 13:07 wessankey

Hi @westonsankey, thank you for reporting this. Could you share the source file with us? Thanks.

Yana-Guseva avatar Jul 26 '19 15:07 Yana-Guseva

@Yana-Guseva - I am unable to provide the SAS source file.

I created an equivalent program in Java and got the same exception.
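Since the file itself cannot be shared, the index of the first row that fails to read may still help narrow things down. Below is a minimal Java sketch of such a probe. `RowReadProbe`, `RowSource` and `Result` are hypothetical names for illustration and are not part of Parso. The loop assumes the reader returns `null` once the data is exhausted, which is worth confirming for the Parso version in use.

```java
import java.io.IOException;

/** Hypothetical helper, not part of Parso: reads rows until the source ends or fails. */
final class RowReadProbe {

    /** Hypothetical stand-in for a row reader; null is assumed to mean "no more rows". */
    @FunctionalInterface
    interface RowSource {
        Object[] next() throws IOException;
    }

    /** Outcome of a read attempt: rows read successfully, plus the failure if any. */
    static final class Result {
        final long rowsRead;
        final IOException failure; // null when the source ended normally

        Result(long rowsRead, IOException failure) {
            this.rowsRead = rowsRead;
            this.failure = failure;
        }
    }

    static Result readAll(RowSource source) {
        long rowsRead = 0;
        try {
            // Stop on null rather than relying on a row count taken from the file header.
            while (source.next() != null) {
                rowsRead++;
            }
            return new Result(rowsRead, null);
        } catch (IOException e) {
            // rowsRead is now the zero-based index of the row that could not be read.
            return new Result(rowsRead, e);
        }
    }

    public static void main(String[] args) {
        // Fake source: two good rows, then a failure like the one in the stack trace.
        int[] calls = {0};
        Result r = readAll(() -> {
            if (calls[0]++ < 2) {
                return new Object[] {"a", 1};
            }
            throw new IOException("There are no available bytes in the input stream.");
        });
        System.out.println("rows read before failure: " + r.rowsRead);
    }
}
```

With Parso on the classpath, something like `RowReadProbe.readAll(sasFileReader::readNext)` should fit, since `readNext()` returns an `Object[]` and can throw `IOException`, as the stack trace shows.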

wessankey avatar Jul 29 '19 13:07 wessankey

Which version of Parso are you using? Do you know whether this file was created on the SAS platform, and are you sure it contains no errors?

It seems that, for one of the subheaders, this file contains an offset value that points beyond the page boundary.
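To illustrate what an offset "beyond the page boundary" means: a sas7bdat file is split into fixed-size pages, and each subheader is located by an offset and a length within its page. The check below is a simplified sketch of that idea, not Parso's actual code. `SubheaderBounds` and `fitsInPage` are hypothetical names.

```java
/** Hypothetical illustration, not Parso code: does a subheader lie fully inside its page? */
final class SubheaderBounds {

    /**
     * Returns true when the byte range [offset, offset + length) fits in a page
     * of pageLength bytes. Written as offset <= pageLength - length so that the
     * comparison cannot overflow for large values.
     */
    static boolean fitsInPage(long offset, long length, long pageLength) {
        return offset >= 0 && length >= 0 && length <= pageLength
                && offset <= pageLength - length;
    }

    public static void main(String[] args) {
        // The page size here is only an example value.
        System.out.println(fitsInPage(0, 100, 4096));
        System.out.println(fitsInPage(4000, 200, 4096));
    }
}
```

A reader that follows an out-of-range offset tries to read past the bytes it has available, which would be consistent with the "no available bytes" exception above.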

Yana-Guseva avatar Jul 29 '19 15:07 Yana-Guseva

I've tested using versions 2.0.9, 2.0.10, and 2.0.11. I was able to successfully parse the file using the Python sas7bdat library. Not entirely sure what you mean by the offset value going beyond the page boundaries for one of the subheaders, as I don't have much experience with the SAS format.

wessankey avatar Aug 01 '19 16:08 wessankey

In my case, it happens with the SASYZCR2 compression type.

saurabhvermaabd98 avatar Feb 19 '20 09:02 saurabhvermaabd98

@saurabhvermaabd98 Can you please provide a test file that reproduces this issue?

PCaff avatar Feb 19 '20 11:02 PCaff

@westonsankey @saurabhvermaabd98 -- could you please try Parso 2.0.12? Your file might contain deleted rows, and the handling of such rows was improved in 2.0.12. If you have a test file, it would be very helpful if you could share it with us. In the meantime I will add the "nodataset" label, as it is very hard, if not impossible, for us to fix the issue without a dataset.

printsev avatar Nov 30 '20 07:11 printsev