
Getting BDW header error when converting an EBCDIC file to ASCII

Open • arifkhan09 opened this issue 2 years ago • 6 comments

I have a use case where I have a variable-block COBOL file and need to parse it into ASCII format using Cobrix. I am using the following Cobrix dependency JAR files: spark-cobol_2.12-2.6.3.jar, cobol-parser_2.12-2.6.3.jar, scodec-core_2.12-1.10.3.jar, scodec-bits_2.12-1.1.4.jar and antlr4-runtime-4.8.jar.

But I am getting this error: BDW headers contain non-zero values where zeros are expected(check rdw_big_endian flag. Header:125,149,0,0, offset:0..

I have also tried other options, such as rdw_adjustment = -4, is_bdw_big_endian = true and is_rdw_big_endian = true, but I get the same header error.

So how is the header generated in the EBCDIC file, and what could be the solution to this issue?

Below is the code I use to read the file into a DataFrame:

val df = spark.read.format("cobol")
  // "copybook" (a path) and "copybook_contents" (inline text) are alternatives; pass only one
  .option("copybook_contents", copybook)
  .option("record_format", "VB")
  .option("is_bdw_big_endian", "false")
  .option("is_rdw_big_endian", "false")
  .option("schema_retention_policy", "collapse_root")
  .load("data")

arifkhan09, Mar 16 '23 09:03

It looks like your BDW headers are big-endian (the first 2 bytes are non-zero and the last 2 bytes are zero), so try: .option("is_bdw_big_endian", "true")

Here is the spec: https://www.ibm.com/docs/en/zos/2.2.0?topic=records-block-descriptor-word-bdw
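As an illustration of the spec: a non-extended BDW is 4 bytes, where bytes 0-1 carry the block length in big-endian order and bytes 2-3 are zero. The Scala sketch below decodes the header from the first error message under both byte orders. It is not Cobrix's implementation; the object and method names are hypothetical, and the little-endian layout it checks (length in bytes 2-3, low byte first, zeros in bytes 0-1) is an assumption chosen because it is consistent with the error message. Extended BDWs (high bit of byte 0 set) are not handled.

```scala
// Sketch: decode a 4-byte non-extended BDW under two byte-order assumptions.
// Big-endian: bytes 0-1 = block length, bytes 2-3 = zero.
// Little-endian (ASSUMPTION, not taken from Cobrix): bytes 2-3 = length, low byte
// first, bytes 0-1 = zero. Extended BDWs (high bit of byte 0 set) are not handled.
object BdwDecode {

  /** Returns Some(blockLength) when the reserved bytes are zero, otherwise None. */
  def blockLength(header: Array[Byte], bigEndian: Boolean): Option[Int] = {
    require(header.length == 4, "a BDW is exactly 4 bytes")
    val b = header.map(_ & 0xFF) // bytes as unsigned values 0..255
    if (bigEndian) {
      if (b(2) == 0 && b(3) == 0) Some(b(0) * 256 + b(1)) else None
    } else {
      if (b(0) == 0 && b(1) == 0) Some(b(3) * 256 + b(2)) else None
    }
  }

  def main(args: Array[String]): Unit = {
    // Header bytes copied from the first error message in this thread.
    val header = Array(125, 149, 0, 0).map(_.toByte)
    println(s"big-endian:    ${blockLength(header, bigEndian = true)}")
    println(s"little-endian: ${blockLength(header, bigEndian = false)}")
  }
}
```

Read big-endian, 125,149,0,0 gives a block length of 125 × 256 + 149 = 32149 bytes with zero reserved bytes. Read little-endian, the non-zero first two bytes fail the zero check, which is the situation the error message describes.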

yruslan, Mar 16 '23 15:03

When I use the option below, I get the error: BDW headers contain non-zero values where zeros are expected(check 'rdw_big_endian' flag. Header:0,1,3,89,0, offset:33984..

.option("is_bdw_big_endian", "true")

arifkhan09, Mar 16 '23 15:03

This is what the adjustments can probably fix: .option("bdw_adjustment", -4)

Notice that the offset was 0 initially, meaning the very first headers in the file couldn't be parsed. Now the offset is 33984, meaning the issues start at one of the following blocks.

Sometimes the block size stored in a BDW includes the 4-byte header itself, and sometimes it doesn't.

I usually use a hex editor to figure out how the header values relate to the actual block and record sizes, and then apply the adjustments accordingly.
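To check a candidate adjustment outside Spark, one option is to walk the chain of BDWs directly, in the spirit of the hex-editor approach: with the right settings, every header has zero reserved bytes and the last block ends exactly at end of file; with the wrong ones, the walk stops at a non-zero offset, much like the offset in the error message. This is a hypothetical Scala sketch (VbBlockWalker, walk and block are made-up names), not Cobrix code. It assumes big-endian, non-extended BDWs, ignores the RDWs inside each block, and models bdw_adjustment as being added to the declared length to get the number of bytes that follow the header, so -4 means the length already counts the header. That modeling is an assumption about the option's semantics.

```scala
// Sketch: walk the BDW chain of a VB file to test a candidate bdw adjustment.
// ASSUMPTIONS (not taken from Cobrix's source):
//   * every BDW is 4 bytes, big-endian and non-extended (bytes 2-3 are zero);
//   * bytes that follow a BDW = declared length + bdwAdjustment, so -4 means
//     "the declared length already counts the 4-byte header";
//   * RDWs inside each block are not inspected.
object VbBlockWalker {

  sealed trait WalkResult
  final case class Ok(blocks: Int) extends WalkResult
  final case class BadHeader(offset: Int, header: Seq[Int]) extends WalkResult

  /** Follows BDWs from `offset` until end of data or the first header that does not fit. */
  @annotation.tailrec
  def walk(data: Array[Byte], bdwAdjustment: Int, offset: Int = 0, blocks: Int = 0): WalkResult =
    if (offset == data.length) Ok(blocks) // last block ended exactly at end of data
    else {
      val header = data.slice(offset, offset + 4).map(_ & 0xFF).toSeq
      if (header.length < 4 || header(2) != 0 || header(3) != 0) {
        BadHeader(offset, header) // truncated header or non-zero reserved bytes
      } else {
        val following = header(0) * 256 + header(1) + bdwAdjustment
        val next = offset + 4 + following
        // empty/negative block, or a block that overruns the data
        if (following <= 0 || next > data.length) BadHeader(offset, header)
        else walk(data, bdwAdjustment, next, blocks + 1)
      }
    }

  /** Builds one hypothetical test block whose BDW length counts the 4-byte header. */
  def block(payload: Array[Byte]): Array[Byte] = {
    val total = payload.length + 4
    Array((total >> 8).toByte, (total & 0xFF).toByte, 0.toByte, 0.toByte) ++ payload
  }

  def main(args: Array[String]): Unit = {
    // Two synthetic blocks filled with EBCDIC spaces (0x40): 10 and 6 payload bytes.
    val data = block(Array.fill(10)(0x40.toByte)) ++ block(Array.fill(6)(0x40.toByte))
    for (adj <- Seq(0, -4)) println(s"bdw_adjustment=$adj -> ${walk(data, adj)}")
  }
}
```

On the synthetic two-block file in `main`, -4 consumes both blocks, while 0 lands inside the second block's payload at offset 18 and reports the EBCDIC spaces (0x40) there as a bad header. Because the check only reads headers, the same walk could be run over each new input file before the Spark job to confirm the chosen settings still fit.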

yruslan, Mar 16 '23 15:03

I keep getting the error: BDW headers contain non-zero values where zeros are expected(check rdw_big_endian flag. Header:122,13,10,0, offset:4184569

These are the options I used:

.option("record_format", "VB")
.option("copybook", "copybook")
.option("is_bdw_big_endian", "true")
.option("is_rdw_big_endian", "true")
.option("bdw_adjustment", -4)
.option("rdw_adjustment", -4)
.option("ebcdic_code_page", "cp037")

Will the header change for each input file? In production the input file changes on every run, so how should I handle that scenario and determine the right adjustment?

arifkhan09, Mar 30 '23 12:03

Hi @arifkhan09, it is really hard to say at this point without looking at the file.

Are you sure your file has the variable block (VB) format?

Could you send the first 10 bytes of the file from a hex viewer?
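If a hex viewer is not at hand, the first bytes can also be dumped with a few lines of Scala using only java.nio. This is a minimal sketch; HeadHex and its methods are hypothetical names, and the file path is taken from the first command-line argument.

```scala
import java.nio.file.{Files, Paths}

// Sketch: print the first N bytes of a file as hex, similar to a hex viewer's first row.
object HeadHex {

  /** Formats bytes as space-separated two-digit uppercase hex, e.g. "7D 95 00 00". */
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xFF}%02X").mkString(" ")

  /** Reads at most n bytes from the start of the file without loading the whole file. */
  def head(path: String, n: Int): Array[Byte] = {
    val in = Files.newInputStream(Paths.get(path))
    try {
      val buf = new Array[Byte](n)
      var total = 0
      var eof = false
      while (!eof && total < n) {
        val r = in.read(buf, total, n - total) // may return fewer bytes than requested
        if (r == -1) eof = true else total += r
      }
      buf.take(total)
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val path = args.headOption.getOrElse(sys.error("usage: HeadHex <file>"))
    println(toHex(head(path, 10)))
  }
}
```

For a VB file whose first BDW matches the one in the first error message, the output would start with 7D 95 00 00.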

yruslan, Mar 30 '23 13:03

Hi, I have a similar issue. Could you please let me know how to resolve the error?

Kavya1552, Mar 26 '24 13:03