cobrix
Getting BDW header error when converting EBCDIC file to ASCII
I have a use case where I have a variable-block (VB) COBOL file that I need to parse into ASCII using Cobrix. I am using the following Cobrix dependency JAR files: spark-cobol_2.12-2.6.3.jar, cobol-parser_2.12-2.6.3.jar, scodec-core_2.12-1.10.3.jar, scodec-bits_2.12-1.1.4.jar, antlr4-runtime-4.8.jar
But I am getting this error: BDW headers contain non-zero values where zeros are expected(check rdw_big_endian flag. Header:125,149,0,0, offset:0..
I have also tried other options, such as rdw_adjustment = -4 and setting both is_bdw_big_endian and is_rdw_big_endian to true, but I still get the same header error.
So how is this header generated in the EBCDIC file, and what could be the solution to this issue?
Below is the code I use to read the file into a DataFrame:
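val df = spark
  .read
  .format("cobol")
  // Note: both "copybook_contents" (the copybook text) and "copybook" (a path) are set here; normally only one of them is needed
  .option("copybook_contents", copybook)
  .option("record_format", "VB")
  .option("copybook", "copybook")
  // "false" means the BDW/RDW headers are read as little-endian
  .option("is_bdw_big_endian", "false")
  .option("is_rdw_big_endian", "false")
  .option("schema_retention_policy", "collapse_root")
  .load("data")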
It looks like your BDW headers are big-endian (the first 2 bytes are non-zero and the last 2 bytes are zero), so try:
.option("is_bdw_big_endian", "true")
Here is the spec: https://www.ibm.com/docs/en/zos/2.2.0?topic=records-block-descriptor-word-bdw
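To make the arithmetic concrete, here is a minimal Scala sketch (not Cobrix code) that decodes a non-extended, big-endian BDW as described on the IBM page: bytes 0-1 hold an unsigned 16-bit block length and bytes 2-3 are expected to be zero. The object name `BdwHeader` is illustrative only.

```scala
// Illustrative helper, not part of Cobrix: decodes a non-extended big-endian BDW.
object BdwHeader {
  /** Returns the block length stored in bytes 0-1 (most significant byte first). */
  def blockLength(header: Seq[Int]): Int = {
    require(header.length >= 4, "a BDW is 4 bytes long")
    // In this layout bytes 2-3 are reserved and expected to be zero.
    require(header(2) == 0 && header(3) == 0,
      s"bytes 2-3 should be zero, got ${header.take(4).mkString(",")}")
    (header(0) << 8) | header(1)
  }
}

// Header from the first error message: 125 * 256 + 149
println(BdwHeader.blockLength(Seq(125, 149, 0, 0))) // prints 32149
```

Read as big-endian, the first header describes a block of 32149 bytes, which fits in the 16-bit length field.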
When I use the option below, I get the error: BDW headers contain non-zero values where zeros are expected(check 'rdw_big_endian' flag. Header:0,1,3,89,0, offset:33984..
.option("is_bdw_big_endian", "true")
This is what the adjustments can probably fix:
.option("bdw_adjustment", -4)
Notice that the offset was 0 initially, meaning the headers of the very first block couldn't be parsed. Now the offset is 33984, meaning the issues start at one of the subsequent blocks.
Sometimes BDW headers include the 4-byte header itself in the total block size, and sometimes they don't.
Usually I use a hex editor to figure out how the headers reflect the record size, and then apply the adjustments accordingly.
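To show how such headers can be produced, and why the two conventions differ by exactly 4 bytes, here is a minimal Scala sketch that writes one VB block. It assumes big-endian 16-bit lengths in bytes 0-1 of each BDW/RDW with zeros in bytes 2-3. `includeHeaders = true` follows the IBM convention where each length counts its own 4-byte descriptor; `false` models files whose lengths count only the data. `VbWriter` is a hypothetical name, not a Cobrix API.

```scala
// Hypothetical helper for illustration; not part of Cobrix.
object VbWriter {
  // 4-byte descriptor word: big-endian 16-bit length in bytes 0-1, zeros in bytes 2-3.
  private def descriptor(length: Int): Array[Byte] = {
    require(length >= 0 && length <= 0xFFFF, s"length $length does not fit in 16 bits")
    Array(((length >> 8) & 0xFF).toByte, (length & 0xFF).toByte, 0.toByte, 0.toByte)
  }

  /** Builds one VB block: a BDW followed by RDW-prefixed records. */
  def block(records: Seq[Array[Byte]], includeHeaders: Boolean): Array[Byte] = {
    val extra = if (includeHeaders) 4 else 0
    // Each record gets an RDW holding its length (plus 4 if headers are counted).
    val body = records.flatMap(r => (descriptor(r.length + extra) ++ r).toSeq).toArray
    // The BDW holds the size of everything after it (plus 4 if headers are counted).
    descriptor(body.length + extra) ++ body
  }
}

val record = Array.fill[Byte](5)(0xC1.toByte) // five EBCDIC 'A' characters
println(VbWriter.block(Seq(record), includeHeaders = true).take(4).map(_ & 0xFF).mkString(","))  // 0,13,0,0
println(VbWriter.block(Seq(record), includeHeaders = false).take(4).map(_ & 0xFF).mkString(",")) // 0,9,0,0
```

The two BDWs differ by 4 (13 vs 9), which is the kind of mismatch the bdw_adjustment and rdw_adjustment options are meant to absorb.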
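As a complement to a hex editor, here is a minimal Scala sketch of the same idea: walk consecutive big-endian BDW blocks and report the offset of the first header that stops making sense, similar in spirit to the offset in the Cobrix error messages. It is not Cobrix's implementation. In this sketch, `adjustment` is added to each decoded length to get the payload size, so -4 corresponds to BDWs that count their own header. Running it on a sample of an input file with a few candidate adjustments might help show which setting fits that file.

```scala
// Hypothetical diagnostic, not Cobrix's implementation.
object BdwWalker {
  /** Walks consecutive big-endian BDW blocks in `data`.
    * `adjustment` is added to each decoded length to get the payload size (this sketch's convention).
    * Returns None if every block fits exactly, or Some(offset) of the first suspicious header. */
  def firstBadOffset(data: Array[Byte], adjustment: Int): Option[Int] = {
    var offset = 0
    while (offset < data.length) {
      if (offset + 4 > data.length) return Some(offset) // truncated header
      val h = data.slice(offset, offset + 4).map(_ & 0xFF)
      val payload = ((h(0) << 8) | h(1)) + adjustment
      // Reject non-zero reserved bytes, empty blocks, or blocks running past the end.
      if (h(2) != 0 || h(3) != 0 || payload <= 0 || offset + 4 + payload > data.length)
        return Some(offset)
      offset += 4 + payload
    }
    None
  }
}

// Two blocks whose BDW lengths include the 4-byte header: 10 = 4 + 6 and 7 = 4 + 3.
val sample = Array[Byte](0, 10, 0, 0, 1, 2, 3, 4, 5, 6, 0, 7, 0, 0, 7, 8, 9)
for (adj <- Seq(0, -4)) {
  println(s"adjustment $adj -> ${BdwWalker.firstBadOffset(sample, adj)}")
}
// adjustment 0 -> Some(14)
// adjustment -4 -> None
```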
I keep getting the error: BDW headers contain non-zero values where zeros are expected(check rdw_big_endian flag. Header:122,13,10,0, offset:4184569
These are the options I used:
.option("record_format", "VB")
.option("copybook", "copybook")
.option("is_bdw_big_endian", "true")
.option("is_rdw_big_endian", "true")
.option("bdw_adjustment", -4)
.option("rdw_adjustment", -4)
.option("ebcdic_code_page", "cp037")
Will the header change for each input file? In production the input file changes from run to run, so how should I handle this scenario and choose the right adjustment?
Hi @arifkhan09, it is really hard to say at this point without looking at the file.
Are you sure your file has the variable block (VB) format?
Could you send the first 10 bytes of the file from a hex viewer?
Hi, I have a similar issue. Could you please let me know how to resolve the error?
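For anyone following along, here is a minimal Scala sketch for producing that dump without a separate hex viewer: it reads only the first `n` bytes of a file and prints them as hex pairs. `HexHead` and the commented-out path are placeholders. As a side note, in the last error above the header bytes 13 and 10 (0D 0A in hex) are the ASCII carriage return and line feed codes; if a dump shows such pairs, one possible explanation is a text-mode transfer of the file, though that cannot be confirmed without seeing it.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path}

// Hypothetical helper for illustration: hex dump of the first n bytes of a file.
object HexHead {
  def hexHead(path: Path, n: Int): String = {
    val in: InputStream = Files.newInputStream(path)
    try {
      val buf = new Array[Byte](n)
      var total = 0
      var read = 0
      // InputStream.read may return fewer bytes than requested, so loop until n bytes or EOF (-1).
      while (total < n && { read = in.read(buf, total, n - total); read > 0 }) total += read
      buf.take(total).map(b => f"${b & 0xFF}%02X").mkString(" ")
    } finally in.close()
  }
}

// Example (replace with the path to your own file):
// println(HexHead.hexHead(java.nio.file.Paths.get("/path/to/file"), 10))
```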