
Parsing Multisegment Dataset with is_record_sequence fails

Open srow2u opened this issue 5 years ago • 0 comments

Background [Optional]

I am parsing a multisegment file in which each segment has a variable record length. Here are the environment, library versions, and a code fragment (modified to make it generic):

Env: Azure Databricks
Library: za.co.absa.cobrix:cobol-parser:1.1.2; za.co.absa.cobrix:spark-cobol:1.1.2

# PySpark: `spark`, `copyBook` and `rawdatafile` are defined elsewhere in the notebook.
# The parentheses let the chained calls span several lines.
splitDF = (
    spark.read.format("cobol")
    .option("copybook", copyBook)
    .option("schema_retention_policy", "collapse_root")
    # Field that identifies the segment of each record
    .option("segment_field", "SEG-CODE")
    # Segment redefine => segment id
    .option("redefine_segment_id_map:1", "ROOT-DATA => RT00")
    .option("redefine_segment_id_map:2", "CHILD1-DATA => SEG01")
    .option("redefine_segment_id_map:3", "CHILD2-DATA => SEG02")
    .option("redefine_segment_id_map:4", "CHILD3-DATA => SEG03")
    .option("redefine_segment_id_map:5", "CHILD4-DATA => SEG04")
    .option("redefine_segment_id_map:6", "CHILD5-DATA => SEG05")
    .option("redefine_segment_id_map:7", "CHILD6-DATA => SEG06")
    .option("redefine_segment_id_map:8", "CHILD7-DATA => SEG07")
    # Parent segment => child segment
    .option("segment-children:1", "ROOT-DATA => CHILD1-DATA")
    .option("segment-children:2", "ROOT-DATA => CHILD2-DATA")
    .option("segment-children:3", "ROOT-DATA => CHILD3-DATA")
    .option("segment-children:4", "ROOT-DATA => CHILD4-DATA")
    .option("segment-children:5", "ROOT-DATA => CHILD5-DATA")
    .option("segment-children:6", "ROOT-DATA => CHILD6-DATA")
    .option("segment-children:7", "ROOT-DATA => CHILD7-DATA")
    .option("is_rdw_big_endian", "true")
    .option("variable_size_occurs", "true")
    .option("ebcdic_code_page", "cp037_extended")
    .option("is_record_sequence", "true")
    .load(rawdatafile)
)
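
The repeated segment options in the fragment follow a regular pattern, so they could be generated rather than typed out. This is only a sketch: `segment_options` is a hypothetical helper (not part of Cobrix), and it assumes the same option names and numbering as in the fragment.

```python
def segment_options(root, root_id, children):
    """Build the segment-related reader options as a dict.

    root      -- name of the root segment redefine, e.g. "ROOT-DATA"
    root_id   -- segment id of the root, e.g. "RT00"
    children  -- list of (redefine name, segment id) pairs
    """
    # The root segment takes the first id-map slot.
    opts = {"redefine_segment_id_map:1": f"{root} => {root_id}"}
    for i, (name, seg_id) in enumerate(children, start=1):
        # Child i takes id-map slot i + 1 and parent/child slot i.
        opts[f"redefine_segment_id_map:{i + 1}"] = f"{name} => {seg_id}"
        opts[f"segment-children:{i}"] = f"{root} => {name}"
    return opts


# The seven child segments from the fragment above.
children = [(f"CHILD{i}-DATA", f"SEG0{i}") for i in range(1, 8)]
opts = segment_options("ROOT-DATA", "RT00", children)

for key in sorted(opts):
    print(key, "->", opts[key])
```

The resulting dict could then be passed to the reader in one call, for example `spark.read.format("cobol").options(**opts)`, with the remaining options chained as before.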

Question

Using .option("is_record_sequence", "true") causes an error: "RDW headers should never be zero (0,0,0,0). Found zero size record at xxxxx." The reported position varies depending on the options I set.

However, the file does parse when I omit "is_record_sequence", but the results are not accurate: the first record parses correctly and some of the later records parse, but not all of them.
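
One way to narrow down the zero-RDW error is to walk the record headers outside Spark and see where the chain breaks. The sketch below assumes each record starts with a 4-byte RDW whose first two bytes hold the length in big-endian order (in line with is_rdw_big_endian above). Whether that length counts the 4 header bytes themselves is an assumption to try both ways. `walk_rdw` and its parameters are hypothetical names, not a Cobrix API, and the sample bytes are synthetic.

```python
def walk_rdw(data, length_includes_header=False, max_records=1000):
    """Follow big-endian RDW headers through `data`.

    Returns (records, error): records is a list of (offset, payload_length),
    error is None or a message describing where the walk stopped.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data) and len(records) < max_records:
        header = data[pos:pos + 4]
        if header == b"\x00\x00\x00\x00":
            return records, f"zero RDW header at offset {pos}"
        # Assumption: bytes 0-1 hold the record length, big-endian.
        length = int.from_bytes(header[:2], "big")
        payload = length - 4 if length_includes_header else length
        if payload < 0:
            return records, f"length {length} too small at offset {pos}"
        if pos + 4 + payload > len(data):
            return records, f"truncated record at offset {pos}"
        records.append((pos, payload))
        pos += 4 + payload
    return records, None


# Synthetic sample: two records whose RDW length includes the 4 header bytes.
sample = (
    b"\x00\x07\x00\x00" + b"ABC"            # length 7 = 4 header + 3 payload
    + b"\x00\x09\x00\x00" + b"\x00" * 5     # length 9 = 4 header + 5 payload
)

ok_records, ok_error = walk_rdw(sample, length_includes_header=True)
bad_records, bad_error = walk_rdw(sample, length_includes_header=False)
print(ok_records, ok_error)
print(bad_records, bad_error)
```

On a real file, `data` would be the first few kilobytes read in binary mode. If the walk only completes under one of the two length conventions, a mismatch between how the file and the reader count the record length would be a plausible cause of the error, though that is a hypothesis to verify rather than a confirmed diagnosis.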

Can you please let me know if I am using the option incorrectly? What options should I use to handle this file?

Thanks in advance, Steve

srow2u • May 31 '20 03:05