cobrix
Parsing Multisegment Dataset with is_record_sequence fails
Background [Optional]
I am parsing a multisegment file in which each segment has a variable record length. Here are the environment, library versions, and code fragment (modified to make it generic):
Env: Azure Databricks
Library: za.co.absa.cobrix:cobol-parser:1.1.2; za.co.absa.cobrix:spark-cobol:1.1.2
# copyBook and rawdatafile are paths defined elsewhere in the notebook
splitDF = (
    spark.read.format("cobol")
    .option("copybook", copyBook)
    .option("schema_retention_policy", "collapse_root")
    # Field whose value identifies the segment of each record
    .option("segment_field", "SEG-CODE")
    # Map each redefined group to its segment id
    .option("redefine_segment_id_map:1", "ROOT-DATA => RT00")
    .option("redefine_segment_id_map:2", "CHILD1-DATA => SEG01")
    .option("redefine_segment_id_map:3", "CHILD2-DATA => SEG02")
    .option("redefine_segment_id_map:4", "CHILD3-DATA => SEG03")
    .option("redefine_segment_id_map:5", "CHILD4-DATA => SEG04")
    .option("redefine_segment_id_map:6", "CHILD5-DATA => SEG05")
    .option("redefine_segment_id_map:7", "CHILD6-DATA => SEG06")
    .option("redefine_segment_id_map:8", "CHILD7-DATA => SEG07")
    # Parent-child relationships: every child segment hangs off ROOT-DATA
    .option("segment-children:1", "ROOT-DATA => CHILD1-DATA")
    .option("segment-children:2", "ROOT-DATA => CHILD2-DATA")
    .option("segment-children:3", "ROOT-DATA => CHILD3-DATA")
    .option("segment-children:4", "ROOT-DATA => CHILD4-DATA")
    .option("segment-children:5", "ROOT-DATA => CHILD5-DATA")
    .option("segment-children:6", "ROOT-DATA => CHILD6-DATA")
    .option("segment-children:7", "ROOT-DATA => CHILD7-DATA")
    .option("is_rdw_big_endian", "true")
    .option("variable_size_occurs", "true")
    .option("ebcdic_code_page", "cp037_extended")
    .option("is_record_sequence", "true")
    .load(rawdatafile)
)
Question
Using .option("is_record_sequence", "true") causes an error: "RDW headers should never be zero (0,0,0,0). Found zero size record at xxxxx." The reported position varies depending on the options used.
However, the file does parse when I omit "is_record_sequence", although the results are not accurate: the first record parses correctly and some of the later records parse, but not all of them.
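One way to narrow this down is to check whether the raw file really starts with RDW headers. Below is a minimal sketch of such a check, not Cobrix's own implementation. It assumes the standard big-endian RDW layout (a 2-byte record length followed by 2 reserved bytes), and the helper name walk_rdw and its length_includes_header flag are hypothetical, introduced only for illustration. Whether the length counts the 4 header bytes depends on how the file was exported, so both settings are worth trying.

```python
import struct


def walk_rdw(data, length_includes_header=True, limit=10):
    """Walk a byte buffer record by record, assuming 4-byte big-endian RDWs.

    Returns a list of (offset, payload_length) tuples for up to `limit` records.
    Raises ValueError when a header has a zero or impossible length.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data) and len(records) < limit:
        # Bytes 0-1 hold the record length (big-endian); bytes 2-3 are reserved.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length == 0:
            raise ValueError(f"zero-length RDW at offset {pos}")
        # Assumption: the length may or may not count the 4 header bytes.
        payload = length - 4 if length_includes_header else length
        if payload < 0:
            raise ValueError(f"RDW length {length} too small at offset {pos}")
        records.append((pos, payload))
        pos += 4 + payload
    return records


# Synthetic example: two records with payloads of 3 and 2 bytes.
sample = b"\x00\x07\x00\x00ABC" + b"\x00\x06\x00\x00DE"
print(walk_rdw(sample))
```

Running the same function on the first few kilobytes of the real file (read in binary mode) should show plausible record lengths if the RDW assumptions hold. If it fails almost immediately, the file may not contain RDW headers at all, or the length convention may differ from the one assumed here.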
Can you please let me know if I am using the option incorrectly? What options should I use to handle this file?
Thanks in advance, Steve