EBCDIC Data Columns showing null for PIC 9(5) COMP-3
## Describe the bug
Here are the COBOL copybook and EBCDIC data files. I try to read them using the statement below. I was able to print the schema; however, I have the following 3 issues:
- CUST_ID cannot be read; it shows as null. In the sample copybook, CUST_ID is PIC 9(5) COMP-3.
- The other columns are not parsed according to the field lengths defined in the copybook.
- The data file contains 3 records in total, but only a single record is displayed. Using DataStage with the same copybook and data, I was able to parse and see the data as expected.
Code:

```scala
val df = spark.read.format("cobol")
  .option("schema_retention_policy", "collapse_root")
  .option("is_record_sequence", true)
  .option("copybook", "/myapp/new/sample_mainframe_fb_data_copybook.cob.txt")
  .load("/myapp/new/sample_mainframe_fb_data.dat.txt")
```

Attached files: sample_mainframe_fb_data_copybook.cob.txt, sample_mainframe_fb_data.dat.txt
## To Reproduce
Steps to reproduce the behaviour:
- Copy the COBOL copybook and data files to your file system.
- Use the above code snippet in a running Spark/Scala program, or in spark-shell for interactive execution.
## Expected behaviour
- It should list all 3 records.
- The CUST_ID value should be readable.
- The other data columns should be aligned as per the COBOL copybook.
## Screenshots
The results look like the following:
```
+-------+------------+--------------------+------------------+--------+--------------------+------------+-----------+--------------------+
|CUST_ID|   CUST_NAME|      STREET_ADDRESS|              CITY|US_STATE|OTHER_STATE_PROVINCE|COUNTRY_CODE|POSTAL_CODE|               NOTES|
+-------+------------+--------------------+------------------+--------+--------------------+------------+-----------+--------------------+
|   null|[John, 1 Ma]|         in St. Anyt|            own NY|        |                USA1|         234|     5 This|is our first cust...|
+-------+------------+--------------------+------------------+--------+--------------------+------------+-----------+--------------------+
```
## Additional context
Hi, I think your example file does not contain RDWs (Record Descriptor Words).
Please try removing `.option("is_record_sequence", true)`.
You can use `.option("debug", "true")` to see what is being decoded.
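To illustrate why a missing RDW matters: with `is_record_sequence`, each record is expected to start with a 4-byte RDW header whose first two bytes give the record length. The following is a minimal Python sketch (a hypothetical helper, not Cobrix's actual reader; it assumes the 16-bit big-endian length excludes the 4 header bytes, though conventions vary) of the splitting logic:

```python
import struct

def split_rdw_records(data: bytes):
    """Split a byte stream into records using 4-byte RDW headers.

    Each RDW starts with a big-endian 16-bit record length followed by
    two reserved bytes. If the file has no RDWs, the first data bytes
    are misread as a length, so record boundaries -- and every field
    after them -- come out shifted or missing.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        pos += 4                      # skip the 4-byte header
        records.append(data[pos:pos + length])
        pos += length
    return records

# Two records: "ABC" (length 3) and "XY" (length 2)
stream = b"\x00\x03\x00\x00ABC" + b"\x00\x02\x00\x00XY"
print(split_rdw_records(stream))  # [b'ABC', b'XY']
```

Running this on a file that is actually fixed-length (no RDWs) would treat payload bytes as lengths, which is consistent with seeing a single garbled record instead of three.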
@yruslan I am getting the same issue... Can you please share the code?
Hi, I can take a look if you have a small example copybook and a data file.
COMP-3 decodes to null when the value is not a valid BCD-encoded number. You can investigate raw field contents using `.option("debug", "true")`.
Here is the copybook extract:
```cobol
012900 01  DUCTL-CYCLE-REC REDEFINES DUCTL-DETAIL-REC.            01290000
013000     05  DUCTL-CYCLE-MSG              PIC X(12).            01300000
013100     05  DUCTL-CYCLE-CURR-CYCLEDT     PIC X(7).             01310000
013200     05  DUCTL-CYCLE-CURR-CYCLE       PIC X(5).             01320000
013300     05  DUCTL-CYCLE-LAST-CYCLE       PIC X(5).             01330000
013400     05  FILLER                       PIC X(162).           01340005
013500     05  DUCTL-CYCLE-SEQNUM           PIC X(15).            01350000
013600     SKIP2                                                  01360000
013700 01  DUCTL-CNTL-REC REDEFINES DUCTL-CYCLE-REC.              01370000
013800     05  DUCTL-CNTL-MSG               PIC X(12).            01380000
013900     05  DUCTL-CNTL-CURR-CYCLEDT      PIC X(7).             01390000
014000     05  DUCTL-CNTL-CURR-CYCLE        PIC X(5).             01400000
014100     05  DUCTL-CNTL-LAST-CYCLE        PIC X(5).             01410000
014200     05  DUCTL-CNTL-TOTAL-REC-COUNT   PIC 9(9) COMP-3.      01420000
014300     05  DUCTL-CNTL-TOTAL-PREMIUM     PIC S9(11) COMP-3.    01430000
014400     05  DUCTL-CNTL-TOTAL-NONPREM     PIC S9(9)V99 COMP-3.  01440000
014500     05  DUCTL-CNTL-OMNI-TOT-REC-CT   PIC 9(9) COMP-3.      01450000
014600     05  DUCTL-CNTL-OMNI-TOT-PREM     PIC S9(11) COMP-3.    01460000
014700     05  DUCTL-CNTL-OMNI-TOT-NONPREM  PIC S9(9)V99 COMP-3.  01470000
014701     05  DUCTL-CNTL-TOT-DROPREC       PIC 9(10).            01470102
014702     05  DUCTL-CNTL-TOT-TRIGPREM      PIC S9(11).           01470202
014703     05  DUCTL-CNTL-TOT-TRIGREC       PIC 9(09).            01470302
014704     05  DUCTL-CNTL-TOT-BINDOFF       PIC 9(11).            01470402
014310     05  FILLER                       PIC X(12).            01431000
014710     05  DUCTL-CNTL-TOTAL-COMM        PIC S9(9)V99 COMP-3.  01471000
014720     05  DUCTL-CNTL-TOTAL-NPCOMM      PIC S9(9)V99 COMP-3.  01472000
014600     05  FILLER                       PIC X(63).            01460000
014900     05  DUCTL-CNTL-SEQNUM            PIC X(15).            01490000
```
```python
class_poc_df = spark.read.format("cobol")\
    .option("copybook", class_copybook)\
    .option("encoding", "ascii")\
    .option("debug", "true")\
    .option("is_text", "true")\
    .option("schema_retention_policy", "collapse_root")\
    .load(class_data)
```
Below is the output of the dataframe:
```
+--------------------------+--------------------------------+------------------------------+------------------------------------+
|DUCTL_DETAIL_TOTAL_PREMIUM|DUCTL_DETAIL_TOTAL_PREMIUM_debug|DUCTL_DETAIL_PRIOR_POL_PREMIUM|DUCTL_DETAIL_PRIOR_POL_PREMIUM_debug|
+--------------------------+--------------------------------+------------------------------+------------------------------------+
|                      null|                    434F46200000|                          null|                        004546433034|
+--------------------------+--------------------------------+------------------------------+------------------------------------+
```
As you can see, `434F46200000` is not a valid BCD number: `F` cannot appear in the middle of a BCD number. Maybe your copybook is shifted relative to the data. If you have known-good parsed values (for example, from the mainframe console), you can quickly identify the shift by comparing the parsed values between the mainframe and Cobrix.
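To make the failure mode concrete, here is a minimal Python sketch (an illustrative decoder, not Cobrix's actual implementation; the sign-nibble conventions in the comments are the common ones) of how a packed-decimal (COMP-3) field is interpreted, and why both debug values above decode to null:

```python
def decode_comp3(raw: bytes):
    """Decode a packed-decimal (COMP-3) field; return an int, or None if invalid.

    Each byte holds two 4-bit nibbles. Every nibble except the last must
    be a decimal digit (0-9); the last nibble is the sign (commonly
    0xC positive, 0xD negative, 0xF unsigned). Any other pattern makes
    the field invalid, which is why such values come out as null.
    """
    if not raw:
        return None
    nibbles = []
    for b in raw:
        nibbles += [b >> 4, b & 0x0F]
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        return None          # non-digit nibble in a digit position
    if sign < 0xA:
        return None          # last nibble must be a sign nibble (0xA-0xF)
    value = int("".join(map(str, digits)))
    return -value if sign in (0xB, 0xD) else value

# First debug value: nibble 0xF sits in a digit position -> invalid.
print(decode_comp3(bytes.fromhex("434F46200000")))  # None
# Second debug value: the final nibble 0x4 is not a sign nibble -> invalid.
print(decode_comp3(bytes.fromhex("004546433034")))  # None
# A well-formed example: 0x12 0x34 0x5C encodes +12345.
print(decode_comp3(bytes.fromhex("12345C")))  # 12345
```

Both debug values fail the checks, and `434F462000` is in fact ASCII-looking text (`COF `), which supports the theory that the copybook layout is shifted against the data.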