EBCDIC Data Columns showing null for PIC 9(5) COMP-3
## Describe the bug
Here are the COBOL copybook and EBCDIC data files. I try to read them using the statement below. I was able to print the schema; however, I have the following 3 issues:
- CUST_ID cannot be read; it shows as null. In the sample copybook, CUST_ID is PIC 9(5) COMP-3.
- The other columns are not parsed according to the field lengths defined in the copybook.
- The data file contains 3 records in total, but only a single record is displayed. Using DataStage with the same copybook and data, I was able to parse and see the data as expected.
Code:

```scala
val df = spark.read.format("cobol")
  .option("schema_retention_policy", "collapse_root")
  .option("is_record_sequence", true)
  .option("copybook", "/myapp/new/sample_mainframe_fb_data_copybook.cob.txt")
  .load("/myapp/new/sample_mainframe_fb_data.dat.txt")
```

Attached files: sample_mainframe_fb_data_copybook.cob.txt, sample_mainframe_fb_data.dat.txt
## To Reproduce
Steps to reproduce the behaviour:
- Copy the COBOL copybook and data files to your file system.
- Use the above code snippet in a running Spark/Scala program, or in spark-shell for interactive execution.
## Expected behaviour
- It should list all 3 records.
- The CUST_ID value should be readable.
- The other data columns should be aligned as per the COBOL copybook.
## Screenshots
The results look like the following:
```
+-------+------------+--------------------+------------------+--------+--------------------+------------+-----------+--------------------+
|CUST_ID|   CUST_NAME|      STREET_ADDRESS|              CITY|US_STATE|OTHER_STATE_PROVINCE|COUNTRY_CODE|POSTAL_CODE|               NOTES|
+-------+------------+--------------------+------------------+--------+--------------------+------------+-----------+--------------------+
|   null|[John, 1 Ma]|         in St. Anyt|            own NY|        |                USA1|         234|     5 This|is our first cust...|
+-------+------------+--------------------+------------------+--------+--------------------+------------+-----------+--------------------+
```
## Additional context
Hi, I think your example file does not contain RDWs (Record Descriptor Words).
Please try removing `.option("is_record_sequence", true)`.
You can use `.option("debug", "true")` to see what is being decoded.
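To illustrate why a missing RDW matters: with `is_record_sequence`, each record is expected to start with a 4-byte RDW header whose first two bytes give the record length. The following is a minimal Python sketch (a hypothetical helper, not Cobrix's actual reader; it assumes the 16-bit big-endian length excludes the 4 header bytes, though conventions vary) of the splitting logic:

```python
import struct

def split_rdw_records(data: bytes):
    """Split a byte stream into records using 4-byte RDW headers.

    Each RDW starts with a big-endian 16-bit record length followed by
    two reserved bytes. If the file has no RDWs, the first data bytes
    are misread as a length, so record boundaries -- and every field
    after them -- come out shifted or missing.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        pos += 4                      # skip the 4-byte header
        records.append(data[pos:pos + length])
        pos += length
    return records

# Two records: "ABC" (length 3) and "XY" (length 2)
stream = b"\x00\x03\x00\x00ABC" + b"\x00\x02\x00\x00XY"
print(split_rdw_records(stream))  # [b'ABC', b'XY']
```

Running this on a file that is actually fixed-length (no RDWs) would treat payload bytes as lengths, which is consistent with seeing a single garbled record instead of three.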
@yruslan I am getting the same issue... Can you please share the code?
Hi, I can take a look if you have a small example copybook and a data file.
COMP-3 decodes to null when the value is not a valid BCD-encoded number. You can investigate raw field contents using `.option("debug", "true")`.
Here is the copybook extract:
```cobol
012900 01  DUCTL-CYCLE-REC REDEFINES DUCTL-DETAIL-REC.            01290000
013000     05  DUCTL-CYCLE-MSG              PIC X(12).            01300000
013100     05  DUCTL-CYCLE-CURR-CYCLEDT     PIC X(7).             01310000
013200     05  DUCTL-CYCLE-CURR-CYCLE       PIC X(5).             01320000
013300     05  DUCTL-CYCLE-LAST-CYCLE       PIC X(5).             01330000
013400     05  FILLER                       PIC X(162).           01340005
013500     05  DUCTL-CYCLE-SEQNUM           PIC X(15).            01350000
013600     SKIP2                                                  01360000
013700 01  DUCTL-CNTL-REC REDEFINES DUCTL-CYCLE-REC.              01370000
013800     05  DUCTL-CNTL-MSG               PIC X(12).            01380000
013900     05  DUCTL-CNTL-CURR-CYCLEDT      PIC X(7).             01390000
014000     05  DUCTL-CNTL-CURR-CYCLE        PIC X(5).             01400000
014100     05  DUCTL-CNTL-LAST-CYCLE        PIC X(5).             01410000
014200     05  DUCTL-CNTL-TOTAL-REC-COUNT   PIC 9(9) COMP-3.      01420000
014300     05  DUCTL-CNTL-TOTAL-PREMIUM     PIC S9(11) COMP-3.    01430000
014400     05  DUCTL-CNTL-TOTAL-NONPREM     PIC S9(9)V99 COMP-3.  01440000
014500     05  DUCTL-CNTL-OMNI-TOT-REC-CT   PIC 9(9) COMP-3.      01450000
014600     05  DUCTL-CNTL-OMNI-TOT-PREM     PIC S9(11) COMP-3.    01460000
014700     05  DUCTL-CNTL-OMNI-TOT-NONPREM  PIC S9(9)V99 COMP-3.  01470000
014701     05  DUCTL-CNTL-TOT-DROPREC       PIC 9(10).            01470102
014702     05  DUCTL-CNTL-TOT-TRIGPREM      PIC S9(11).           01470202
014703     05  DUCTL-CNTL-TOT-TRIGREC       PIC 9(09).            01470302
014704     05  DUCTL-CNTL-TOT-BINDOFF       PIC 9(11).            01470402
014310     05  FILLER                       PIC X(12).            01431000
014710     05  DUCTL-CNTL-TOTAL-COMM        PIC S9(9)V99 COMP-3.  01471000
014720     05  DUCTL-CNTL-TOTAL-NPCOMM      PIC S9(9)V99 COMP-3.  01472000
014600     05  FILLER                       PIC X(63).            01460000
014900     05  DUCTL-CNTL-SEQNUM            PIC X(15).            01490000
```
```python
class_poc_df = spark.read.format("cobol")\
    .option("copybook", class_copybook)\
    .option("encoding", "ascii")\
    .option("debug", "true")\
    .option("is_text", "true")\
    .option("schema_retention_policy", "collapse_root")\
    .load(class_data)
```
Below is the output of the dataframe:
```
+--------------------------+--------------------------------+------------------------------+------------------------------------+
|DUCTL_DETAIL_TOTAL_PREMIUM|DUCTL_DETAIL_TOTAL_PREMIUM_debug|DUCTL_DETAIL_PRIOR_POL_PREMIUM|DUCTL_DETAIL_PRIOR_POL_PREMIUM_debug|
+--------------------------+--------------------------------+------------------------------+------------------------------------+
|                      null|                    434F46200000|                          null|                        004546433034|
+--------------------------+--------------------------------+------------------------------+------------------------------------+
```
As you can see, `434F46200000` is not a valid BCD number: `F` cannot appear in the middle of a BCD number. Maybe your copybook is shifted relative to the data. If you have known-good parsed values (for example, from the mainframe console), you can quickly identify the shift by comparing the parsed values between the mainframe and Cobrix.
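To make the failure mode concrete, here is a minimal Python sketch (an illustrative decoder, not Cobrix's actual implementation; the sign-nibble conventions in the comments are the common ones) of how a packed-decimal (COMP-3) field is interpreted, and why both debug values above decode to null:

```python
def decode_comp3(raw: bytes):
    """Decode a packed-decimal (COMP-3) field; return an int, or None if invalid.

    Each byte holds two 4-bit nibbles. Every nibble except the last must
    be a decimal digit (0-9); the last nibble is the sign (commonly
    0xC positive, 0xD negative, 0xF unsigned). Any other pattern makes
    the field invalid, which is why such values come out as null.
    """
    if not raw:
        return None
    nibbles = []
    for b in raw:
        nibbles += [b >> 4, b & 0x0F]
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        return None          # non-digit nibble in a digit position
    if sign < 0xA:
        return None          # last nibble must be a sign nibble (0xA-0xF)
    value = int("".join(map(str, digits)))
    return -value if sign in (0xB, 0xD) else value

# First debug value: nibble 0xF sits in a digit position -> invalid.
print(decode_comp3(bytes.fromhex("434F46200000")))  # None
# Second debug value: the final nibble 0x4 is not a sign nibble -> invalid.
print(decode_comp3(bytes.fromhex("004546433034")))  # None
# A well-formed example: 0x12 0x34 0x5C encodes +12345.
print(decode_comp3(bytes.fromhex("12345C")))  # 12345
```

Both debug values fail the checks, and `434F462000` is in fact ASCII-looking text (`COF `), which supports the theory that the copybook layout is shifted against the data.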