Cobrix returning hexadecimal value in different format (qb)

Open chsnarayana opened this issue 2 years ago • 5 comments

Hi All,

When I read hexadecimal data from a COBOL file using Cobrix, the output comes back in a different format. I have tried casting it with Spark SQL and PySpark, but neither helped.

The field is defined in the copybook as PIC X(06). In most cases it is converted to "qb" followed by one or two spaces. Ab Initio, on the other hand, reads the same data as "qbXXX", that is, with some characters after "qb", and in Ab Initio they were able to cast that value to hexadecimal. Can anyone please help me with this?

As the data we are reading is highly sensitive, we were not able to get a sample from our customer, so I cannot share the file here.

Thanks in advance, Narayana

chsnarayana avatar May 23 '23 19:05 chsnarayana

There could be several reasons for this behavior. Possibly the data is in a different code page (not EBCDIC common). You can use .option("debug", "hex") to see the raw bytes of each field and debug the decoding process.
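For readers following along in PySpark, a minimal sketch of a read with the debug option might look like the following. The function name and both paths are hypothetical placeholders, and `spark` is assumed to be an existing SparkSession with the Cobrix package available.

```python
def read_with_debug(spark, copybook_path, data_path):
    """Read a mainframe file through Cobrix with raw-byte debug columns enabled.

    `spark` is assumed to be a SparkSession that has the Cobrix package loaded.
    `copybook_path` and `data_path` are placeholders supplied by the caller.
    """
    return (
        spark.read.format("cobol")
        .option("copybook", copybook_path)
        # Ask Cobrix to emit the raw bytes of each field as hex in a '_debug' column.
        .option("debug", "hex")
        .load(data_path)
    )
```

Selecting a field next to its debug counterpart from the returned DataFrame then shows the decoded value and the raw bytes side by side.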

Please provide an example value and the corresponding value in the '_debug' field. I know the data is sensitive, but I hope a single QBxxxx number is not.

yruslan avatar May 24 '23 06:05 yruslan

Hi Ruslan,

Thank you for your quick reply. I have used the debug option, and the output is as follows:

"column":"qb","column_debug":"988200007600"

chsnarayana avatar May 24 '23 12:05 chsnarayana

So EBCDIC 98 is q and 82 is b, but neither 00 nor 76 corresponds to any character in the EBCDIC common encoding (https://en.wikipedia.org/wiki/EBCDIC).

What output do you expect for this column? What does Ab Initio show for this field and this record?
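To make the byte-by-byte reading concrete, here is a small sketch that decodes the debug value with Python's built-in cp037 codec. This rests on an assumption: cp037 is only a stand-in for Cobrix's "EBCDIC common" code page, so it may assign characters to bytes (such as 76) that the common table leaves unmapped. The helper name is hypothetical.

```python
def decode_debug_hex(hex_string, codec="cp037"):
    """Return (byte_as_hex, decoded_char) pairs for a Cobrix '_debug' hex string.

    cp037 is an assumption here; it is not necessarily the table Cobrix uses.
    """
    raw = bytes.fromhex(hex_string)
    # Decode one byte at a time so each character can be traced to its source byte.
    return [(f"{b:02X}", bytes([b]).decode(codec)) for b in raw]


pairs = decode_debug_hex("988200007600")
for byte_hex, char in pairs:
    # repr() keeps non-printable characters such as NUL visible.
    print(byte_hex, repr(char))
```

The first two pairs come out as q and b, while the 00 bytes decode to NUL, which is not a printable character and would explain why only "qb" appears in the string column.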

yruslan avatar May 24 '23 12:05 yruslan

Hi Ruslan,

I got some information from the customer.

Ab Initio reads the data "988200007600" in the following format: "qb\x00\x00\x00\x00"

Then they use a function to cast it back to 988200000000.

Transformation used: (decimal(12)) reinterpret_as(packed decimal(12,stripped), <ColumnName> )

chsnarayana avatar Jun 05 '23 11:06 chsnarayana

I see. One workaround you can use is .option("binary_as_hex", "true") together with declaring the field as:

PIC X(06) COMP.

This will only work with the latest Cobrix (2.6.8).
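As a rough PySpark sketch of this workaround, assuming a SparkSession with Cobrix 2.6.8 available: the copybook is passed inline so the modified field declaration is visible. The record name, field name, function name, and data path are all hypothetical placeholders, not taken from the real copybook.

```python
# Hypothetical single-field copybook; only the 'PIC X(06) COMP.' clause
# comes from the suggestion above, the names are placeholders.
COPYBOOK = (
    "       01  RECORD.\n"
    "           05  COLUMN-NAME    PIC X(06) COMP.\n"
)


def read_binary_as_hex(spark, data_path, copybook=COPYBOOK):
    """Read the file so that binary fields are returned as hex strings.

    `spark` is assumed to be a SparkSession with the Cobrix package loaded.
    """
    return (
        spark.read.format("cobol")
        # Pass the copybook text directly instead of a file path.
        .option("copybook_contents", copybook)
        # Render binary fields as hexadecimal strings rather than raw bytes.
        .option("binary_as_hex", "true")
        .load(data_path)
    )
```

With this setup the field should come back as a hex string of its six raw bytes rather than as decoded EBCDIC text, so the bytes after "qb" are no longer lost in character decoding.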

yruslan avatar Jun 07 '23 09:06 yruslan