
COBOL program using MS SQL Server as backend database

Status: Open. jcstrydom opened this issue 4 years ago • 5 comments

Background

I am currently working on a project where a COBOL-based system uses an MS SQL Server instance as its back end.

I can connect to the SQL Server database via JDBC, which loads the table into a Spark DataFrame. However, the data is still EBCDIC-encoded, which is an obvious problem when using AWS Glue to write the data to Parquet files for downstream processes. I am also able to parse the copybook with your copybook parser.
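For plain text (PIC X) fields, the EBCDIC bytes that land in the DataFrame can in principle be decoded with a JDK charset, independently of Cobrix. The sketch below assumes the data uses EBCDIC code page 037 (`IBM037`, which ships in the JDK's `jdk.charsets` module in standard builds); `decodeEbcdicText` is a hypothetical helper, not part of Cobrix. It only covers text fields: numeric fields (zoned, packed, binary) need field-specific decoding such as Cobrix's.

```scala
import java.nio.charset.Charset

// Assumption: text fields use EBCDIC code page 037 (US/Canada).
// Other code pages need a different charset name.
val ebcdic: Charset = Charset.forName("IBM037")

// Hypothetical helper: decode one EBCDIC text field and strip the
// trailing space padding that COBOL adds to PIC X fields.
def decodeEbcdicText(bytes: Array[Byte]): String =
  new String(bytes, ebcdic).replaceAll("\\s+$", "")

// "R2D2" stored in a PIC X(5) field, padded with an EBCDIC space (0x40)
val raw = Array(0xD9, 0xF2, 0xC4, 0xF2, 0x40).map(_.toByte)
println(decodeEbcdicText(raw)) // prints R2D2
```

If the JDBC driver exposes the column as binary, a function like this could be wrapped with Spark's `functions.udf` and applied to that column. If the driver has already turned the bytes into a string using the wrong code page, the original bytes may not be recoverable, so that is worth checking first.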

However, these two structures are vastly different, which limits the process I would like to build. I would still like to use your package, as I believe there are inherent synergies.

Question

There are a few questions:

  1. Is there any advice you can give me regarding my use case and using your package?
  2. Is there a way to use just your decoding technology, either in flight or after the data has landed in the DataFrame?
  3. Is there a way to flatten the schema structure once the parser has completed?

Your assistance would be greatly appreciated.

jcstrydom commented on Dec 01 '21 11:12

Regarding parsing of individual fields, this example may be helpful to you:

import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.{Group, Primitive}

val copybookContents =
  """       01  RECORD.
       05  A1       PIC X(5).
       05  A2       PIC 9(4).
"""

val dataForA1 = Array(
  0xD9, 0xF2, 0xC4, 0xF2, 0x40  // R2D2
).map(_.toByte)

val dataForA2 = Array(
  0xF1, 0xF2, 0xF3, 0xF4        // 1234
).map(_.toByte)

// Get a copybook object from the copybook text
val copybook = CopybookParser.parse(copybookContents)

// The AST is a tree-like structure made of instances of the 'Statement' interface. A statement is either a Primitive or a Group.
// - A primitive is a concrete field: a leaf of the tree.
// - A group is a struct field that can have subfields.
// A statement can also have an OCCURS clause, in which case it becomes an array.

// The AST root is a Group. According to the copybook, its only child is 'RECORD'
val record = copybook.ast.children(0).asInstanceOf[Group]

// A1 and A2 are the first and the second child of the record
val a1 = record.children(0).asInstanceOf[Primitive]
val a2 = record.children(1).asInstanceOf[Primitive]

println(s"The first field is: ${a1.name}")
println(s"The second field is: ${a2.name}")

val decoderForA1 = a1.decode
val decoderForA2 = a2.decode

val decodedA1 = decoderForA1(dataForA1).asInstanceOf[String]
val decodedA2 = decoderForA2(dataForA2).asInstanceOf[Int]

println(decodedA1)
println(decodedA2)

It outputs:

The first field is: A1
The second field is: A2
R2D2
1234
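To show roughly what a decoder for a numeric DISPLAY field like `A2 PIC 9(4)` does with the bytes `F1 F2 F3 F4`, here is a hand-written sketch of EBCDIC zoned-decimal decoding. It is not Cobrix's implementation, and `decodeZoned` is a hypothetical name. It assumes the common convention that the low nibble of each byte holds a digit and the high nibble of the last byte carries the sign (0xD negative, 0xC or 0xF positive/unsigned).

```scala
// Hypothetical sketch of zoned-decimal decoding (PIC 9(n) / PIC S9(n)).
// NOT Cobrix's implementation; it only illustrates the idea.
def decodeZoned(bytes: Array[Byte]): Long = {
  require(bytes.nonEmpty, "empty field")
  var value = 0L
  bytes.foreach { b =>
    val digit = b & 0x0F // low nibble holds the digit
    require(digit <= 9, f"not a zoned digit: 0x${b & 0xFF}%02X")
    value = value * 10 + digit
  }
  // High nibble of the last byte carries the sign: 0xD means negative,
  // 0xC and 0xF mean positive or unsigned.
  val zone = (bytes.last & 0xF0) >> 4
  if (zone == 0xD) -value else value
}

val unsigned = Array(0xF1, 0xF2, 0xF3, 0xF4).map(_.toByte) // 1234
val negative = Array(0xF1, 0xF2, 0xF3, 0xD4).map(_.toByte) // -1234
println(decodeZoned(unsigned))
println(decodeZoned(negative))
```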

yruslan commented on Dec 02 '21 11:12

Regarding schema flattening: if you have a nested DataFrame, you can convert it to a flat one using SparkUtils.flattenSchema(df) (from za.co.absa.cobrix.spark.cobol.utils).

Examples are in http://github.com/AbsaOSS/cobrix/blob/ec600f549e00ec3cfd4025353bfeec78acf7b532/spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/utils/SparkUtilsSuite.scala#L81-L81
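The idea behind flattening can be sketched without Spark. The sketch below uses a hypothetical, minimal tree (`Node`, `Field`, `FieldGroup`, which loosely mirror the Group/Primitive AST from the earlier example but are not Cobrix classes) and joins each leaf's path into a single column name. The `_` separator and the sample fields ADDRESS, CITY and ZIP are assumptions for illustration; the SparkUtilsSuite examples show how `flattenSchema` actually names columns.

```scala
// Hypothetical, minimal model of a nested record; not Cobrix's AST classes.
sealed trait Node { def name: String }
final case class Field(name: String) extends Node
final case class FieldGroup(name: String, children: List[Node]) extends Node

// Collect leaf columns, joining the path with "_" (an assumed separator).
def flatten(node: Node, prefix: String = ""): List[String] = {
  val path = if (prefix.isEmpty) node.name else s"${prefix}_${node.name}"
  node match {
    case Field(_)               => List(path)
    case FieldGroup(_, children) => children.flatMap(flatten(_, path))
  }
}

// Hypothetical sample: RECORD with one field and one nested group
val sample = FieldGroup("RECORD", List(
  Field("A1"),
  FieldGroup("ADDRESS", List(Field("CITY"), Field("ZIP")))
))
println(flatten(sample)) // List(RECORD_A1, RECORD_ADDRESS_CITY, RECORD_ADDRESS_ZIP)
```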

yruslan commented on Dec 02 '21 11:12

Btw, the code above can be simplified. I wrote it that way to emphasize that the decoder is a function returned by 'decode()'.

So

    val decoderForA1 = a1.decode
    val decoderForA2 = a2.decode

    val decodedA1 = decoderForA1(dataForA1).asInstanceOf[String]
    val decodedA2 = decoderForA2(dataForA2).asInstanceOf[Int]

can be written as

    val decodedA1 = a1.decode(dataForA1).asInstanceOf[String]
    val decodedA2 = a2.decode(dataForA2).asInstanceOf[Int]
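Because each decoder is just a function from bytes to a value, a whole record can be decoded by slicing it at each field's offset. The sketch below hand-writes the layout for the two-field copybook above (A1 PIC X(5) at offset 0, A2 PIC 9(4) at offset 5). `FieldLayout`, `decodeRecord` and both decoder functions are hypothetical stand-ins; with Cobrix, the decoders come from the parsed copybook, as with `a1.decode`. The text decoder assumes EBCDIC code page 037.

```scala
import java.nio.charset.Charset

// A decoder is a plain function from raw bytes to a decoded value.
type Decoder = Array[Byte] => Any

// Hypothetical stand-in decoders (assumes code page 037; unsigned zoned only).
val ebcdic = Charset.forName("IBM037")
val textDecoder: Decoder = bytes => new String(bytes, ebcdic).trim
val unsignedZonedDecoder: Decoder =
  bytes => bytes.foldLeft(0)((acc, b) => acc * 10 + (b & 0x0F))

// Hypothetical hand-written layout: name, byte offset, byte length, decoder.
final case class FieldLayout(name: String, offset: Int, length: Int, decode: Decoder)

val layout = List(
  FieldLayout("A1", 0, 5, textDecoder),         // PIC X(5)
  FieldLayout("A2", 5, 4, unsignedZonedDecoder) // PIC 9(4)
)

// Slice each field out of the record and apply its decoder.
def decodeRecord(record: Array[Byte]): Map[String, Any] =
  layout.map(f => f.name -> f.decode(record.slice(f.offset, f.offset + f.length))).toMap

// "R2D2 " followed by 1234, both EBCDIC-encoded, as one 9-byte record
val rawRecord = Array(0xD9, 0xF2, 0xC4, 0xF2, 0x40, 0xF1, 0xF2, 0xF3, 0xF4).map(_.toByte)
println(decodeRecord(rawRecord)) // Map(A1 -> R2D2, A2 -> 1234)
```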

yruslan commented on Dec 02 '21 12:12

Hi Ruslan

Thank you very much for this detailed feedback.

I will definitely take a look at this.

There is still so much to learn about COBOL and your package.

Thanks for the interaction today. If we have any other questions, I know we will get a quick response from you guys.

Kind regards Johan


jcstrydom commented on Dec 02 '21 16:12

You are welcome!

yruslan commented on Dec 03 '21 07:12