
Variable length record parsing

Open sree018 opened this issue 2 years ago • 2 comments

Background [Optional]

I have a multi-layout file in which the layouts appear in a fixed order. I group the layouts so that each group ends with the last layout (AD), which is repeated multiple times.

Each layout is a fixed-length record (200 bytes).

Example:

    1. FH
    2. LH
    3. BH1
    4. DE1
    5. AD
    :
    :
    10. DE2
    11. AD
    :
    13. BH2
    14. DE1
    15. AD
    :
    :

I rearrange the layouts like this:

    FH,LH,BH1,DE1,{4 bytes count of AD}Array[AD]
    FH,LH,BH1,DE2,{4 bytes count of AD}Array[AD]
    FH,LH,BH2,DE1,{4 bytes count of AD}Array[AD]
    FH,LH,BH2,DE2,{4 bytes count of AD}Array[AD]
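To make the rearranged record shape concrete, here is a minimal sketch of packing one group into a single variable-length byte array. It assumes 200-byte segments (as stated above) and a big-endian 4-byte count; the byte order is a guess, since the post does not specify it, and `GroupPacker` and `pack` are hypothetical names that are not part of Cobrix.

```scala
import java.nio.ByteBuffer

object GroupPacker {
  // Assumption: every segment is a fixed 200-byte record, as described in the post.
  val SegmentLength = 200

  /** Concatenates FH, LH, BH, DE, a 4-byte AD count, and all AD segments. */
  def pack(fh: Array[Byte], lh: Array[Byte], bh: Array[Byte], de: Array[Byte],
           ads: Seq[Array[Byte]]): Array[Byte] = {
    val fixed = Seq(fh, lh, bh, de)
    (fixed ++ ads).foreach(s => require(s.length == SegmentLength, "unexpected segment length"))
    val buffer = ByteBuffer.allocate((fixed.size + ads.size) * SegmentLength + 4)
    fixed.foreach(s => buffer.put(s))
    buffer.putInt(ads.size) // ByteBuffer writes big-endian by default (assumed byte order)
    ads.foreach(s => buffer.put(s))
    buffer.array()
  }
}
```

With three AD segments, the packed record is 7 * 200 + 4 = 1404 bytes long, and the count sits at byte offset 800.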

Question

How do I parse an RDD[Array[Byte]] using the framework?

    import za.co.absa.cobrix.spark.cobol.Cobrix

    val rdd = ???
    val df = Cobrix.fromRdd
      .copybookContents(copybook)
      .option("encoding", "ebcdic") // any supported option
      .load(rdd)

sree018 avatar Apr 17 '23 03:04 sree018

Sorry, I'm not sure I understand your question.

  1. Segment re-arrangement is quite a common pattern. We even have a feature request to support it directly in Cobrix (https://github.com/AbsaOSS/cobrix/issues/369). For now, we usually use a segment id to segment field mapping to read all segments, and then use Spark's windowing functions to re-arrange them. But as I understand it, you've already done this step.
  2. Parsing an RDD[Array[Byte]] is done exactly as you specified in your question.
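
The re-arrangement idea from point 1 (carry the most recently seen parent segments forward and attach them to each child) can be sketched with plain Scala collections. This is only an illustration of the pattern, not Spark windowing code and not Cobrix API; `Segment`, `Group`, and `regroup` are hypothetical names, and the payload is simplified to a string.

```scala
object SegmentRegrouper {
  // Hypothetical simplified segment: a two-character id plus an opaque payload.
  final case class Segment(id: String, payload: String)
  final case class Group(fh: Segment, lh: Segment, bh: Segment, de: Segment, ads: Vector[Segment])

  // State carried through the scan: last seen parents, the open group, finished groups.
  private final case class State(fh: Option[Segment], lh: Option[Segment], bh: Option[Segment],
                                 open: Option[Group], done: Vector[Group])

  def regroup(segments: Seq[Segment]): Vector[Group] = {
    val end = segments.foldLeft(State(None, None, None, None, Vector.empty)) { (s, seg) =>
      seg.id match {
        case "FH" => s.copy(fh = Some(seg))
        case "LH" => s.copy(lh = Some(seg))
        case "BH" => s.copy(bh = Some(seg))
        case "DE" =>
          // A new DE closes the open group (if any) and starts a new one
          // with the parents seen so far.
          val group = for { f <- s.fh; l <- s.lh; b <- s.bh } yield Group(f, l, b, seg, Vector.empty)
          s.copy(open = group, done = s.done ++ s.open)
        case "AD" => s.copy(open = s.open.map(g => g.copy(ads = g.ads :+ seg)))
        case other => throw new IllegalArgumentException(s"Unknown segment id: $other")
      }
    }
    end.done ++ end.open // include the last group, which no later DE closes
  }
}
```

In Spark the same carry-forward is typically expressed with a window ordered by record position, but the exact expression depends on how the segments were read.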

yruslan avatar Apr 17 '23 07:04 yruslan

Hi @yruslan

I re-arranged the data like this, so each record is variable length: FH,LH,BH1,DE1,{4 bytes count of AD}Array[AD]

    import za.co.absa.cobrix.spark.cobol.Cobrix

    val rdd = ???
    val df = Cobrix.fromRdd
      .copybookContents(copybook)
      .option("encoding", "ebcdic") // any supported option
      .load(rdd)

This is not working. The master copybook looks like this:

       01  RECORD-MASTER.
           02  FILLER              PIC X(200).
           02  FH-LAYOUT  REDEFINES FILLER.
               03  RCD-ID          PIC X(2).
                      :
               03  RCD-SQC         PIC 9(9)v.
           02  LH-LAYOUT  REDEFINES FILLER.
               03  RCD-ID          PIC X(2).
                      :
               03  RCD-SQC         PIC 9(9)v.

How can I access only the redefined FH layout from the master layout?

    import scala.collection.mutable.ListBuffer

    def adjustRows(itr: Iterator[Array[Byte]], layouts: Map[String, Copybook]): Iterator[Seq[Any]] = {
      var fh = Seq[Any]()
      var lh = Seq[Any]()
      var bh = Seq[Any]()
      var de = Seq[Any]()
      val ad = ListBuffer[Seq[Any]]()
      var startStatus = true
      val dataset = ListBuffer[Seq[Any]]()
      while (itr.hasNext) {
        val record: Array[Byte] = itr.next()
        // Decode the first two EBCDIC bytes to get the record type.
        val header: String = record.slice(0, 2).map(byte => ebcdicToAsciiMapping((byte + 256) % 256)).mkString
        header match {
          case "FH" => fh = getRowString(record, layouts(header))
          case "LH" => lh = getRowString(record, layouts(header))
          case "BH" => bh = getRowString(record, layouts(header))
          case "DE" =>
            if (startStatus) {
              de = getRowString(record, layouts(header))
              startStatus = false
            } else {
              // A new DE starts: emit the previous group, then reset the AD buffer.
              dataset.append(fh ++ lh ++ bh ++ de ++ ad.foldLeft(Seq[Any]())((acc, seqData) => acc ++ seqData))
              ad.clear()
              de = getRowString(record, layouts(header))
            }
          case "AD" => ad.append(getRowString(record, layouts(header)))
          case _ => throw new Exception("Unknown")
        }
      }
      // Emit the last group, which is not followed by another DE record.
      if (!startStatus) dataset.append(fh ++ lh ++ bh ++ de ++ ad.foldLeft(Seq[Any]())((acc, seqData) => acc ++ seqData))
      dataset.iterator
    }

Here I am passing individual layouts. Can I merge all the individual layouts and access a particular layout based on the header?

    def getRow(arrayOfBytes: Array[Byte], copyBook: Copybook): Seq[Any] = {
      val handler = new StructHandler()
      RecordExtractors.extractRecord(copyBook.ast, arrayOfBytes, 0, handler = handler)
    }
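
For reference, selecting a layout by header can be sketched as a lookup from segment id to a decoder function. This is a minimal sketch under stated assumptions: the data is assumed to use EBCDIC code page 037 (the thread does not say which code page applies), and `LayoutDispatch`, `segmentId`, and `decode` are hypothetical names, not Cobrix API. In a real pipeline each decoder would presumably wrap a call like `getRow` with that segment's own `Copybook`.

```scala
import java.nio.charset.Charset

object LayoutDispatch {
  // Assumption: EBCDIC code page 037; substitute the code page your data actually uses.
  private val ebcdic = Charset.forName("IBM037")

  /** Decodes the first two bytes of a record as its segment id, e.g. "FH". */
  def segmentId(record: Array[Byte]): String = new String(record, 0, 2, ebcdic)

  /** Picks the decoder registered for the record's segment id and applies it. */
  def decode(record: Array[Byte], decoders: Map[String, Array[Byte] => Seq[Any]]): Seq[Any] = {
    val id = segmentId(record)
    val decoder = decoders.getOrElse(id, throw new IllegalArgumentException(s"Unknown segment id: $id"))
    decoder(record)
  }
}
```

Keeping one decoder per segment id means the per-record code never needs to know about the master layout; it only needs the two-byte header.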

sree018 avatar Mar 26 '24 21:03 sree018