cobrix
Variable length record parsing
Background [Optional]
I have a multi-layout file whose records appear in a fixed order. I group the layouts, and the last layout in each group is repeated multiple times.
Each layout is a fixed-length record (200 bytes).
Example:
1. FH
2. LH
3. BH1
4. DE1
5. AD
:
:
10. DE2
11. AD
:
13. BH2
14. DE1
15. AD
:
:
I rearrange the layouts like this:
FH,LH,BH1,DE1,{4-byte count of AD}Array[AD]
FH,LH,BH1,DE2,{4-byte count of AD}Array[AD]
FH,LH,BH2,DE1,{4-byte count of AD}Array[AD]
FH,LH,BH2,DE2,{4-byte count of AD}Array[AD]
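The rearranged record described above could be assembled roughly as follows. This is only a sketch: `PackGroup` and `packGroup` are hypothetical names, and the big-endian byte order of the 4-byte AD count is an assumption, since the post does not say how the count is encoded.

```scala
import java.nio.ByteBuffer

object PackGroup {
  // Each layout is a fixed 200-byte record (stated in the post).
  val SegmentLength = 200

  // Hypothetical helper: concatenates FH, LH, BH, DE, a 4-byte AD count, and the AD records.
  // Assumption: the count is written as a big-endian Int (ByteBuffer's default byte order).
  def packGroup(fh: Array[Byte], lh: Array[Byte], bh: Array[Byte], de: Array[Byte],
                ads: Seq[Array[Byte]]): Array[Byte] = {
    val headers = Seq(fh, lh, bh, de)
    require((headers ++ ads).forall(_.length == SegmentLength),
      s"every segment must be $SegmentLength bytes long")
    val buf = ByteBuffer.allocate(headers.length * SegmentLength + 4 + ads.length * SegmentLength)
    headers.foreach(h => buf.put(h))
    buf.putInt(ads.length)        // number of AD records that follow
    ads.foreach(a => buf.put(a))
    buf.array()
  }
}
```

Because the number of AD records differs per group, the resulting byte arrays have different lengths, which is what makes the rearranged records variable-length.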
Question
How do I parse an RDD[Array[Byte]] using the framework?
import za.co.absa.cobrix.spark.cobol.Cobrix
val rdd = ???
val df = Cobrix.fromRdd
.copybookContents(copybook)
.option("encoding", "ebcdic") // any supported option
.load(rdd)
Sorry, I'm not sure I understand your question.
- The segment re-arrangement is quite a common pattern. We even have a feature request to support it directly in Cobrix (https://github.com/AbsaOSS/cobrix/issues/369). For now, we usually use the segment id to segment field mapping to read all segments, and then use Spark's window functions to re-arrange them. But as I understand, you've already done this step.
- Parsing an RDD[Array[Byte]] is done exactly as you specified in your question.
Hi @yruslan
I re-arranged the data like this: FH,LH,BH1,DE1,{4-byte count of AD}Array[AD] -> variable length.
import za.co.absa.cobrix.spark.cobol.Cobrix
val rdd = ???
val df = Cobrix.fromRdd
.copybookContents(copybook)
.option("encoding", "ebcdic") // any supported option
.load(rdd)
This is not working. The master copybook looks like this:
01 RECORD-MASTER.
   02 FILLER PIC X(200).
   02 FH-LAYOUT REDEFINES FILLER.
      03 RCD-ID PIC X(2).
      :
      03 RCD-SQC PIC 9(9)V.
   02 LH-LAYOUT REDEFINES FILLER.
      03 RCD-ID PIC X(2).
      :
      03 RCD-SQC PIC 9(9)V.
How do we access only the FH redefine from the master layout?
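import scala.collection.mutable.ListBuffer
import za.co.absa.cobrix.cobol.parser.Copybook

// Builds one row per DE record: FH ++ LH ++ BH ++ DE ++ all AD records that follow it.
// ebcdicToAsciiMapping and getRowString are helpers defined elsewhere.
def adjustRows(itr: Iterator[Array[Byte]], layouts: Map[String, Copybook]): Iterator[Seq[Any]] = {
  var fh = Seq[Any]()
  var lh = Seq[Any]()
  var bh = Seq[Any]()
  var de = Seq[Any]()
  val ad = ListBuffer[Seq[Any]]()
  var startStatus = true
  val dataset = ListBuffer[Seq[Any]]()
  while (itr.hasNext) {
    val record: Array[Byte] = itr.next()
    // The first two bytes hold the record id in EBCDIC.
    val header: String = record.slice(0, 2).map(byte => ebcdicToAsciiMapping((byte + 256) % 256)).mkString
    header match {
      case "FH" =>
        fh = getRowString(record, layouts(header))
      case "LH" =>
        lh = getRowString(record, layouts(header))
      case "BH" =>
        bh = getRowString(record, layouts(header))
      case "DE" =>
        if (startStatus) {
          de = getRowString(record, layouts(header))
          startStatus = false
        } else {
          // A new DE record closes the previous group.
          dataset.append(fh ++ lh ++ bh ++ de ++ ad.flatten)
          ad.clear()
          de = getRowString(record, layouts(header))
        }
      case "AD" =>
        ad.append(getRowString(record, layouts(header)))
      case _ => throw new Exception(s"Unknown record id: $header")
    }
  }
  // Emit the last group, which no later DE record closes.
  if (!startStatus) dataset.append(fh ++ lh ++ bh ++ de ++ ad.flatten)
  dataset.iterator
}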
Here I am passing individual layouts. Can I merge all the individual layouts and access a particular layout based on the header?
// StructHandler is assumed to be a RecordHandler implementation defined elsewhere.
def getRow(arrayOfBytes: Array[Byte], copyBook: Copybook): Seq[Any] = {
  val handler = new StructHandler()
  RecordExtractors.extractRecord(copyBook.ast, arrayOfBytes, 0, handler = handler)
}
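One way to drive the grouping purely from the record id, independent of how the fields are extracted, is sketched below. It is plain Scala with no Cobrix calls: `decode` stands in for whatever turns a record into a row (for example `getRow`), the `IBM037` code page is an assumption about the EBCDIC variant in use, and `SegmentGrouping`, `recordId`, and `groupRecords` are hypothetical names. The sketch emits the pending group whenever a new FH, LH, BH, or DE record arrives, and once more at the end of the input, so that each DE row is combined with the headers that were current when it was read.

```scala
import java.nio.charset.Charset
import scala.collection.mutable.ListBuffer

object SegmentGrouping {
  // Assumption: record ids are encoded in EBCDIC code page 037.
  private val Ebcdic = Charset.forName("IBM037")

  // Decodes the 2-byte record id at the start of a record (FH, LH, BH, DE or AD).
  def recordId(record: Array[Byte]): String = new String(record, 0, 2, Ebcdic)

  // Hypothetical helper: builds one row per DE record as FH ++ LH ++ BH ++ DE ++ all of its ADs.
  // `decode` is a placeholder for the real field extraction, selected by record id.
  def groupRecords(records: Iterator[Array[Byte]],
                   decode: (String, Array[Byte]) => Seq[Any]): Iterator[Seq[Any]] = {
    var fh, lh, bh = Seq.empty[Any]
    var de: Option[Seq[Any]] = None
    val ads = ListBuffer.empty[Seq[Any]]
    val out = ListBuffer.empty[Seq[Any]]

    // Emits the pending DE group, if any, then resets the per-group state.
    def flush(): Unit = {
      de.foreach(d => out += (fh ++ lh ++ bh ++ d ++ ads.flatten))
      de = None
      ads.clear()
    }

    records.foreach { record =>
      val id = recordId(record)
      lazy val row = decode(id, record) // only evaluated for known record ids
      id match {
        case "FH" => flush(); fh = row
        case "LH" => flush(); lh = row
        case "BH" => flush(); bh = row
        case "DE" => flush(); de = Some(row)
        case "AD" => ads += row
        case other => throw new IllegalArgumentException(s"Unknown record id: $other")
      }
    }
    flush() // the last group has no following record to close it
    out.iterator
  }
}
```

With per-layout copybooks, `decode` could look up the copybook in a `Map[String, Copybook]` keyed by record id, as the `layouts` map does in `adjustRows`.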