cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Background The goal is to process an EBCDIC mainframe data file with a copybook and load it into Azure Data Lake Gen 2 as a readable text file...
## Background Spark data sources, like `csv`, can create a column containing the corrupt records they were unable to parse. Currently, `spark-cobol` just ignores all decoding errors and writes `null` to the...
## Background Currently, the `scodec` library is used only to implement the decoding of IEEE754 floating-point numbers. Having this dependency complicates `spark-cobol` parser usage from Spark Shell. ## Feature Implement decoding...
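As an illustration of how small a dependency-free IEEE754 decoder can be, here is a minimal sketch in Python (Cobrix itself is written in Scala, so this is not the project's implementation). The function name `decode_ieee754` and its `big_endian` parameter are hypothetical; the sketch assumes the raw field is exactly 4 bytes (single precision) or 8 bytes (double precision).

```python
import struct

def decode_ieee754(raw: bytes, big_endian: bool = True) -> float:
    """Decode a 4-byte or 8-byte IEEE 754 binary value into a float.

    Hypothetical helper for illustration only; the name and signature
    are assumptions, not part of Cobrix.
    """
    if len(raw) == 4:
        code = "f"  # single precision
    elif len(raw) == 8:
        code = "d"  # double precision
    else:
        raise ValueError(f"expected 4 or 8 bytes, got {len(raw)}")
    prefix = ">" if big_endian else "<"  # explicit byte order, no padding
    return struct.unpack(prefix + code, raw)[0]

single = decode_ieee754(bytes.fromhex("3f800000"))
double = decode_ieee754(bytes.fromhex("c000000000000000"))
little = decode_ieee754(bytes.fromhex("0000803f"), big_endian=False)
```

On the JVM the same idea can be expressed with `java.nio.ByteBuffer` and its `getFloat`/`getDouble` methods, which likewise needs no third-party library.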
## Background Currently, the hierarchical records generator uses a lot of auxiliary arrays, maps, etc. Maybe if the parser could specify child segments for segment redefines instead of setting parent...
## Background Currently, all documentation on Cobrix is in the README as one very long page, which is hard to navigate. Adding some structure to it would make it better. ##...
## Background Currently, Cobrix copyright headers contain year ranges, e.g. `2018-2019`. According to @GeorgiChochov, > `sbt headerCheck` doesn't work with `2018-2019` in the license header, expected only `2018` and I...
Example: 0120156788PKumar Pndey 05201789654rDtr467788999000009988777666 05201789654ABCD467788999000009988777666 06201789654rDtr46778899900000998877766698765444ffghjjjj 088888997544332245t6yuuiiiiiiiiiiiiiiiiiiiiiiiffffffffffffffffffffffffffffffffffffffffffffffgggggggggggggggg ----------------------------- Here the 01 record has two 05 records, which need to be collected into a single column.
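One way to read this request is as grouping child records under the parent record that precedes them. The sketch below is a plain-Python illustration of that idea, not Cobrix functionality. It assumes each record sits on its own line, that the first two characters are the record type, and that the record boundaries in the sample are as guessed here; `group_segments` and its parameters are hypothetical names.

```python
def group_segments(records, parent_id="01", child_id="05"):
    """Attach each child record to the most recent parent record.

    Assumes the first two characters of a record identify its type.
    Records of any other type are ignored in this sketch.
    """
    groups = []
    for rec in records:
        seg = rec[:2]
        if seg == parent_id:
            # A parent record starts a new group.
            groups.append({"parent": rec, "children": []})
        elif seg == child_id and groups:
            # A child record belongs to the latest parent seen so far.
            groups[-1]["children"].append(rec)
    return groups

# Sample records from the example, split at assumed boundaries.
sample = [
    "0120156788PKumar Pndey",
    "05201789654rDtr467788999000009988777666",
    "05201789654ABCD467788999000009988777666",
    "06201789654rDtr46778899900000998877766698765444ffghjjjj",
]
groups = group_segments(sample)
```

Each resulting group holds one 01 record and a list of its 05 records, which could then be written out as a single array-typed column.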
## Background Currently, each unit test suite that creates a DataFrame uses its own methods for comparing layouts, schemas and data. ## Feature Extract layout, schema and data comparison methods...
## Background `CopybookParser` has a lot of private methods that can be generalized. ## Task Extract generic enough methods from `CopybookParser`, make them public, put them into a generic utils...
## Background `CopybookParser` is too big and becomes harder to maintain over time. This is because the object contains every post-parsing method it applies to the parsed AST. Example: ```...