cobrix
use in databricks
Hi,
It's a question, not an issue.
I need to process a mainframe file in Azure Databricks which contains some COMP-3 (packed decimal) values. I have the COBOL copybook describing the record layout.
I could not find any reference on how to do this in Databricks using PySpark (Python 3). Can you please provide sample code showing how to integrate/use Cobrix in Azure Databricks?
A prompt reply would be appreciated. Regards, Divya
Hi, thanks for the interesting question.
Ideally, it should work like this:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.format('cobol').options(copybook='/path/to/copybook.cob').load('/path/to/data')
The only thing I'm not sure about is how to provide the spark-cobol dependency. I will take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.
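One way to supply the spark-cobol dependency on a local Spark instance is to pass its Maven coordinates when building the session. A minimal sketch (the coordinates are the ones mentioned later in this thread; the copybook and data paths are placeholders):

```python
from pyspark.sql import SparkSession

# Resolve spark-cobol from Maven Central at session startup.
# Coordinates taken from a later comment in this thread.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "za.co.absa.cobrix:spark-cobol_2.12:2.5.1")
    .getOrCreate()
)

df = (
    spark.read
    .format("cobol")
    .option("copybook", "/path/to/copybook.cob")
    .load("/path/to/data")
)
```

Note that `spark.jars.packages` must be set before the session is created; it has no effect on an already-running session.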
This is exactly what I'm doing. I have had no problems with pyspark.
All I did to use pyspark was to add the correct jars (spark-cobol, cobol-parser, and scodec) to my Spark jars. After that, loading as @yruslan suggested should work fine.
@tr11 As I'm new to Databricks, can you please guide me on how to add these jars? Where can I get them, and what are the steps for adding them in Databricks? I'd really appreciate your help and guidance on this.
Regards, Divya
I don't use Databricks so I can't try it, but this seems promising:
https://docs.databricks.com/libraries.html#upload-a-jar-python-egg-or-python-wheel
Hi, I need to read a schema from a COBOL copybook using Python in AWS. Are there any suggestions?
If you have a DataFrame, you can get the Spark schema the same way as in Scala:
df.schema
or
df.schema.treeString
or
df.printSchema
You can get the COBOL schema as an AST like this:
val copybook = CopybookParser.parseTree(copyBookContents)
copybook.generateRecordLayoutPositions
For Databricks it's much simpler. All you need to do is open your cluster's Libraries tab and install Cobrix, either by providing the Maven coordinates (za.co.absa.cobrix:spark-cobol_2.12:2.5.1) or by downloading the jar from Maven and uploading it to the cluster.