
use in databricks

DivSaru opened this issue 5 years ago • 8 comments

Hi,

It's a question, not an issue.

I need to process a mainframe file in Azure Databricks, which also contains some COMP-3 (packed decimal) fields. I have the COBOL copybook describing the record layout.

I could not find any reference on how to do this in Databricks using PySpark (Python 3). Could you please provide sample code showing how to integrate/use Cobrix in Azure Databricks?

A prompt reply would be appreciated. Regards, Divya

DivSaru avatar Jan 16 '20 15:01 DivSaru
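[For background on the COMP-3 fields mentioned above: COMP-3 is IBM packed-decimal encoding — two BCD digits per byte, with the sign carried in the final nibble (0xC or 0xF positive, 0xD negative). Cobrix decodes these fields automatically; the hypothetical helper below is only a pure-Python sketch of what the decoding does, not part of the Cobrix API.]

```python
def decode_comp3(raw: bytes, scale: int = 0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two BCD digits; the last nibble is the sign
    (0xC/0xF = positive, 0xD = negative). `scale` is the number of
    implied decimal places (e.g. PIC S9(3)V99 COMP-3 has scale=2).
    """
    nibbles = []
    for b in raw:
        nibbles.append(b >> 4)      # high digit
        nibbles.append(b & 0x0F)    # low digit
    sign = nibbles.pop()            # final nibble is the sign
    value = 0
    for d in nibbles:
        value = value * 10 + d
    if sign == 0x0D:
        value = -value
    return value / (10 ** scale) if scale else value


# 0x12 0x34 0x5C -> digits 1 2 3 4 5, sign nibble C (positive)
print(decode_comp3(bytes([0x12, 0x34, 0x5C])))            # 12345
print(decode_comp3(bytes([0x12, 0x34, 0x5D]), scale=2))   # -123.45
```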

Hi, thanks for the interesting question.

Ideally, it should work like this:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df = spark.read.format('cobol').options(copybook='/path/to/copybook.cob').load('/path/to/data')

The only thing I'm not sure about is how to provide the spark-cobol dependency. I will take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.

yruslan avatar Jan 16 '20 18:01 yruslan

This is exactly what I'm doing. I have had no problems with pyspark.

tr11 avatar Jan 16 '20 18:01 tr11

The only thing I'm not sure about is how to provide the spark-cobol dependency. I will take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.

All I did to use pyspark was to add the correct jars (spark-cobol, cobol-parser, and scodec) to my Spark jars. After that, loading as @yruslan suggested should work fine.
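[For a plain local PySpark session, one way to provide those jars is the spark.jars.packages config, which pulls spark-cobol and its transitive dependencies (cobol-parser, scodec) from Maven Central at startup. This is a sketch; the coordinate/version shown is the one mentioned later in the thread.]

```python
from pyspark.sql import SparkSession

# Resolve spark-cobol and its dependencies from Maven Central
# when the session starts; the version here is an example.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "za.co.absa.cobrix:spark-cobol_2.12:2.5.1")
    .getOrCreate()
)
```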

tr11 avatar Jan 16 '20 22:01 tr11

@tr11 As I'm new to Databricks, can you please guide me on how to add these jars? Where can I get them, and what are the steps for adding them in Databricks? I would really appreciate your help and guidance with this.

Regards, Divya

DivSaru avatar Jan 17 '20 09:01 DivSaru

I don't use Databricks, so I can't try it, but this seems promising:

https://docs.databricks.com/libraries.html#upload-a-jar-python-egg-or-python-wheel

tr11 avatar Jan 17 '20 11:01 tr11

Hi, I need to read a schema from a COBOL copybook using Python on AWS. Are there any suggestions?

poornimavithanage avatar Dec 01 '21 05:12 poornimavithanage

You can get the Spark schema, if you have a DataFrame, the same way as in Scala:

df.schema

or

df.schema.treeString

or

df.printSchema

You can get the COBOL schema as an AST like this:

import za.co.absa.cobrix.cobol.parser.CopybookParser

val copybook = CopybookParser.parseTree(copyBookContents)
copybook.generateRecordLayoutPositions

yruslan avatar Dec 01 '21 11:12 yruslan

For Databricks it's much simpler. All you need to do is open your cluster's Libraries tab, choose the install option, and install Cobrix — either by providing the Maven coordinate za.co.absa.cobrix:spark-cobol_2.12:2.5.1 or by downloading the jar from Maven and uploading it to the cluster.
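[Once the library is installed on the cluster, reading should reduce to the snippet from earlier in the thread. A sketch — on Databricks a SparkSession already exists as `spark`, and the paths below are placeholders:]

```python
# `spark` is the pre-created SparkSession on a Databricks cluster
# that has the spark-cobol library installed.
df = (
    spark.read.format("cobol")
    .option("copybook", "/path/to/copybook.cob")  # placeholder path
    .load("/path/to/data")                        # placeholder path
)
df.printSchema()
```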

psb2509 avatar Sep 24 '22 05:09 psb2509