
pyconcrete for submitting spark job

Open albertusk95 opened this issue 5 years ago • 3 comments

Hi,

I recently used pyconcrete to obfuscate PySpark code. To run a Spark job on a cluster, we need to use the spark-submit command, so it would look like spark-submit job.py.

The concern is that spark-submit seems to accept only files with the .py extension. Since pyconcrete generates .pye files, I couldn't find any way to run the encrypted files via spark-submit.

Is there a way to run encrypted files generated by pyconcrete with spark-submit?

Thank you.

albertusk95 avatar Dec 04 '19 10:12 albertusk95

pyconcrete needs its binary .so. Does spark-submit package your source code and upload it to the cloud to run? If so, you need to cross-compile pyconcrete.so first. Then you could run pyconcrete as a library: try building your code as an .egg. Spark seems to allow submitting an .egg, so it might work. Give it a shot.

Falldog avatar Dec 05 '19 15:12 Falldog

I already tried building the code as an .egg along with the driver program, but Spark couldn't find the main class.

It seems that .egg files are only used as dependencies; spark-submit still needs the driver code as a .py file. So it would look like this: spark-submit --py-files path/to/file.egg driver.py.

According to the doc itself,

For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, 
and add Python .zip, .egg or .py files to the search path with --py-files.
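One way to combine the two constraints above (a plain .py driver plus dependencies passed with --py-files) would be a thin, unencrypted driver.py that only enables pyconcrete and then hands off to the encrypted code. The following is an untested sketch, not a confirmed solution: it assumes that importing pyconcrete registers its import hook for .pye modules, that pyconcrete and its compiled extension are installed on the driver and every executor, and that the hook can locate the .pye modules shipped via --py-files (this may not hold for modules inside an archive). The names my_job and main are hypothetical.

```python
# driver.py: hypothetical plain-text launcher passed to spark-submit.
import importlib


def run_entry(module_name, func_name="main", *args):
    """Import module_name and call its func_name entry point with args."""
    module = importlib.import_module(module_name)
    return getattr(module, func_name)(*args)


def main():
    # Assumption: importing pyconcrete installs the import hook that lets
    # Python load encrypted .pye modules. It must be importable on the node.
    import pyconcrete  # noqa: F401

    # "my_job" and "main" are hypothetical names for the encrypted module
    # and its entry function, shipped alongside this file via --py-files.
    return run_entry("my_job", "main")


# spark-submit runs this file as a script, so a real driver would end
# with a call to main().
```

The driver itself stays readable, so only the hand-off logic is exposed; everything of value would live in the encrypted module.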

albertusk95 avatar Dec 06 '19 04:12 albertusk95

Can you provide more information? Maybe it's a spark-submit issue, not a pyconcrete one.

Falldog avatar Feb 17 '22 04:02 Falldog