amazon-sagemaker-local-mode

question: PySpark Processing Jobs in Local Mode?

Open · dcompgriff opened this issue 2 years ago · 1 comment

Hello. I was wondering whether there is a tutorial for, or current support for, 1) running a PySpark processing job locally, and 2) doing so with a custom base Docker (EMR) image? I see a tutorial for Dask using a ScriptProcessor, and also some code for an SKLearn-based processor. My goal is to set up a local testing/dev environment that uses SageMaker Spark processor code. I'm guessing this is more complicated than the other use cases, since this processor is usually backed by an EMR cluster.

dcompgriff · Sep 01 '22 16:09

Hi @dcompgriff, PySparkProcessor will not work in local mode. Its container is a SageMaker-managed Docker image and has nothing to do with EMR. You can build your own Spark Docker image, use ScriptProcessor with it, the same as in the Dask example, and run it locally, as sketched below.
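For illustration, here is a minimal sketch of that approach, modeled on the Dask example in this repo. The image tag `my-spark-image:latest`, the `spark-submit` entrypoint, and the script name `preprocess.py` are all assumptions, not things this repo provides:

```python
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

# A dummy role ARN is fine in local mode; no AWS resources are created.
role = "arn:aws:iam::111111111111:role/service-role/dummy-role"

spark_processor = ScriptProcessor(
    image_uri="my-spark-image:latest",  # hypothetical tag for your locally built Spark image
    command=["spark-submit"],           # assumes spark-submit is on PATH inside the image
    role=role,
    instance_count=1,
    instance_type="local",              # runs the job in a local Docker container
)

spark_processor.run(
    code="preprocess.py",               # hypothetical PySpark script
    inputs=[ProcessingInput(source="./input_data",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="./output_data")],
)
```

The key point is `instance_type="local"`: the SageMaker Python SDK then runs the container on your machine instead of provisioning instances, so the same script can later move to a real processing job by swapping in an instance type.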

eitansela · Sep 05 '22 07:09