sagemaker-python-sdk
Please add support for `requirements.txt` in ScriptProcessor similar to other "Script Mode" parts of the SageMaker Python SDK
Please add support for requirements.txt in ScriptProcessor similar to other "Script Mode" parts of the SageMaker Python SDK where I can specify source_dir
Hi Chris, thanks for your suggestion. I've added it to our backlog.
As a workaround, you can provide a shell script containing pip install commands. (You'll want to call your python script at the end of this shell script.)
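A minimal sketch of that shell-entrypoint workaround (all file names and paths here are hypothetical): generate a small bootstrap script that installs dependencies with pip and then calls the real processing script. The bootstrap script would then be passed as the `code` argument to `ScriptProcessor.run(...)`, with `command=["bash"]` set on the processor so the container executes it with bash rather than python.

```python
# Sketch: write a shell entrypoint that installs requirements and then
# invokes the actual Python script. Paths assume the SageMaker Processing
# convention of mounting code under /opt/ml/processing/input/code.
from pathlib import Path

bootstrap = "\n".join([
    "#!/bin/bash",
    "set -e",
    # Install third-party dependencies first (requirements.txt is assumed
    # to be shipped alongside the code via a ProcessingInput).
    "pip install -r /opt/ml/processing/input/code/requirements.txt",
    # Then hand off to the real processing script (hypothetical name).
    'python /opt/ml/processing/input/code/preprocess.py "$@"',
])
Path("bootstrap.sh").write_text(bootstrap + "\n")
```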
@ajaykarpur The problem is that the ScriptProcessor only takes a single file as an argument, not a source_dir, so you cannot include a directory alongside your Python source file. The workaround does not really work around the problem.
As a workaround, we ended up using the SklearnProcessor, which actually takes a Python script. The Python script gets access to a packaged version of our code, which is downloaded via the ProcessingInput mechanism; the script installs the package and runs the entrypoint. It works, but it was too much effort for something that should be built in, IMHO.
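A hedged reconstruction of that workaround (all paths and module names below are hypothetical): the single script handed to `SKLearnProcessor.run(code=...)` installs the packaged project, which a ProcessingInput has delivered into the container, and then hands off to the project's real entrypoint.

```python
# Sketch of the single-file entrypoint passed to SKLearnProcessor.run().
import subprocess
import sys

# Where the ProcessingInput delivers the packaged project inside the
# container (hypothetical destination path).
PKG_DIR = "/opt/ml/processing/input/pkg"

def install_command(pkg_dir):
    """Build the pip command that installs the packaged project."""
    return [sys.executable, "-m", "pip", "install", pkg_dir]

if __name__ == "__main__":
    # Install the uploaded package into the container's environment...
    subprocess.run(install_command(PKG_DIR), check=True)
    # ...then run the project's actual entrypoint (hypothetical module).
    subprocess.run([sys.executable, "-m", "my_project.processing"], check=True)
```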
Hi @ajaykarpur completely agree with the prior comments about the importance and usefulness of allowing processing to use a requirements file.
Thank you!
I think it is an important feature for SKLearnProcessor to accept multiple Python files.
Hi, I want to share an experimental / stop-gap effort called FrameworkProcessor, which simplifies submitting a Python processing job with requirements.txt, source_dir, dependencies, and git_config, using the SageMaker framework training containers (i.e., TensorFlow, PyTorch, MXNet, XGBoost, and scikit-learn).
It aims to give you the familiar workflow of (1) instantiating a processor, then (2) immediately calling its run(...) method.
Here's an example of how to use this FrameworkProcessor class (currently a Python script rather than an .ipynb). You can run that example with this shell script, but you must first change the S3 prefix and execution role, and optionally choose your preferred container.
It slightly changes the processing API by adding a SageMaker framework estimator, for two purposes: (1) auto-detecting the container URI, and (2) re-using the estimator's packaging mechanism to upload to s3://.../sourcedir.tar.gz.
So far it works for my cases, but more testing and bug reports are welcome.
HTH.
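For reference, a class with this shape was later upstreamed into the SDK as `sagemaker.processing.FrameworkProcessor`. A usage sketch under that assumption (the role, instance settings, file names, and framework version below are placeholders; this configures a cloud job and cannot run locally):

```python
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn

# The estimator class picks the framework container image automatically.
processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="0.23-1",      # placeholder framework version
    role="<execution-role-arn>",      # placeholder execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# run() accepts source_dir; a requirements.txt inside it is installed
# before the entrypoint script executes.
processor.run(
    code="preprocess.py",             # hypothetical entrypoint
    source_dir="src/",                # hypothetical source directory
)
```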
any news on this?
Is there an update on this?
Right now I am just using the processors inheriting from FrameworkProcessor (PyTorch, not SKLearn) when I need to use extra files.
I wish I could just use Docker containers from Docker Hub; I don't understand the need for four or five classes with similar names and features.
No news on this one yet? I have several customers asking me how to do it and they really don't like the workarounds
@ajaykarpur
Hi Team,
I have customers asking about how to do this without workarounds. Is this doable/has this been released?
any news regarding this, 3 years later?
Any news on this? It's absurd that for data preprocessing, which requires far more third-party libraries than training, we cannot easily install additional ones, whereas the option is available for estimators. It's literally already there in estimators, so why couldn't this be added to processors in well over 3 years?
@j-adamczyk have you tried looking at `FrameworkProcessor` instead of `ScriptProcessor`? i.e. https://stackoverflow.com/a/74551264/2981639
Eager to hear an update on this!
well then