sagemaker-python-sdk
Please add support for `requirements.txt` in ScriptProcessor similar to other "Script Mode" parts of the SageMaker Python SDK
Please add support for requirements.txt in ScriptProcessor similar to other "Script Mode" parts of the SageMaker Python SDK where I can specify source_dir
Hi Chris, thanks for your suggestion. I've added it to our backlog.
As a workaround, you can provide a shell script containing pip install commands. (You'll want to call your python script at the end of this shell script.)
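A minimal sketch of that shell-entrypoint workaround (all file names and paths here are hypothetical): generate a small bootstrap script that installs dependencies with pip and then calls the real processing script. The bootstrap script would then be passed as the `code` argument to `ScriptProcessor.run(...)`, with `command=["bash"]` set on the processor so the container executes it with bash rather than python.

```python
# Sketch: write a shell entrypoint that installs requirements and then
# invokes the actual Python script. Paths assume the SageMaker Processing
# convention of mounting code under /opt/ml/processing/input/code.
from pathlib import Path

bootstrap = "\n".join([
    "#!/bin/bash",
    "set -e",
    # Install third-party dependencies first (requirements.txt is assumed
    # to be shipped alongside the code via a ProcessingInput).
    "pip install -r /opt/ml/processing/input/code/requirements.txt",
    # Then hand off to the real processing script (hypothetical name).
    'python /opt/ml/processing/input/code/preprocess.py "$@"',
])
Path("bootstrap.sh").write_text(bootstrap + "\n")
```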
@ajaykarpur The problem is that the ScriptProcessor only takes a single file as an argument, not a source_dir, so you cannot include a directory alongside your Python source file. The workaround does not really work around the problem.
As a workaround, we ended up using the SklearnProcessor, which actually takes a Python script. The Python script gets access to a packaged version of our code, which is downloaded via the ProcessingInput mechanism; the script installs the package and runs the entrypoint. It works, but it was too much effort for something that should be built in, IMHO.
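A hedged reconstruction of that workaround (all paths and module names below are hypothetical): the single script handed to `SKLearnProcessor.run(code=...)` installs the packaged project, which a ProcessingInput has delivered into the container, and then hands off to the project's real entrypoint.

```python
# Sketch of the single-file entrypoint passed to SKLearnProcessor.run().
import subprocess
import sys

# Where the ProcessingInput delivers the packaged project inside the
# container (hypothetical destination path).
PKG_DIR = "/opt/ml/processing/input/pkg"

def install_command(pkg_dir):
    """Build the pip command that installs the packaged project."""
    return [sys.executable, "-m", "pip", "install", pkg_dir]

if __name__ == "__main__":
    # Install the uploaded package into the container's environment...
    subprocess.run(install_command(PKG_DIR), check=True)
    # ...then run the project's actual entrypoint (hypothetical module).
    subprocess.run([sys.executable, "-m", "my_project.processing"], check=True)
```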
Hi @ajaykarpur completely agree with the prior comments about the importance and usefulness of allowing processing to use a requirements file.
Thank you!
I think it is an important feature for SKLearnProcessor to accept multiple Python files.
Hi, I want to share an experimental / stop-gap effort called FrameworkProcessor, which simplifies submitting a Python processing job with requirements.txt, source_dir, dependencies, and git_config, using the SageMaker framework training containers (i.e., TensorFlow, PyTorch, MXNet, XGBoost, and scikit-learn).
It aims to give you the familiar workflow of (1) instantiating a processor, then (2) immediately calling its run(...) method.
Here's an example of how to use this FrameworkProcessor class (currently a Python script rather than an .ipynb). You can run that example with this shell script, but you must first change the S3 prefix and execution role, and optionally choose your preferred container.
It slightly changes the processing API by adding a SageMaker framework estimator, for two purposes: (1) auto-detecting the container URI, and (2) re-using the estimator's packaging mechanism to upload to s3://.../sourcedir.tar.gz.
So far it works for my cases, but more testing and bug reports are welcome.
HTH.
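For reference, a class with this shape was later upstreamed into the SDK as `sagemaker.processing.FrameworkProcessor`. A usage sketch under that assumption (the role, instance settings, file names, and framework version below are placeholders; this configures a cloud job and cannot run locally):

```python
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn

# The estimator class picks the framework container image automatically.
processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="0.23-1",      # placeholder framework version
    role="<execution-role-arn>",      # placeholder execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# run() accepts source_dir; a requirements.txt inside it is installed
# before the entrypoint script executes.
processor.run(
    code="preprocess.py",             # hypothetical entrypoint
    source_dir="src/",                # hypothetical source directory
)
```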
any news on this?
Is there an update on this?
Right now I am just using the processors inheriting from FrameworkProcessor (PyTorch, not SKLearn) when I need to use extra files.
I wish I could just use Docker containers from Docker Hub; I don't understand the need for four or five classes with similar names and features.
No news on this one yet? I have several customers asking me how to do it and they really don't like the workarounds
@ajaykarpur
Hi Team,
I have customers asking about how to do this without workarounds. Is this doable/has this been released?
any news regarding this, 3 years later?
Any news on this? It's absurd that for data preprocessing, which requires far more third-party libraries than training, we cannot easily install additional ones, whereas the option is available for estimators. It's literally already there in estimators, so why couldn't this be added to processors in well over 3 years?
@j-adamczyk have you tried looking at `FrameworkProcessor` instead of `ScriptProcessor`? i.e. https://stackoverflow.com/a/74551264/2981639
Eager to hear an update on this!
well then