
Something like a .sagemakerignore file option to allow for skipping of certain directories when creating source tar

Open • njbrake opened this issue 2 years ago • 1 comment

Describe the feature you'd like
Docker has a .dockerignore file that works like a .gitignore file. It would be great if there were a similar .sagemakerignore file, so that any directories listed in it are skipped when the source tar file is created. I believe it would be a simple change here: https://github.com/aws/sagemaker-python-sdk/blob/744724b01bcdc3f451c9b73c34c947e12d6a8e2a/src/sagemaker/fw_utils.py#L486
Basically, add a few lines that check whether a .sagemakerignore file exists in the directory given by the directory variable and, if it does, skip anything listed in that file.
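To make the proposal concrete, here is a minimal sketch of the idea in plain Python. It is not the SDK's actual code: the function names (read_ignore_patterns, is_ignored, create_source_tar) are hypothetical, and the matching is simple fnmatch-style globbing against each path component and path prefix, not full .gitignore semantics (no negation with "!", no anchored patterns).

```python
import fnmatch
import os
import tarfile

IGNORE_FILENAME = ".sagemakerignore"  # file name proposed in this issue, not an existing SDK feature


def read_ignore_patterns(directory):
    """Return the patterns listed in the ignore file, or [] if the file is absent."""
    ignore_path = os.path.join(directory, IGNORE_FILENAME)
    if not os.path.isfile(ignore_path):
        return []
    with open(ignore_path, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    # Skip blank lines and comments; drop trailing slashes so "data/" matches "data".
    return [line.rstrip("/") for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path, patterns):
    """True if the path, one of its components, or one of its parent paths matches a pattern."""
    parts = rel_path.split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    return any(
        fnmatch.fnmatch(candidate, pattern)
        for pattern in patterns
        for candidate in prefixes + parts
    )


def create_source_tar(directory, tar_path):
    """Write a gzipped tar of `directory`, skipping entries matched by the ignore file."""
    patterns = read_ignore_patterns(directory)

    def tar_filter(tarinfo):
        # Returning None makes tarfile skip the entry; for a directory,
        # its contents are skipped as well.
        if is_ignored(tarinfo.name, patterns):
            return None
        return tarinfo

    with tarfile.open(tar_path, "w:gz") as tar:
        for name in sorted(os.listdir(directory)):
            tar.add(os.path.join(directory, name), arcname=name, filter=tar_filter)
    return tar_path
```

With a .sagemakerignore containing the lines "data/" and "*.ckpt", calling create_source_tar("my_source_dir", "/tmp/sourcedir.tar.gz") would leave out the data directory and any checkpoint files. The output path should sit outside the source directory so the archive does not try to include itself.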

How would this feature be used? Please describe.
This would reduce the size of the source tar file, which would speed up SageMaker build creation without requiring people to restructure their code. It helps anyone who keeps large files in the source directory that they already exclude from Git and Docker, and that they therefore don't want in the SageMaker tarball either.

Describe alternatives you've considered
An alternative is not doing it.

Additional context
I don't mind writing the code for this, but I wanted to check first: would this be accepted if I open a PR for it?

njbrake • Oct 12 '23 17:10

+1 for this feature

l3ku • Oct 18 '23 13:10