auto-sklearn
S3 support for auto-sklearn to store and load models and configurations for each run
Currently I see no support in auto-sklearn for reading from and writing to S3. Adding S3 support would open the door to running auto-sklearn pipelines in distributed mode in the cloud or on any on-prem cluster.
After a quick code walkthrough, I can see many places where auto-sklearn interacts with the filesystem directly through the shutil, os, and lockfile modules.
This means we need to tackle this issue in two steps.
- Create an abstraction layer for filesystem access and refactor the code to use this layer for all filesystem-related activities.
- Add support for S3 by providing a concrete S3 implementation of those abstractions.
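The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `StorageBackend`, `LocalBackend`, and `S3Backend` are hypothetical names and do not exist in auto-sklearn, and the S3 variant assumes a boto3 S3 client is created elsewhere and passed in. Locking (what the lockfile module currently handles) is not covered here.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical storage interface; not part of auto-sklearn."""

    @abstractmethod
    def write_bytes(self, key: str, data: bytes) -> None:
        """Store data under a '/'-separated key."""

    @abstractmethod
    def read_bytes(self, key: str) -> bytes:
        """Return the data stored under key."""


class LocalBackend(StorageBackend):
    """Step 1: route plain filesystem access through the interface."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        return os.path.join(self.root, *key.split("/"))

    def write_bytes(self, key: str, data: bytes) -> None:
        path = self._path(key)
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # Write to a temporary file first, then rename it into place,
        # so readers never observe a partially written file.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, path)

    def read_bytes(self, key: str) -> bytes:
        with open(self._path(key), "rb") as fh:
            return fh.read()


class S3Backend(StorageBackend):
    """Step 2: the same interface backed by S3.

    Assumes `client` is a boto3 S3 client, e.g. boto3.client("s3"),
    constructed by the caller.
    """

    def __init__(self, client, bucket: str, prefix: str = "") -> None:
        self.client = client
        self.bucket = bucket
        self.prefix = prefix.strip("/")

    def _key(self, key: str) -> str:
        return f"{self.prefix}/{key}" if self.prefix else key

    def write_bytes(self, key: str, data: bytes) -> None:
        self.client.put_object(Bucket=self.bucket, Key=self._key(key), Body=data)

    def read_bytes(self, key: str) -> bytes:
        response = self.client.get_object(Bucket=self.bucket, Key=self._key(key))
        return response["Body"].read()
```

With an interface like this, code that today calls os or shutil directly would receive a backend object instead, and switching between a shared filesystem and S3 would become a configuration choice rather than a code change.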
What are your thoughts?
> Currently I see no support in auto-sklearn for reading from and writing to S3.
That's correct. Auto-sklearn is 100% filesystem based.
> Adding S3 support would open the door to running auto-sklearn pipelines in distributed mode in the cloud or on any on-prem cluster.
Auto-sklearn can run fully distributed in an on-prem setting if all nodes have a shared file system (which is the case in most academic settings). I assume this is different for cloud services? If yes, is this different for all cloud services?
> Create an abstraction layer for filesystem access and refactor the code to use this layer for all filesystem-related activities.
Yes and no. There is an abstraction layer, but it's not complete yet.
Assuming the only goal is to let Auto-sklearn run where there is no shared file system, would there be any other advantages?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs for the next 7 days. Thank you for your contributions.