auto-sklearn S3 support for auto-sklearn to store and load models and configurations for each run

Open pkvprakash opened this issue 5 years ago • 2 comments

Currently I see no support in auto-sklearn for reading from and writing to S3. Adding S3 support would make it possible to run auto-sklearn pipelines in distributed mode in the cloud or on any on-prem cluster.

After a quick code walkthrough, I can see many places where auto-sklearn interacts with the filesystem directly through the shutil, os, and lockfile modules.

This means we need to tackle this issue in two steps.

1. Create an abstraction layer for filesystem access and refactor the code to use this layer for all filesystem-related activities.
2. Add support for S3 by providing a concrete implementation of the abstractions for S3.
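
To make the two steps concrete, here is a rough sketch of what such a layer could look like. None of this is existing auto-sklearn code: `StorageBackend`, `LocalBackend`, and `S3Backend` are hypothetical names, the key layout is made up, and the S3 variant assumes boto3 is installed and credentials are configured in the usual way.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical storage interface; not part of auto-sklearn."""

    @abstractmethod
    def write_bytes(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read_bytes(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class LocalBackend(StorageBackend):
    """Step 1: wrap the direct os/shutil-style access behind the interface."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        # Keys use "/" as separator regardless of the operating system.
        return os.path.join(self.root, *key.split("/"))

    def write_bytes(self, key: str, data: bytes) -> None:
        path = self._path(key)
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # Write to a temporary file, then rename, so readers never see a
        # partially written file.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, path)

    def read_bytes(self, key: str) -> bytes:
        with open(self._path(key), "rb") as fh:
            return fh.read()

    def exists(self, key: str) -> bool:
        return os.path.exists(self._path(key))


class S3Backend(StorageBackend):
    """Step 2: the same interface backed by an S3 bucket (assumes boto3)."""

    def __init__(self, bucket: str, prefix: str = "") -> None:
        import boto3  # imported lazily so local-only users do not need it

        self._client = boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix.strip("/")

    def _key(self, key: str) -> str:
        return f"{self.prefix}/{key}" if self.prefix else key

    def write_bytes(self, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=self.bucket, Key=self._key(key), Body=data)

    def read_bytes(self, key: str) -> bytes:
        response = self._client.get_object(Bucket=self.bucket, Key=self._key(key))
        return response["Body"].read()

    def exists(self, key: str) -> bool:
        from botocore.exceptions import ClientError

        try:
            self._client.head_object(Bucket=self.bucket, Key=self._key(key))
            return True
        except ClientError:
            # Simplification: any client error (including permission
            # errors) is treated as "object not found".
            return False
```

Calling code would then only talk to a `StorageBackend`, e.g. `backend.write_bytes("runs/1/model.pkl", payload)`, and the choice between local disk and S3 becomes a configuration detail. Locking (what lockfile does today) is not covered by this sketch and would need its own abstraction.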

What are your thoughts?

Oct 27 '20 05:10 pkvprakash

> Currently I see no support for auto-sklearn to read and write from s3

That's correct. Auto-sklearn is 100% filesystem based.

> Providing support for s3 opens up a door in running auto-sklearn pipelines in distributed mode in cloud or in any other on-prem cluster

Auto-sklearn can run fully distributed in an on-prem setting if all nodes have a shared file system (which is the case in most academic settings). I assume this is different for cloud services? If yes, is this different for all cloud services?

> Create an abstraction layer for filesystem access and refactor the code to use this layer for all filesystem related activities.

Yes and no. There is an abstraction layer but it's not complete yet.

Assuming the main goal is to let Auto-sklearn run where there is no shared file system, would there be any other advantages?

Nov 10 '20 08:11 mfeurer

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs for the next 7 days. Thank you for your contributions.

May 05 '21 01:05 github-actions[bot]
