[RFC] Skyplane API Design Draft
Example usage of the Skyplane API
Continued from: https://codeshare.io/K8YNX8
Example usage with simple copy API
# inline API that automatically manages the session for a single copy (no solver).
# to specify configurations such as solver type or reuse gateways for multiple transfers, use session API.
client = Skyplane(auth)
client.copy(src="s3://us-east-1/foo", dst="s3://us-east-2/bar", vms=8, recursive=True)
# => internally wraps the copy in a new_session() and cleans up resources afterwards
Example usage with Session API
The Skyplane API is built around a single Session object that encompasses the lifetime of the VMs. A user specifies the set of VMs to provision.
session = Skyplane(auth).new_session(
    vms=8,
    src_region="aws:us-east-1",
    dst_region="aws:us-east-2",
    solver=skyplane.DirectSolver(),
    ...
)
Once a user initiates a Session, they will create a series of copies in that Session using s.copy. Finally, they will run s.run to provision VMs and queue the transfer.
with session.auto_terminate() as s:
    s.copy(A, B, recursive=True) # should have similar API to the simple copy API
    job = s.run()
    job.wait_for_completion()
VM provisioning and chunk request dispatch will occur in a background thread so the user can perform other work while the transfer is running.
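A minimal sketch of the background-thread behaviour described above, using only the standard library. `TransferJob`, `start`, and `fake_transfer` are hypothetical stand-ins and not the Skyplane implementation; only the `wait_for_completion` name comes from the draft.

```python
import threading
import time


class TransferJob:
    """Hypothetical job handle that runs the transfer work in a background thread."""

    def __init__(self, work):
        self._work = work
        self._error = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        try:
            self._work()
        except Exception as exc:
            # Remember the failure so the waiting caller can see it.
            self._error = exc

    def start(self):
        self._thread.start()
        return self

    def wait_for_completion(self, timeout=None):
        self._thread.join(timeout)
        if self._thread.is_alive():
            raise TimeoutError("transfer still running")
        if self._error is not None:
            raise self._error


transferred = []


def fake_transfer():
    # Stand-in for VM provisioning and chunk request dispatch.
    time.sleep(0.05)
    transferred.append("foo")


job = TransferJob(fake_transfer).start()
# The caller is free to do other work here while the transfer runs.
job.wait_for_completion()
```

Errors raised inside the worker are re-raised from `wait_for_completion`, so a failed transfer does not pass silently.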
If the user calls with session.auto_terminate(), Skyplane will ensure all VMs are cleaned up. Otherwise, the user must call session.terminate to clean up any VMs.
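One way the cleanup guarantee could work is a context manager that calls `terminate` in a `finally` block. The `Session` class below is a hypothetical stand-in with a `terminated` flag in place of real deprovisioning; it is a sketch of the pattern, not the actual implementation.

```python
from contextlib import contextmanager


class Session:
    """Hypothetical stand-in for the draft Session object."""

    def __init__(self, vms):
        self.vms = vms
        self.terminated = False

    def terminate(self):
        # A real session would deprovision its gateway VMs here.
        self.terminated = True

    @contextmanager
    def auto_terminate(self):
        try:
            yield self
        finally:
            # Runs on normal exit and when the body raises.
            self.terminate()


session = Session(vms=8)
try:
    with session.auto_terminate() as s:
        raise RuntimeError("transfer failed")
except RuntimeError:
    pass
```

Even though the body raised, `session.terminated` is set by the time the `with` block exits, which is the property the draft asks for.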
Authentication with clouds
auth = skyplane.Auth.from_config("config.cfg")
auth = skyplane.Auth(
    aws=skyplane.AWSAuth(),
    gcp=skyplane.GCPAuth(project_id="skyplane-project-id"), # project_id not optional
    azure=skyplane.AzureAuth(
        subscription_id="azure-subscription-id",
        gateway_umi="UMI_ID",
    ),
)
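As an illustration of how required fields such as project_id could be enforced, here is a sketch using dataclasses. The class names mirror the draft, but the definitions and the `enabled_clouds` helper are assumptions made for this example, not the real Skyplane classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AWSAuth:
    pass


@dataclass
class GCPAuth:
    project_id: str  # no default, so omitting it raises TypeError


@dataclass
class AzureAuth:
    subscription_id: str
    gateway_umi: str


@dataclass
class Auth:
    aws: Optional[AWSAuth] = None
    gcp: Optional[GCPAuth] = None
    azure: Optional[AzureAuth] = None

    def enabled_clouds(self) -> List[str]:
        # Hypothetical helper: names of the clouds that were configured.
        return [name for name in ("aws", "gcp", "azure") if getattr(self, name) is not None]


auth = Auth(aws=AWSAuth(), gcp=GCPAuth(project_id="skyplane-project-id"))
```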
One question I have is how we can scope the design of Skyplane Storage as well as broadcast (@sarahwooders). We had a design doc on broadcast
Persistent sync will be implemented over the API with a single tracker VM that holds a long-running Session object. It would continually diff the source and destination region (or monitor a _selflog path) and then call session.cp on any new objects to transfer.
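The diff step of the tracker could look roughly like this. The sketch assumes each region's listing is available as a mapping from object key to a content tag (for example an ETag); `objects_to_sync` and the sample listings are hypothetical.

```python
from typing import Dict, List


def objects_to_sync(src: Dict[str, str], dst: Dict[str, str]) -> List[str]:
    """Return keys that are missing at the destination or whose tag differs."""
    return sorted(key for key, tag in src.items() if dst.get(key) != tag)


src_listing = {"a.txt": "v1", "b.txt": "v2", "c.txt": "v1"}
dst_listing = {"a.txt": "v1", "b.txt": "v1"}

pending = objects_to_sync(src_listing, dst_listing)
print(pending)  # ['b.txt', 'c.txt']
```

The tracker would queue a copy for each pending key on its long-running session, then repeat the diff after a delay.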
I think this looks good for broadcast - we can basically have dst_region=List[str] for Skyplane(auth).new_session and otherwise have an identical API.
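To keep the API identical for unicast and broadcast, the session could accept either a single region or a list and normalize it internally. A sketch, assuming a hypothetical helper named `normalize_dst_regions`:

```python
from typing import List, Union


def normalize_dst_regions(dst_region: Union[str, List[str]]) -> List[str]:
    """Accept one region or a list of regions; always return a de-duplicated list."""
    regions = [dst_region] if isinstance(dst_region, str) else list(dst_region)
    if not regions:
        raise ValueError("at least one destination region is required")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(regions))


single = normalize_dst_regions("aws:us-east-2")
broadcast = normalize_dst_regions(["aws:us-east-2", "gcp:us-central1", "aws:us-east-2"])
```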
If we want to support multi-reader/multi-writer though (e.g. for Skyplane Storage), then we might need something different. Maybe later on, once we implement persistent sync, we can have a separate API to "deploy" a Skyplane bucket server that clients can connect to in order to write. But this is probably something we'll just implement later on top of this session API.
Based on a call with @Michaelvll, we proposed the following API: https://gist.github.com/parasj/983b9764bab66ffa60bbb171c7ea495b
# simple interface
client = Skyplane(auth)
stats = client.copy(src="s3://us-east-1/foo", dst="s3://us-east-2/bar", vms=8, recursive=True, dry_run=True)
print(stats.estimated_cost())
stats = client.copy(src="s3://us-east-1/foo", dst="s3://us-east-2/bar", vms=8, recursive=True)
# session interface
session = Skyplane(auth).new_session(
    vms=8,
    src_region="aws:us-east-1",
    dst_region="aws:us-east-2",
    solver=skyplane.DirectSolver())
with session.auto_terminate() as s:
    job.add(s.copy(A, B, recursive=True))
    job.add(s.copy(C, D))
    print(job.estimate_cost())
    future = job.run() # provision VMs here
    await future
His questions and feedback about the API:
- Will copy support an asynchronous call?
- It's surprising that the VMs are provisioned when the session is created. Users expect their account to be charged only once they call run.
- Potential inspiration for API:
  - Thread API
  - Python SQLite API (queue many transactions and then call run to execute the series of commands)
  - Kubernetes API where you create a cluster and submit jobs to it
- SkyPilot wants to estimate the cost of transferring data between two regions. Can we surface the estimated cost from the CLI?
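The feedback about deferred provisioning and cost estimates can be sketched as a job that only queues copies until run is called. Everything here is hypothetical: the `Job` class, the `size_gb` argument, and the caller-supplied `egress_price_per_gb` are assumptions made for illustration, and the price used below is a placeholder rather than a real cloud rate.

```python
class Job:
    """Hypothetical job that queues copies and provisions only on run()."""

    def __init__(self, vms, provision):
        self.vms = vms
        self._provision = provision  # callable that would start (and bill) VMs
        self._queued = []
        self.provisioned = False

    def add(self, src, dst, size_gb):
        # Queuing is free: nothing is provisioned here.
        self._queued.append((src, dst, size_gb))

    def estimate_cost(self, egress_price_per_gb):
        # Egress-only estimate; the price is supplied by the caller, not looked up.
        return sum(size for _, _, size in self._queued) * egress_price_per_gb

    def run(self):
        if not self.provisioned:
            self._provision(self.vms)
            self.provisioned = True
        # Hand back the queued copies and reset the queue.
        done, self._queued = self._queued, []
        return done


provision_calls = []
job = Job(vms=8, provision=provision_calls.append)
job.add("s3://us-east-1/foo", "s3://us-east-2/bar", size_gb=100)
job.add("C", "D", size_gb=50)

estimate = job.estimate_cost(egress_price_per_gb=0.5)  # placeholder price
calls_before_run = list(provision_calls)
completed = job.run()
```

Because the estimate needs no VMs, the same calculation could back a CLI dry-run command without charging the account.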
Some general feedback on Skyplane from the meeting:
- Skyplane should assume there are more resources available or ask for the max number of VMs during init since the results aren't as impressive with one VM
- @Michaelvll would like to customize the location of VMs e.g. provision a VM at the destination and not the source
- If a vCPU capacity error occurs, can we fall back to a cloud API?
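The fallback question could be handled along these lines. This sketch reads "cloud API" as the provider's own copy mechanism, which is an assumption; `CapacityError`, `transfer_with_fallback`, and the three callables are hypothetical names introduced for the example.

```python
class CapacityError(Exception):
    """Hypothetical error raised when the vCPU quota cannot fit the request."""


def transfer_with_fallback(provision_vms, overlay_copy, native_copy, vms):
    """Try the VM-based path first; fall back to the provider's own copy API."""
    try:
        gateways = provision_vms(vms)
    except CapacityError:
        # No VMs could be started, so use the path that needs none.
        return native_copy()
    return overlay_copy(gateways)


def no_capacity(vms):
    raise CapacityError(f"cannot provision {vms} VMs")


result = transfer_with_fallback(
    provision_vms=no_capacity,
    overlay_copy=lambda gateways: "overlay",
    native_copy=lambda: "native",
    vms=8,
)
```

Only the capacity error triggers the fallback; other provisioning failures still propagate to the caller.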