
[RFC] Skyplane API Design Draft

Open abiswal2001 opened this issue 1 year ago • 3 comments

Example usage of the Skyplane API

Continued from: https://codeshare.io/K8YNX8

Example usage with simple copy API

# inline API that automatically manages the session for a single copy (no solver). 
# to specify configurations such as solver type or reuse gateways for multiple transfers, use session API.
client = Skyplane(auth)
client.copy(src="s3://us-east-1/foo", dst="s3://us-east-2/bar", vms=8, recursive=True)
# => internally runs inside a new_session() block and then cleans up resources

Example usage with Session API

The Skyplane API centers on a single Session object whose lifetime spans that of the VMs. The user specifies the set of VMs to provision.

session = Skyplane(auth).new_session(
  vms=8,
  src_region="aws:us-east-1",
  dst_region="aws:us-east-2",
  solver=skyplane.DirectSolver(),
  ...
)

Once a user initiates a Session, they will create a series of copies in that Session using s.copy. Finally, they will run s.run to provision VMs and queue the transfer.

with session.auto_terminate() as s:
  s.copy(A, B, recursive=True)  # should have similar API to the simple copy API
  job = s.run()
  job.wait_for_completion()

VM provisioning and chunk request dispatch will occur in a background thread, so the user can perform other work while the transfer is running.
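To make the background-thread behaviour concrete, here is a minimal sketch using only the standard library. The function provision_and_transfer is a hypothetical placeholder, not part of Skyplane; it only simulates slow provisioning so that the non-blocking call pattern is visible.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision_and_transfer(num_vms):
    """Hypothetical placeholder for VM provisioning plus chunk dispatch."""
    time.sleep(0.05)  # simulate slow cloud calls
    return f"transferred with {num_vms} VMs"


executor = ThreadPoolExecutor(max_workers=1)

# submit() returns a Future immediately; the work runs in a worker thread.
future = executor.submit(provision_and_transfer, 8)

# The caller is free to do unrelated work while the transfer is in flight.
other_work = sum(range(1000))

# Blocks until the worker finishes, analogous to job.wait_for_completion().
result = future.result()
executor.shutdown()
```

Here future.result() plays the role that job.wait_for_completion() plays in the proposal above.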

If the user wraps the transfer in with session.auto_terminate(), Skyplane will ensure all VMs are cleaned up on exit. Otherwise, the user must call session.terminate to clean up any VMs.
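One way the cleanup guarantee could work is a context manager whose finally clause always calls terminate. The sketch below is only an illustration under that assumption: SketchSession is a hypothetical stand-in, and its terminate method just flips a flag instead of deprovisioning real VMs.

```python
from contextlib import contextmanager


class SketchSession:
    """Hypothetical stand-in for a Skyplane session; not the real implementation."""

    def __init__(self, vms):
        self.vms = vms
        self.terminated = False

    def terminate(self):
        # A real session would deprovision its gateway VMs here.
        self.terminated = True

    @contextmanager
    def auto_terminate(self):
        try:
            yield self
        finally:
            # Runs on normal exit and also when the body raises.
            self.terminate()


session = SketchSession(vms=8)
try:
    with session.auto_terminate() as s:
        raise RuntimeError("transfer failed")
except RuntimeError:
    pass  # the error still propagates to the caller, but cleanup has already run
```

Even though the body raised, session.terminated is True afterwards, which is the property the with form is meant to provide.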

Authentication with clouds

auth = skyplane.Auth.from_config("config.cfg")

auth = skyplane.Auth(
  aws=skyplane.AWSAuth(),
  gcp=skyplane.GCPAuth(project_id="skyplane-project-id"),  # project_id not optional
  azure=skyplane.AzureAuth(
    subscription_id="azure-subscription-id",
    gateway_umi="UMI_ID",
  ),
)

abiswal2001 avatar Sep 20 '22 19:09 abiswal2001

One question I have is how we can scope the design of Skyplane Storage as well as broadcast (@sarahwooders). We had a design doc on broadcast.

Persistent sync will be implemented over the API with a single tracker VM that holds a long-running Session object. It would continually diff the source and destination region (or monitor a _selflog path) and then call session.cp on any new objects to transfer.
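The diff step of such a tracker could look roughly like the sketch below. It assumes each side can be listed as a mapping from object key to a content fingerprint such as an ETag; the helper name objects_to_sync and the sample listings are made up for illustration.

```python
def objects_to_sync(src_listing, dst_listing):
    """Return keys that are missing at the destination or whose fingerprint differs.

    Both arguments map object key -> fingerprint (for example an ETag).
    """
    return sorted(
        key
        for key, fingerprint in src_listing.items()
        if dst_listing.get(key) != fingerprint
    )


# Hypothetical listings of the source and destination buckets.
src = {"a.txt": "e1", "b.txt": "e2", "c.txt": "e3"}
dst = {"a.txt": "e1", "b.txt": "old"}

pending = objects_to_sync(src, dst)  # b.txt changed, c.txt is new
```

The tracker would then issue one copy call per key in pending on its long-running session and repeat the diff on a timer.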

parasj avatar Sep 21 '22 16:09 parasj

I think this looks good for broadcast - we can basically have dst_region=List[str] for Skyplane(auth).new_session and otherwise have an identical API.
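If dst_region accepted either a single region or a list, the session constructor could normalize it up front so the rest of the code only deals with lists. This is a sketch of that idea only; normalize_dst_regions is a hypothetical helper, not an existing Skyplane function.

```python
from typing import List, Union


def normalize_dst_regions(dst_region: Union[str, List[str]]) -> List[str]:
    """Accept one region string or a list of them and always return a list."""
    if isinstance(dst_region, str):
        return [dst_region]
    regions = list(dst_region)
    if not regions:
        raise ValueError("at least one destination region is required")
    return regions


unicast = normalize_dst_regions("aws:us-east-2")
broadcast = normalize_dst_regions(["aws:us-east-2", "aws:us-west-2"])
```

With this shape, a unicast transfer is just a broadcast to a one-element list, which keeps the two APIs identical.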

If we want to support multi-reader/multi-writer though (e.g. for Skyplane Storage), then we might need something different. Maybe later on, once we implement persistent sync, we could have a separate API to "deploy" a Skyplane bucket server that clients can connect to in order to write. But this is probably something we'll just implement later on top of this session API.

sarahwooders avatar Sep 21 '22 22:09 sarahwooders

Based on a call with @Michaelvll, we proposed the following API: https://gist.github.com/parasj/983b9764bab66ffa60bbb171c7ea495b


# simple interface
client = Skyplane(auth)
stats = client.copy(src="s3://us-east-1/foo", dst="s3://us-east-2/bar", vms=8, recursive=True, dry_run=True)
print(stats.estimated_cost())
stats = client.copy(src="s3://us-east-1/foo", dst="s3://us-east-2/bar", vms=8, recursive=True)

# session interface
session = Skyplane(auth).new_session(
  vms=8,
  src_region="aws:us-east-1",
  dst_region="aws:us-east-2",
  solver=skyplane.DirectSolver())

with session.auto_terminate() as s:
  job.add(s.copy(A, B, recursive=True))
  job.add(s.copy(C, D))
  print(job.estimate_cost())
  future = job.run()  # provision VMs here
  await future

His questions and feedback about the API:

  • Will copy support an asynchronous call?
  • It's surprising that the VMs are provisioned at the creation of the session. Users expect the account to be charged only once run is called
  • Potential inspiration for API:
    • Thread API
    • Python SQLite API (queue many transactions and then call run to execute the series of commands)
    • Kubernetes API where you create a cluster and submit jobs to it
  • SkyPilot wants to estimate the cost of transferring data between two regions. Can we surface the estimated cost from the CLI?
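The "queue first, pay only on run" behaviour requested above can be sketched as follows. Everything here is hypothetical: SketchJob, its size_gb argument, and the flat price_per_gb are illustrative assumptions, and a real estimate would also need to account for VM time.

```python
class SketchJob:
    """Hypothetical job: queues copies and provisions nothing until run()."""

    def __init__(self, price_per_gb):
        self.price_per_gb = price_per_gb  # assumed flat egress price in USD per GB
        self.copies = []
        self.provisioned = False

    def add(self, src, dst, size_gb):
        # Queuing is free: no cloud resources are touched here.
        self.copies.append((src, dst, size_gb))

    def estimate_cost(self):
        # Egress-only estimate over everything queued so far.
        return sum(size_gb for _, _, size_gb in self.copies) * self.price_per_gb

    def run(self):
        # Only at this point would VMs be provisioned and charges begin.
        self.provisioned = True
        return len(self.copies)


job = SketchJob(price_per_gb=0.02)
job.add("s3://us-east-1/foo", "s3://us-east-2/bar", size_gb=100)
job.add("s3://us-east-1/baz", "s3://us-east-2/qux", size_gb=50)

estimate = job.estimate_cost()              # computed before any provisioning
provisioned_before_run = job.provisioned
copies_started = job.run()
```

Because estimate_cost only reads the queue, the same call could back a dry-run flag or a CLI cost command without creating any VMs.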

Some general feedback on Skyplane from the meeting:

  • Skyplane should assume there are more resources available or ask for the max number of VMs during init since the results aren't as impressive with one VM
  • @Michaelvll would like to customize the location of VMs e.g. provision a VM at the destination and not the source
  • If a vCPU capacity error occurs, can we fall back to a cloud API?
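The fallback asked about in the last point could be structured as a simple try/except around provisioning. This is a sketch under assumptions: CapacityError and the three callables are hypothetical, and "cloud API" is read here as the provider's own copy service.

```python
class CapacityError(Exception):
    """Hypothetical error raised when a vCPU quota blocks VM provisioning."""


def transfer_with_fallback(provision_vms, gateway_copy, cloud_api_copy):
    """Use gateway VMs when they can be provisioned, else the provider's copy API."""
    try:
        vms = provision_vms()
    except CapacityError:
        # No VMs available: degrade to the slower but always-available path.
        return cloud_api_copy()
    return gateway_copy(vms)


def no_capacity():
    raise CapacityError("vCPU quota exceeded")


fallback_result = transfer_with_fallback(
    no_capacity,
    gateway_copy=lambda vms: f"gateway copy on {vms} VMs",
    cloud_api_copy=lambda: "cloud API copy",
)
normal_result = transfer_with_fallback(
    lambda: 8,
    gateway_copy=lambda vms: f"gateway copy on {vms} VMs",
    cloud_api_copy=lambda: "cloud API copy",
)
```

Only the capacity error triggers the fallback; any other provisioning failure still surfaces to the caller.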

parasj avatar Sep 22 '22 23:09 parasj