
[Storage] Investigate seaweedfs

Michaelvll opened this issue · 9 comments

We can check out SeaweedFS to see if it has better performance and consistency guarantees than our current goofys + gcsfuse solution: https://github.com/seaweedfs/seaweedfs

Michaelvll avatar Sep 09 '22 05:09 Michaelvll

Would like to help! Please use the latest version (3.27 as of now) to test, and report any issues.

chrislusf avatar Sep 12 '22 22:09 chrislusf

Thanks @chrislusf!

concretevitamin avatar Sep 12 '22 23:09 concretevitamin

I was trying out seaweedfs and s3fs today:

SeaweedFS

Seaweed's cloud drive mounting seems to be what we're looking for. However, I couldn't make much progress due to a lack of clear documentation on how to use SeaweedFS to mount existing S3 buckets.

The instructions on this page require an interactive terminal (weed shell). I started a master with weed master, and then tried connecting by running weed shell. However, the shell is unresponsive and the logging isn't helpful in figuring out what's going on:

ubuntu@ip-172-31-35-172:~$ ./weed -v 4 shell
I1030 03:32:10.476017 config.go:46 Reading : Config File "security" Not Found in "[/home/ubuntu /home/ubuntu/.seaweedfs /usr/local/etc/seaweedfs /etc/seaweedfs]"
I1030 03:32:10.476312 config.go:46 Reading : Config File "shell" Not Found in "[/home/ubuntu /home/ubuntu/.seaweedfs /usr/local/etc/seaweedfs /etc/seaweedfs]"
I1030 03:32:10.476450 masterclient.go:126 .adminShell masterClient bootstraps with masters map[localhost:9333:localhost:9333]
I1030 03:32:10.476466 masterclient.go:171 .adminShell masterClient Connecting to master localhost:9333
I1030 03:32:10.477478 masterclient.go:196 .adminShell masterClient Connected to localhost:9333

# Gets stuck here, no input/output

This is on the AWS Deep Learning AMI (Ubuntu 20.04), using SeaweedFS 3.32.

Any pointers @chrislusf? Is there somewhere I can find instructions on how to use seaweedfs to mount existing s3 stores?

s3fs

s3fs has seen active development in the past year, and this time I was able to get it to run easily. Simply:

sudo apt install s3fs
mkdir ~/mymount
s3fs mybucket ~/mymount

I experimented with it a bit:

  • s3fs has much better POSIX support than goofys (e.g., sed -i works!). Update: flush() does not actually flush contents to S3; a file is written to S3 only when close() is called.
  • I also verified that -o use_cache='/tmp' enables caching of remote objects, so frequently accessed files should be read faster than with goofys (without catfs); see the example invocation below.
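
For reference, a minimal s3fs invocation with credentials and caching enabled might look like this (an untested sketch; the bucket name and keys are placeholders, and the options follow the s3fs docs):

# Store credentials in the format s3fs expects (ACCESS_KEY_ID:SECRET_ACCESS_KEY)
echo "AKIA...:SECRET..." > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount with a local disk cache so repeated reads hit /tmp instead of S3
mkdir -p ~/mymount
s3fs mybucket ~/mymount -o passwd_file=${HOME}/.passwd-s3fs -o use_cache=/tmp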

I'll benchmark s3fs vs goofys performance and report back.

romilbhardwaj avatar Oct 30 '22 03:10 romilbhardwaj

@romilbhardwaj you may want to run weed server -s3 to start the master, volume server, filer, and s3 APIs at the same time. And then you can run weed mount -dir=/to/be/mounted to read and write the files. To sync with existing S3 buckets, follow https://github.com/seaweedfs/seaweedfs/wiki/Mount-Remote-Storage

chrislusf avatar Oct 30 '22 04:10 chrislusf

Thanks @chrislusf! The suggestion worked: I'm able to get seaweedfs running with S3 mounted. It also seems quite fast! It was a little confusing to set up and took some trial and error to figure out, but it works now.

A few questions:

  1. Does seaweedfs provide caching for reads from S3? If so, what's the caching policy and can we bound the cache to a size we choose?
  2. I notice a separate process needs to run to sync local writes back to S3 (weed filer.remote.sync -filer=<filerHost>:<filerPort> -dir=xxx). While this process runs in the background, is there a way for another process to know whether all writes have succeeded? E.g., after a task completes, we want SkyPilot to wait for filer.remote.sync to finish before shutting down the VM, or writes may be lost. (A naive polling sketch follows below.)
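
A naive polling sketch for question 2 (untested; assumes the AWS CLI is configured, and OUTPUT_KEY is a hypothetical object the task is known to produce):

# Poll S3 until the expected output object appears, then allow shutdown
OUTPUT_KEY="outputs/result.bin"  # placeholder for a known task output
until aws s3api head-object --bucket romil-dataset --key "$OUTPUT_KEY" >/dev/null 2>&1; do
  echo "Waiting for filer.remote.sync to replicate $OUTPUT_KEY..."
  sleep 10
done
echo "Object visible in S3; safe to shut down."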

Writing down setup steps here for my own reference in the future:

# Download seaweedfs
wget https://github.com/seaweedfs/seaweedfs/releases/download/3.39/linux_amd64.tar.gz
tar -xvf linux_amd64.tar.gz

# Setup mount point
mkdir ~/swfs

# In a new terminal/screen, start the weed server. It will throw connection-failed errors at first, but wait a bit and it should work
./weed server -s3

# In a new terminal/screen, run weed shell and mount the s3 bucket to the filer:
./weed shell
> remote.configure -name=aws -type=s3 -s3.access_key=KEY -s3.secret_key=KEY -s3.region=<region>
> remote.mount -dir=/romil-dataset -remote=aws/romil-dataset

# In a new terminal/screen, mount the filer to your local fs over FUSE:
./weed mount -dir=~/swfs

# Your files should show up:
ls ~/swfs/romil-dataset/

romilbhardwaj avatar Oct 30 '22 06:10 romilbhardwaj

For your case, the seaweedfs cluster is the cache. Data is asynchronously replicated back to S3 by the weed filer.remote.sync process. You can also periodically "uncache" data via a "weed shell" command, purging by age or by size.
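
For example, a periodic purge could look like this (flag names as I recall them from the remote.uncache help text; worth verifying against your version):

# In weed shell: drop cached file content under a directory. Metadata stays,
# so files still appear in listings and are re-fetched on the next read.
> remote.uncache -dir=/romil-dataset
# Or purge selectively, e.g. only files above ~10 MB
> remote.uncache -dir=/romil-dataset -minSize=10485760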

The weed filer.remote.sync process is designed to run continuously, so there is no clear way to tell when the data has finished replicating, although replication should be fairly fast.

chrislusf avatar Oct 30 '22 07:10 chrislusf

This is good to know, thanks!

> For your case, the seaweedfs cluster is the cache. Data is asynchronously replicated back to S3 by the weed filer.remote.sync process. You can also periodically "uncache" data via a "weed shell" command, purging by age or by size.

I see, writes are cached by the seaweedfs cluster.

I was curious if reads are also cached. E.g., when I read a file, I assume it is fetched from S3. If I open the file again, will it be fetched from S3 again or does seaweedfs cache it? If it caches, is there some way to limit the size of the cache?

Appreciate your prompt help!

romilbhardwaj avatar Oct 30 '22 07:10 romilbhardwaj

Reading from remote S3 is also cached. You need to "uncache" via commands in the same way.

If size is a concern, you may want to start a volume server by running weed volume -dir=... -master=<ip>:<port> on a different server.
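
For example (a sketch; the path and ports are placeholders, and recent versions spell the master flag -mserver):

# On a separate machine with more disk, run a volume server pointed at the
# existing master; -max caps how many volumes this server will create
./weed volume -dir=/data/seaweedfs -max=10 -mserver=<masterIp>:9333 -port=8080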

chrislusf avatar Oct 30 '22 07:10 chrislusf

Awesome, thanks for all the answers @chrislusf! I'll try it out and see if it fits our use case.

romilbhardwaj avatar Oct 30 '22 07:10 romilbhardwaj

Hi @chrislusf, I've been trying to get seaweedfs to work for us. Just to clarify, we want to mount S3 buckets as FUSE mounts to access objects as if they were regular files.

Here's how I have installed seaweedfs:

# Download seaweedfs
wget https://github.com/seaweedfs/seaweedfs/releases/download/3.39/linux_amd64.tar.gz
tar -xvf linux_amd64.tar.gz

# Setup mount point
mkdir ~/swfs

# In a new terminal/screen, start the weed server. It will throw connection-failed errors at first, but wait a bit and it should work
./weed server -s3

# In a new terminal/screen, run weed shell and mount the s3 bucket (s3://romil-test) to the filer:
./weed shell
> remote.configure -name=aws -type=s3 -s3.access_key=<key> -s3.secret_key=<key> -s3.region=<region>
> remote.mount -dir=/romil-test -remote=aws/romil-test

# In a new terminal/screen, mount the filer to your local fs over FUSE:
./weed mount -dir=~/swfs

# Your files should show up:
ls ~/swfs/romil-test/

At this point, I am able to see and access my files in the bucket. However, I faced a few problems:

Three issues I faced

  1. If I externally upload new files to the bucket (e.g., with aws s3 cp test.txt s3://romil-test/test.txt), they do not show up at my mount point (i.e., ls ~/swfs/romil-test does not show test.txt). Is this supported by seaweedfs?
  2. Similarly, writes made to the mount point (e.g., touch ~/swfs/romil-test/hello.txt) are not replicated to the bucket. I tried running weed filer.remote.sync, but got this error with both -dir ~/swfs and -dir ~/swfs/romil-test:
./weed filer.remote.sync -dir ~/swfs
synchronize /home/ubuntu/swfs to remote storage...
E0111 18:48:53.752186 filer_remote_sync.go:97 synchronize /home/ubuntu/swfs: read mount info: /home/ubuntu/swfs is not mounted
  3. Mounting buckets with a large number of small files takes a long time. E.g., if you try to mount s3://fah-public-data-covid19-cryptic-pockets (a public bucket with many small files), remote.mount never finishes; it has been running for over an hour and is still listing files:
# In weed shell
remote.mount -dir=/test -remote=aws/fah-public-data-covid19-cryptic-pockets
/covid/HCoV-NL63/spike/PROJ14236/RUN316/CLONE1/results177/frame177.xtc (create)
/covid/HCoV-NL63/spike/PROJ14236/RUN316/CLONE1/results178/frame178.xtc (create)
/covid/HCoV-NL63/spike/PROJ14236/RUN316/CLONE1/results179/frame179.xtc (create)
/covid/HCoV-NL63/spike/PROJ14236/RUN316/CLONE1/results18/frame18.xtc (create)
/covid/HCoV-NL63/spike/PROJ14236/RUN316/CLONE1/results180/frame180.xtc (create)
/covid/HCoV-NL63/spike/PROJ14236/RUN316/CLONE1/results181/frame181.xtc (create)
... <Running for 1+ hr>

Thanks again for your support!

romilbhardwaj avatar Jan 11 '23 19:01 romilbhardwaj

  1. You need to use remote.cache or remote.meta.sync in weed shell to pull remote updates.
  2. Use ./weed filer.remote.sync -dir=/romil-test (see the combined example below).
  3. Mounting pulls and caches the metadata. You may want to mount a specific path in the bucket to reduce the cache scope.
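
Putting 1 and 2 together, the flow would roughly be (paths taken from the setup above):

# In weed shell: pull metadata for files added to the bucket externally,
# so they show up at the mount point
> remote.meta.sync -dir=/romil-test

# On the host: replicate local writes back to S3. Note that -dir takes the
# filer path (/romil-test), not the local FUSE mount path (~/swfs/...)
./weed filer.remote.sync -dir=/romil-test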

chrislusf avatar Jan 11 '23 21:01 chrislusf

@romilbhardwaj : Were you able to take a look at s3fs as well?

Or for the Amazon case, it seems like Amazon EFS would be a welcome alternative to mounting S3: https://docs.ray.io/en/releases-1.11.0/cluster/aws-tips.html

https://aws.amazon.com/blogs/machine-learning/mount-an-efs-file-system-to-an-amazon-sagemaker-notebook-with-lifecycle-configurations/

(I realize I could use EFS manually as well. :) )

Taytay avatar Apr 27 '23 06:04 Taytay

@Taytay - yes, I had a look at s3fs too. Here's the branch using s3fs instead of goofys. I recall seeing much worse performance (noticeably slower than goofys in ls and other stat operations) and thus didn't pursue it.

EFS is another option we considered, but we decided not to use it because:

  1. Using EFS requires setting up complex VPC peering when the EC2 instance and the EFS mount target are in different regions, which happens frequently in SkyPilot
  2. For many users, object stores are easier to reason about and they have nice CLI tools for users to interact with them outside of SkyPilot (e.g., upload and download files externally without needing to "mount" them)

If it weren't for these blockers, we would have liked to use network file systems like EFS instead of object stores, since they offer better performance when accessed through POSIX layers.

That said, if you're interested, please do try setting up EFS manually with SkyPilot. We would love to hear your experiences and feedback!

romilbhardwaj avatar Apr 27 '23 07:04 romilbhardwaj

Oh, that makes a ton of sense! Especially the cross-region stuff! I'll give this some thought. Thank you!

Taytay avatar Apr 27 '23 09:04 Taytay

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Aug 26 '23 01:08 github-actions[bot]

This issue was closed because it has been stalled for 10 days with no activity.

github-actions[bot] avatar Sep 05 '23 01:09 github-actions[bot]