atomic-server
s3 uploads #543
PR Checklist:
- [x] Link to related issues: #543
- [x] Add changelog entry linking to issue, describe API changes
- [-] Add tests (optional)
- [x] (If new feature) add to description / readme
Open issues:
- The data browser preview is currently not working for s3 files. This is likely because the download endpoint does a 302 temporary redirect to a S3 signed URL. I'd be happy to open a new issue for this for someone to fix that is familiar with the client side code.
- Currently the config only requires S3 bucket to be specified. This is in line with opendal's requirements. The main callout here is that we are depending on opendal's default for region, which is "us-east-1". I could specify this directly in atomic's source code or we can make this a required config value.
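One way to pin that default in atomic's own source rather than depending on opendal's fallback (a sketch; `S3Config` and its fields are hypothetical names, not the actual config struct):

```rust
/// Hypothetical config fragment; field names are illustrative only.
struct S3Config {
    bucket: String,
    /// Optional region; `None` currently falls through to opendal's default.
    region: Option<String>,
}

impl S3Config {
    /// Resolve the region, making the "us-east-1" default explicit in
    /// atomic's source instead of relying on opendal's internal fallback.
    fn region_or_default(&self) -> &str {
        self.region.as_deref().unwrap_or("us-east-1")
    }
}

fn main() {
    let cfg = S3Config {
        bucket: "test".to_string(),
        region: None,
    };
    println!("region = {}", cfg.region_or_default());
}
```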
Awesome, great job! I'll take a closer look tomorrow :)
@joepio This should be ready for another look.
Looking good! One more general point maybe: could you add some documentation and tests? I'd accept the PR without them if you've had enough of this.
Especially one happy flow for FileStore in tests would be nice.
Just tests and documentation left. Should be able to finish this up on Friday.
Alright @joepio this is ready for another look. Tests are passing and functionality tested with small and large files.
Great job! Only thing I still have left is authentication.
You can init OpenDAL via either a struct + builder or a hashmap, passing region and endpoint:

```rust
use std::collections::HashMap;
use std::env;

use opendal::{Operator, Result, Scheme};

fn init_operator_via_map() -> Result<Operator> {
    // Read the credentials from the environment.
    let access_key_id =
        env::var("AWS_ACCESS_KEY_ID").expect("AWS_ACCESS_KEY_ID is set and a valid String");
    let secret_access_key =
        env::var("AWS_SECRET_ACCESS_KEY").expect("AWS_SECRET_ACCESS_KEY is set and a valid String");

    let mut map = HashMap::default();
    map.insert("bucket".to_string(), "test".to_string());
    map.insert("region".to_string(), "us-east-1".to_string());
    map.insert("endpoint".to_string(), "http://rpi4node3:8333".to_string());
    map.insert("access_key_id".to_string(), access_key_id);
    map.insert("secret_access_key".to_string(), secret_access_key);

    let op = Operator::via_map(Scheme::S3, map)?;
    Ok(op)
}
```
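For comparison, the builder-based equivalent might look roughly like this (a sketch; the exact builder method names and signatures vary between opendal versions, so check the docs for the version in use):

```rust
use std::env;

use opendal::{services::S3, Operator, Result};

fn init_operator_via_builder() -> Result<Operator> {
    // Same settings as the hashmap variant, expressed via the S3 builder.
    let mut builder = S3::default();
    builder.bucket("test");
    builder.region("us-east-1");
    builder.endpoint("http://rpi4node3:8333");
    builder.access_key_id(&env::var("AWS_ACCESS_KEY_ID").expect("AWS_ACCESS_KEY_ID is set"));
    builder
        .secret_access_key(&env::var("AWS_SECRET_ACCESS_KEY").expect("AWS_SECRET_ACCESS_KEY is set"));

    Ok(Operator::new(builder)?.finish())
}
```

The builder gives compile-time checking of option names, while the hashmap variant is easier to drive from a generic config file.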
But honestly, I don't see anything wrong with the current auth, @joepio. What's your take on it? I believe the Keypair allows working with both private and public.
Late to the party, but personally, I would prefer to remove this parameter:
```rust
/// CAUTION: Skip authentication checks, making all data publicly readable. Improves performance.
#[clap(long, env = "ATOMIC_PUBLIC_MODE")]
pub public_mode: bool,
```
It's the job of the atomic server to decide if the file or any other resource is public or visible to the group. Storage is always private with a key/secret pair; if people want to share their s3 bucket as a public website, they have another way of doing it.
This flag is unrelated to this PR / S3, it's for regular authorization checks.
For some reason when I refresh a file, it gives me a 404. When I press retry, it works. I'll try to find out what's going on.
I think my 404 issue has to do with the URL encoding. The first part in `s3:fileID` is URL encoded, so it becomes `s3%3AfileID` (`:` becomes `%3A`). When we refresh the page, though, the server un-encodes it, I think, so we get the `:` instead, which 404s.
@metame do you remember what the encoding was needed for? Sorry I only find out about this so long after you opened this PR!
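The mismatch can be illustrated in miniature (only `:` is handled here; real code would use a proper percent-encoding crate, and the `files` map stands in for the actual file store):

```rust
use std::collections::HashMap;

/// Percent-encode only ':' -- just enough to illustrate the bug.
fn encode_colon(s: &str) -> String {
    s.replace(':', "%3A")
}

/// The inverse: un-encode "%3A" back to ':'.
fn decode_colon(s: &str) -> String {
    s.replace("%3A", ":")
}

fn main() {
    // Files are stored under the encoded id...
    let mut files: HashMap<String, Vec<u8>> = HashMap::new();
    files.insert(encode_colon("s3:fileID"), b"contents".to_vec());

    // ...but after a refresh the server has un-encoded the subject,
    // so the lookup uses "s3:fileID", misses, and we get a 404.
    assert!(files.get("s3:fileID").is_none());
    assert!(files.get(&encode_colon("s3:fileID")).is_some());
    assert_eq!(decode_colon("s3%3AfileID"), "s3:fileID");
    println!("lookup only succeeds with the encoded key");
}
```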
So the main reason we implemented the file prefix was to support backwards compatibility among file systems. That is, if an atomic server instance had files uploaded to the server directly and then turned on s3 support, the existing files would still work.
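The backwards-compatibility routing described above can be sketched as a simple prefix check (hypothetical names; the real `FileStore` handling lives in the server code):

```rust
/// Which store a file id resolves to (illustrative sketch).
#[derive(Debug, PartialEq)]
enum FileStore {
    S3,
    Local,
}

/// Route a stored file id: ids written after enabling S3 carry an "s3:"
/// prefix, while older ids have no prefix and stay on the local store.
fn route(file_id: &str) -> (FileStore, &str) {
    match file_id.strip_prefix("s3:") {
        Some(rest) => (FileStore::S3, rest),
        None => (FileStore::Local, file_id),
    }
}

fn main() {
    println!("{:?}", route("s3:fileID"));
    println!("{:?}", route("legacy-file"));
}
```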
I've found the source of this bug, but the fix has some drawbacks. Specifically, it requires url-encoding the subject path for every resource that goes through `handler::get_resource`, like here. Perhaps that's acceptable. If not, we could drop the behavior that supports backwards compatibility, and server admins would have to handle the migration themselves.