s3-sync-client
Dynamically create Relocations Array
Feature request: the ability to relocate objects on sync from bucket to local, based on an external DB.
Either of the following would be useful:
- The ability to specify a "transform" method that gets called for each file being synced.
  - In my case, I would await a call to the DB to fetch the correct location for this file.
OR
- The sync tool returning an array of object keys before the sync begins, so the relocations array can be fetched from the DB for only these objects.
I think option 2 is already feasible this way:
- Run a sync with the `dryRun` option and get the keys from the returned object (see https://github.com/jeanbmar/s3-sync-client#sync-the-local-file-system-with-a-remote-s3-bucket)
- Build your relocation array from the keys
- Run the actual sync
What do you think?
There is of course the possibility that in the time between running the `dryRun` and the actual sync, the S3 bucket gets updated. Maybe another file got added that will come down in the actual sync, but wasn't accounted for when I got the relocation array from the DB.
Would it be possible to pass the array of object keys discovered in the `dryRun` as filters for the actual sync, to guarantee only these files are synced?
Yes, you could do this using the filters option: https://github.com/jeanbmar/s3-sync-client#sync-the-local-file-system-with-a-remote-s3-bucket. It should look like:

```js
filters: [
  { exclude: () => true },
  { include: (key) => keys.has(key) }, // keys is your key Set built from the dry run
],
```
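To see why this pair of filters restricts the sync to exactly the dry-run keys, here is a simplified stand-in for the filter evaluation (this is illustrative only, not the library's actual code; it assumes a "last matching rule wins" reading of the exclude-then-include pattern above):

```javascript
// Simplified, hypothetical model of how an exclude/include filter chain
// can be evaluated: each rule in order may flip the inclusion decision.
function isIncluded(key, filters) {
  let included = true;
  for (const filter of filters) {
    if (filter.exclude && filter.exclude(key)) included = false;
    if (filter.include && filter.include(key)) included = true;
  }
  return included;
}

const keys = new Set(['62ef977fe27094849982402d']);
const filters = [
  { exclude: () => true },              // exclude everything by default...
  { include: (key) => keys.has(key) },  // ...then re-include only known keys
];

console.log(isIncluded('62ef977fe27094849982402d', filters)); // true
console.log(isIncluded('some-other-object', filters));        // false
```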
I shall give it a go and report back! Cheers!
On my side, I'll keep in mind the idea of dynamic + async relocations. It's not the first time I've read something about this.
I'm testing now - the relocation doesn't appear to be working. The sync tool downloads the file but appears to ignore the relocations array. Any ideas? Does the relocations array allow renaming the file itself in local storage?
- The S3 bucket has 1 object with the key `62ef977fe27094849982402d`
- I have a local directory `/storage` within which all local files should be stored
- I would like the object `62ef977fe27094849982402d` to become `/storage/show-1/session-1/jake-medler-file-1.png`
My DB API is returning the following object:

```js
[
  [
    '62ef977fe27094849982402d',
    'show-1/session-1/jake-medler-file-1.png'
  ]
]
```
And the sync code:

```js
require("dotenv").config();
const axios = require("axios");
const S3SyncClient = require("s3-sync-client");
const { s3Client } = require("./s3Client.js");

const { sync } = new S3SyncClient({ client: s3Client });

const runSync = async () => {
  try {
    const syncOps = await sync(
      `s3://${process.env.SPACES_BUCKET}`,
      process.env.LOCAL_PATH,
      {
        dryRun: true,
        filters: [{ exclude: (key) => key.startsWith("platform/") }],
      }
    );
    let keys = syncOps.downloads.map((op) => op.key);
    let { data: relocations } = await axios.post(
      `${process.env.API_URL}/files/sync`,
      {
        keys,
      }
    );
    console.log(relocations);
    // [
    //   [
    //     '62ef977fe27094849982402d',
    //     'show-1/session-1/jake-medler-file-1.png'
    //   ]
    // ]
    await sync(`s3://${process.env.SPACES_BUCKET}`, process.env.LOCAL_PATH, {
      relocations,
      filters: [
        { exclude: () => true },
        { include: (key) => keys.includes(key) },
      ],
      delete: true,
    });
  } catch (err) {
    console.error(err);
  }
};

runSync();
```
I found inside sync-object.js, L.19:

```js
if (this.id.startsWith(`${sourcePrefix}/`))
```

which requires the source to be a directory. Changing it to

```js
if (this.id.startsWith(sourcePrefix))
```

now transforms the key `62ef977fe27094849982402d` to `/storage/show-1/session-1/jake-medler-file-1.png`.
I could create a pull request for this change. The problem I have now is that the library doesn't take this relocation into consideration on the next sync, i.e. it thinks I don't have the file locally, so it will download it every time. Any solution for this?
Ha, I see the issue.
The `relocations` option, in its current implementation, is intended for moving files between directories, but doesn't allow renaming the files themselves.
Removing the `/` character as you propose will cause side effects, because a relocation will be performed:
- if a file is named `62ef977fe27094849982402d`
- if a file is named `62ef977fe27094849982402d-xyz`
- on all the files from a directory named `62ef977fe27094849982402d`
- on all the files from a directory named `62ef977fe27094849982402d-xyz`
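The over-matching is easy to see with plain `startsWith` (the keys below are illustrative only; this is not the library's code):

```javascript
const sourcePrefix = '62ef977fe27094849982402d';

// Sample object keys: the intended file, a sibling file sharing the prefix,
// and files inside directories whose names share the prefix.
const keys = [
  '62ef977fe27094849982402d',            // the intended file
  '62ef977fe27094849982402d-xyz',        // a different file
  '62ef977fe27094849982402d/a.png',      // a file inside a directory of that name
  '62ef977fe27094849982402d-xyz/b.png',  // a file inside a sibling directory
];

// Without the trailing '/', every one of these matches:
console.log(keys.filter((key) => key.startsWith(sourcePrefix)).length); // 4

// With the trailing '/', only files under the exact directory match:
console.log(keys.filter((key) => key.startsWith(`${sourcePrefix}/`)).length); // 1
```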
The clean way to do this is probably to change the client code to allow callbacks inside the relocations array. The user then becomes responsible for the transformations that happen. Which is basically your first proposition :).
I couldn't reproduce the sync issue you mentioned after making the same code change as you did:

```js
test('sync a file with relocated filename', async () => {
  const results = await syncClient.localWithBucket(`${BUCKET_2}`, SYNC_DIR, { relocations: [['xojg', 'relocated/file.txt']] });
  console.log(results); // empty if I run the test a second time
  const objects = await syncClient.listLocalObjects(path.join(SYNC_DIR, 'relocated'));
  expect(hasObject(objects, 'file.txt')).toBe(true);
});
```
Also, in your second `sync` call, the option is named `del` but you used `delete`.
Self note about dynamic relocations allowing full path control:

```ts
export type Relocation = (currentPath: string) => string;

relocations: [
  (currentPath) => currentPath.startsWith('xojg') ? 'relocated/file.txt' : currentPath,
]
```

This implementation could fully replace the existing:

```ts
export type Relocation = [sourcePrefix: string, targetPrefix: string];
```
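A sketch of how the client could apply such callback relocations while staying backward compatible with the existing tuple form (`applyRelocations` is a hypothetical helper, not part of the library):

```javascript
// Hypothetical helper: applies relocations that may be either the existing
// [sourcePrefix, targetPrefix] tuples or the proposed (currentPath) => string callbacks.
function applyRelocations(currentPath, relocations) {
  return relocations.reduce((path, relocation) => {
    if (typeof relocation === 'function') {
      // Callback form: full control over the resulting path, including renames.
      return relocation(path);
    }
    // Tuple form: only rewrites paths under the source directory prefix.
    const [sourcePrefix, targetPrefix] = relocation;
    return path.startsWith(`${sourcePrefix}/`)
      ? `${targetPrefix}/${path.slice(sourcePrefix.length + 1)}`
      : path;
  }, currentPath);
}

const relocations = [
  (currentPath) => currentPath.startsWith('xojg') ? 'relocated/file.txt' : currentPath,
];

console.log(applyRelocations('xojg', relocations));           // 'relocated/file.txt'
console.log(applyRelocations('other/file.png', relocations)); // unchanged
```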