s3-sync-client icon indicating copy to clipboard operation
s3-sync-client copied to clipboard

Dynamically create Relocations Array

Open JakeLewisMedler opened this issue 2 years ago • 8 comments

Feature request: To be able to relocate objects on sync from bucket > local, based on an external DB

Either of the following would be useful:

  • The ability to specify a "transform" method that gets called for each file being synced.
    • In my case, I would await a call to the DB to fetch the correct location for this file

OR

  • The sync tool to return an array of object keys before the sync begins, to fetch the Relocations array for only these objects from the DB

JakeLewisMedler avatar Aug 07 '22 10:08 JakeLewisMedler

I think option 2 is already feasible that way:

  1. Run a sync with the dryRun option and get the keys from the returned object (see https://github.com/jeanbmar/s3-sync-client#sync-the-local-file-system-with-a-remote-s3-bucket)
  2. Build your relocation array from the keys
  3. Run the actual sync

What do you think?

jeanbmar avatar Aug 07 '22 10:08 jeanbmar

There is of course the possibility that in the time between running the dryRun and the actual sync, that the S3 got updated. Maybe another file got added, that will come down in the actual sync, but wasn't accounted for when I got the relocation array from the DB.

Would it be possible to pass the array of object keys discovered in the dryRun as filters for the actual sync to guarantee only these files are synced?

JakeLewisMedler avatar Aug 07 '22 10:08 JakeLewisMedler

Yes, you could do this using the filters option: https://github.com/jeanbmar/s3-sync-client#sync-the-local-file-system-with-a-remote-s3-bucket. Should look like:

filters: [
        { exclude: () => true },
        { include: (key) => keys.has(key) }, // keys is your key Set built from the dry run
    ],

jeanbmar avatar Aug 07 '22 10:08 jeanbmar

I shall give it a go and report back! Cheers!

JakeLewisMedler avatar Aug 07 '22 10:08 JakeLewisMedler

On my side I keep in mind the idea of dynamic + async relocation. It's not the first time I read something about this.

jeanbmar avatar Aug 07 '22 10:08 jeanbmar

I'm testing now - the relocation doesn't appear to be working currently. The sync tool is downloading the file, but appears to ignore the relocation array. Any ideas? Does the relocation array allow for renaming the file name in the local storage?

  • The S3 bucket has 1 object with the key 62ef977fe27094849982402d
  • I have a local directory /storage which all local files should be stored within
  • I would like the object 62ef977fe27094849982402d to become /storage/show-1/session-1/jake-medler-file-1.png

My DB API is returning the following object

[
  [
    '62ef977fe27094849982402d',
    'show-1/session-1/jake-medler-file-1.png'
  ]
]

And the sync code

require("dotenv").config();
const axios = require("axios");
const S3SyncClient = require("s3-sync-client");
const { s3Client } = require("./s3Client.js");

const { sync } = new S3SyncClient({ client: s3Client });

const runSync = async () => {
  try {
    const syncOps = await sync(
      `s3://${process.env.SPACES_BUCKET}`,
      process.env.LOCAL_PATH,
      {
        dryRun: true,
        filters: [{ exclude: (key) => key.startsWith("platform/") }],
      }
    );
    let keys = syncOps.downloads.map((op) => op.key);
    let { data: relocations } = await axios.post(
      `${process.env.API_URL}/files/sync`,
      {
        keys,
      }
    );
    console.log(relocations);
    // [
    //     [
    //       '62ef977fe27094849982402d',
    //       'show-1/session-1/jake-medler-file-1.png'
    //     ]
    // ]
    await sync(`s3://${process.env.SPACES_BUCKET}`, process.env.LOCAL_PATH, {
      relocations,
      filters: [
        { exclude: () => true },
        { include: (key) => keys.includes(key) },
      ],
      delete: true,
    });
  } catch (err) {
    console.error(err);
  }
};

runSync();

JakeLewisMedler avatar Aug 07 '22 11:08 JakeLewisMedler

I found inside sync-object.js, L.19: if (this.id.startsWith(`${sourcePrefix}/`)) requires the source to be a directory.

Changing for if (this.id.startsWith(sourcePrefix) now it transforms the key 62ef977fe27094849982402d to /storage/show-1/session-1/jake-medler-file-1.png

I could create a pull request for this change - the problem I have now, is the library doesn't take this relocation into consideration on the next sync. I.e. it thinks I don't have the file locally, so will download every time. Any solution for this?

JakeLewisMedler avatar Aug 07 '22 11:08 JakeLewisMedler

Ha, I see the issue.

The relocation option is intended in its current implementation for moving files between directories, but doesn't allow renaming the files themselves.

Removing the / character as you propose will cause side effects because a relocation will be performed:

  • if a file is named 62ef977fe27094849982402d
  • if a file is named 62ef977fe27094849982402d-xyz
  • on all the files from a directory named 62ef977fe27094849982402d
  • on all the files from a directory named 62ef977fe27094849982402d-xyz

The clean way to do this is probably to change the client code in order to allow callbacks inside the relocations array. User becomes then responsible for the transformations that happen. Which is basically your first proposition :).

I couldn't reproduce the sync issue you mentioned after making the same code change as you did:

test('sync a file with relocated filename', async () => {
  const results = await syncClient.localWithBucket(`${BUCKET_2}`, SYNC_DIR, { relocations: [['xojg', 'relocated/file.txt']] });
  console.log(results); // empty if I run the test a second time
  const objects = await syncClient.listLocalObjects(path.join(SYNC_DIR, 'relocated'));
  expect(hasObject(objects, 'file.txt')).toBe(true);
});

Also, in your second sync call, the option is named del but you used delete.

jeanbmar avatar Aug 07 '22 12:08 jeanbmar

Self note about dynamic relocations allowing full path control:


export type Relocation = (currentPath: string) => string;

relocations: [
  (currentPath) => currentPath.startsWith('xojg') ?  'relocated/file.txt' : currentPath,
]

This implementation could fully replace the existing

export type Relocation = [sourcePrefix: string, targetPrefix: string];

jeanbmar avatar May 22 '23 10:05 jeanbmar