nodejs-storage

fix: allow files in directories to be downloaded onto local machine

Open · hochoy opened this issue 2 years ago · 5 comments

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • [X] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • [X] Ensure the tests and linter pass
  • [X] Code coverage does not decrease (if any source code was changed)
  • [X] Appropriate docs were updated (if necessary) (n/a)

Fixes #2200 🦕

hochoy, May 18 '23 07:05

@ddelgrosso1, I tried running samples-test and system-test, but based on the logs it appears they require an active project in order to run e2e/live tests against real GCP infrastructure.

Is that accurate? How expensive is that if I run it on my own project? Should I be triggering it here in the repo instead (and leverage your repo's cloudbuild/github actions infra)?

hochoy, May 18 '23 20:05

Also, I would like to add a test to check that the files are truly downloaded to the local machine or into memory. Would you recommend I give that a shot? Or is that overkill?

hochoy, May 18 '23 20:05

That is accurate: samples-test and system-test run against a live project. I wouldn't worry about running them yourself; they run in the CI pipeline each time a commit is pushed. Unit tests, however, should run locally without issue.

As for a test that checks the files are actually downloaded: if you feel up to it, a unit test could probably be written for this. I can look to see whether we have similar tests elsewhere that might serve as a guide.
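One way such a test could be structured, as a minimal TypeScript sketch that never touches GCS: `writeDownloadedObject` and `verifyNestedDownload` are hypothetical names, with `writeDownloadedObject` standing in for the step that writes downloaded bytes to a destination nested inside directories. The check writes into a fresh temporary directory and reads the file back from disk. In the library's real unit tests the GCS call would presumably be stubbed; that wiring is not shown here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for the write step of a download: make sure the
// parent directory exists (the "files in directories" case), then write.
function writeDownloadedObject(destination: string, contents: Buffer): void {
  fs.mkdirSync(path.dirname(destination), { recursive: true });
  fs.writeFileSync(destination, contents);
}

// Test-style check: write into a fresh temp directory, then read back from
// disk to confirm the file really exists with the expected bytes.
function verifyNestedDownload(): boolean {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "tm-test-"));
  try {
    const destination = path.join(root, "nested", "dir", "file.txt");
    writeDownloadedObject(destination, Buffer.from("hello"));
    return (
      fs.existsSync(destination) &&
      fs.readFileSync(destination, "utf8") === "hello"
    );
  } finally {
    // Clean up so repeated runs start from an empty directory.
    fs.rmSync(root, { recursive: true, force: true });
  }
}
```

Reading the bytes back, rather than only checking that the write call resolved, is what distinguishes "truly on disk" from "the code path ran".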

One thing I will do is clean up the JSDoc to make it abundantly clear that not supplying a prefix results in the files being downloaded into memory.
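To make the prefix behavior concrete, here is a minimal TypeScript sketch of how a downloader might turn object names into local paths. `planDownloads`, `PlannedFile`, and `DownloadPlan` are hypothetical names, not part of `@google-cloud/storage`. Following the JSDoc note above, omitting the prefix yields no destination (the contents stay in memory). The sketch also assumes that "folder" placeholder objects have names ending in `/` and skips them, and it collects each parent directory once so directories can be created before any file is written.

```typescript
import * as path from "node:path";

// Hypothetical: one entry per object that should actually be downloaded.
interface PlannedFile {
  objectName: string;
  // Local path to write to, or undefined to keep the contents in memory.
  destination?: string;
}

interface DownloadPlan {
  files: PlannedFile[];
  // Unique parent directories to create before any file is written.
  directories: string[];
}

function planDownloads(objectNames: string[], prefix?: string): DownloadPlan {
  const files: PlannedFile[] = [];
  const directories = new Set<string>();

  for (const objectName of objectNames) {
    // Names ending in "/" are treated as folder placeholders: nothing to write.
    if (objectName.endsWith("/")) continue;

    if (prefix === undefined) {
      // No prefix: download into memory, so there is no local path.
      files.push({ objectName });
      continue;
    }

    // Object names use "/" as the separator; split and rejoin so the local
    // path uses the platform separator ("\" on Windows, "/" elsewhere).
    const destination = path.join(prefix, ...objectName.split("/"));
    files.push({ objectName, destination });
    directories.add(path.dirname(destination));
  }

  return { files, directories: Array.from(directories) };
}
```

On a POSIX system, `planDownloads(["a/b.txt", "a/", "c.txt"], "out")` returns two files with destinations `out/a/b.txt` and `out/c.txt`, and the directories `out/a` and `out`; the `a/` placeholder is dropped.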

ddelgrosso1, May 19 '23 13:05

@danielbankhead would you mind just giving this a second set of eyes?

ddelgrosso1, Jun 08 '23 14:06

Hey @danielbankhead @ddelgrosso1, just wanted to give a heads-up that I'll be revisiting this in about 1-2 months due to project priorities. Thank you for all the feedback and recommendations; they are all valid points for a user like myself.

I do need something like TransferManager for a project at work, so I'll very likely be back (no promises, of course). Given its current design, though, it would be more practical for me to build my own version of TransferManager with the interface I'm looking for. During my experiments I will try to wrap the existing File.download method, as the current design does, but I might go a different way if needed.

A couple of things I would design for, based on feedback from both of you:

  • downloads should always be concurrent
  • empty objects (like "folders") on GCP are not writable and need special handling
  • directory creation should be optimized at the TransferManager level
  • Unix vs. Windows file-path handling
  • read and write failures should not fail silently. In the event of a partial directory/folder download, we should make it easy for the user to identify which files succeeded and which failed.
  • if files with the same name already exist at the write destination, we could default to overwriting, throwing an error, or prompting the user
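The concurrency, failure-reporting, and existing-file points could look roughly like the following TypeScript sketch: a bounded worker pool so downloads run concurrently, a per-file result list so a partial download is visible rather than silent, and a simple existing-file policy. `downloadAll`, `DownloadFn`, `DownloadJob`, `FileResult`, and the `"overwrite" | "error"` policy are all hypothetical; the user-prompt option is left out, and a real implementation might inject something that wraps `File.download`.

```typescript
import { existsSync } from "node:fs";

// Hypothetical: downloads one object to one local path. Injected so that a
// real version (e.g. wrapping File.download) or a fake can be plugged in.
type DownloadFn = (objectName: string, destination: string) => Promise<void>;

// Hypothetical policy for destinations that already exist (no user prompt).
type ExistingFilePolicy = "overwrite" | "error";

interface DownloadJob {
  objectName: string;
  destination: string;
}

interface FileResult extends DownloadJob {
  ok: boolean;
  error?: unknown;
}

async function downloadAll(
  jobs: DownloadJob[],
  download: DownloadFn,
  concurrency = 4,
  onExisting: ExistingFilePolicy = "error",
): Promise<FileResult[]> {
  const results = new Array<FileResult>(jobs.length);
  let next = 0;

  // Each worker claims the next unclaimed job until none remain, so at most
  // `concurrency` downloads are in flight at any moment.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      const job = jobs[index];
      try {
        if (onExisting === "error" && existsSync(job.destination)) {
          throw new Error(`refusing to overwrite ${job.destination}`);
        }
        await download(job.objectName, job.destination);
        results[index] = { ...job, ok: true };
      } catch (error) {
        // Record the failure and keep going instead of aborting the batch.
        results[index] = { ...job, ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, jobs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because results keep the input order and carry an `ok` flag, a caller can list `results.filter((r) => !r.ok)` to show exactly which files failed in a partial directory download.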

I envision users wanting to use TransferManager to download entire buckets or "sub-directories". So, those are some of the considerations I could think of.

Something else the Storage service should consider is whether this should be done at the API level instead, which would allow users to run more advanced queries. I know adding any logic around bucket objects is likely terrible for performance, but it would make this cross-language compatible; implementing TransferManager in every supported language would mean a lot of SDK work.

hochoy, Jul 08 '23 01:07

@ddelgrosso1, can you review this PR? I also drafted a summary of the changes here for your convenience: go/node-transfermanager-gcs

vishwarajanand, May 15 '24 12:05