
s3 data remote

Open germanjke opened this issue 2 years ago • 2 comments

In your training config, we can choose data_local or data_remote. If I use data_remote on S3, which behavior will I get? Does the training loop read directly from remote S3, or is the S3 data first transferred to the local machine and training then runs on that local copy?

germanjke avatar Jun 14 '23 11:06 germanjke

I'm facing the same question. Has anyone got an answer for this?

ashoksmavd avatar Jun 16 '23 11:06 ashoksmavd

If you provide both a local and a remote, the shards will be downloaded from S3 into the local directory. On further runs, it will check whether the shards already exist in the local directory, so it should not need to download anything. One complication is a run that crashes: there could be some bad cleanup of shared memory files between the datasets. Assuming no runs crash and they all finish successfully, there should be no issue though.
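To make the "check local first, download only if missing" behavior concrete, here is a minimal sketch. It is not the actual llm-foundry or streaming implementation: `ensure_shard` and `demo` are hypothetical names, the shard file names are made up, and a plain directory stands in for the S3 bucket.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_shard(name: str, remote_dir: Path, local_dir: Path) -> bool:
    """Copy `name` from remote_dir into local_dir unless it is already cached.

    Returns True if a copy ("download") happened, False on a cache hit.
    Hypothetical helper; a directory stands in for the S3 remote.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    target = local_dir / name
    if target.exists():
        # Shard is already in the local directory: nothing to download.
        return False
    shutil.copyfile(remote_dir / name, target)
    return True


def demo() -> tuple[list[bool], list[bool]]:
    """Simulate two runs over the same three shards."""
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote"
        local = Path(tmp) / "local"
        remote.mkdir()
        names = [f"shard.{i:05d}.mds" for i in range(3)]  # made-up names
        for n in names:
            (remote / n).write_bytes(b"fake shard")
        first_run = [ensure_shard(n, remote, local) for n in names]
        second_run = [ensure_shard(n, remote, local) for n in names]
        return first_run, second_run
```

In this sketch the first run copies every shard and the second run copies none, which mirrors the "it should not need to download anything" case described above.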

codestar12 avatar Jun 22 '23 03:06 codestar12

Just to add to/summarize what @codestar12 wrote...

If you only supply data_local, the code will look for the MDS shards in that local directory.

If you supply both data_local and data_remote, the code will download the MDS shards from the remote location into the local directory. Training will start while the download is still in progress, so you won't need to wait for the whole dataset to finish downloading.
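As a rough illustration of "training starts while the download is happening", the sketch below has a background thread that pretends to download shards and a consumer that processes each shard as soon as it arrives. This is only a toy model under stated assumptions, not how the library is actually implemented: `download_shards`, `iter_shards`, and `demo_streaming` are hypothetical names, and the download is simulated with a short sleep.

```python
import queue
import threading
import time


def download_shards(names, ready):
    """Pretend to download each shard, then announce it on the queue."""
    for name in names:
        time.sleep(0.01)  # stand-in for network latency (assumption)
        ready.put(name)
    ready.put(None)  # sentinel: no more shards are coming


def iter_shards(names):
    """Yield shard names as soon as each one has finished downloading."""
    ready = queue.Queue()
    worker = threading.Thread(
        target=download_shards, args=(names, ready), daemon=True
    )
    worker.start()
    while True:
        name = ready.get()  # blocks only until the next shard is ready
        if name is None:
            break
        yield name
    worker.join()


def demo_streaming():
    """Consume shards in arrival order without waiting for the full set."""
    names = ["shard.00000.mds", "shard.00001.mds", "shard.00002.mds"]
    return list(iter_shards(names))
```

The consumer here only waits for the next shard, not for the whole dataset, which is the property described above.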

alextrott16 avatar Jun 27 '23 23:06 alextrott16