Allow full downloads on large repos

Open pbsladek opened this issue 5 years ago • 7 comments

Thanks for writing this. It has eased a lot of the work of transferring repos between instances.

Added this around line 176 of nexus_client.py

try:
    content = response.json()
except json.decoder.JSONDecodeError:
    raise exception.NexusClientAPIError(response.content)

and got the following:

nexuscli.exception.NexusClientAPIError: b'ERROR: (ID 40b7714e-41ac-416a-8cbc-7fe8b7a0b639) 
Failed to execute phase [query], all shards failed;
shardFailures {[pUq2BG92Raedvpiju7ztMg][215bfafbc6963d8c9f7ec9a57f88c6223b00dfa6][0]:
RemoteTransportException[[8A042D9D-8A28015E-AF29C30B-EC2AE858-7B66110B][local[1]][indices:data/read/search[phase/query]]]; nested: 
QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. 
See the scroll api for a more efficient way to request large data sets.
This limit can be set by changing the [index.max_result_window]
index level parameter.]; }
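
Reading the numbers in that message: a request with from=10000 and a page size of 50 gives from + size = 10050, which is over the 10000 limit. A rough sketch of that arithmetic follows. The page size of 50 is an assumption inferred from the 10050 in the error, not something confirmed from the Nexus source, and `first_failing_offset` is a made-up name for illustration.

```python
MAX_RESULT_WINDOW = 10000  # limit quoted in the error message
PAGE_SIZE = 50             # assumption: inferred from 10050 - 10000


def first_failing_offset(max_window=MAX_RESULT_WINDOW, page_size=PAGE_SIZE):
    """Return the first `from` offset whose page exceeds the result window."""
    offset = 0
    # A page is allowed while from + size <= max_window.
    while offset + page_size <= max_window:
        offset += page_size
    return offset


offset = first_failing_offset()
print(offset, offset + PAGE_SIZE)
```

Under these assumptions the first 10,000 results can be paged through and the request for the next page is the one that fails.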

I think this is an Elasticsearch issue within Nexus, and I doubt they will fix it anytime soon.

I wrote a quick wrapper that uses the cli to download every sub directory. It would be cool if you could implement a way to do this within the cli.

e.g. grab the directory structure and download each sub directory 1 by 1.
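
Roughly what such a wrapper could look like, as a minimal sketch and not the actual script. It assumes `nexus3 dl` accepts a `repository/sub_directory/` source in the same `nexus3 dl <source> <destination>` form used elsewhere in this thread, and that the list of sub directories is already known; `build_download_commands` and `download_all` are hypothetical helper names.

```python
import subprocess


def build_download_commands(repository, subdirectories, destination="."):
    """Build one `nexus3 dl` command per sub directory."""
    commands = []
    for subdir in subdirectories:
        source = f"{repository}/{subdir.strip('/')}/"
        commands.append(["nexus3", "dl", source, destination])
    return commands


def download_all(repository, subdirectories, destination="."):
    """Run the commands one by one, stopping at the first failure."""
    for command in build_download_commands(repository, subdirectories, destination):
        subprocess.run(command, check=True)


# Hypothetical sub directory names; replace with the real ones.
for command in build_download_commands("raw", ["a", "b/c"]):
    print(" ".join(command))
```

Building the commands separately from running them makes it easy to print and check them before starting a long transfer.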

pbsladek avatar Dec 05 '19 02:12 pbsladek

Thanks for the bug report @pbsladek. I'm looking to reproduce this locally - how big is too big? I'm assuming 10,001 files.

I have a feeling that the directory iteration strategy might still fail for directories with a number of files above 10k.

bt-thiago avatar Dec 06 '19 16:12 bt-thiago

Nexus issue: https://issues.sonatype.org/browse/NEXUS-16917

bt-thiago avatar Dec 06 '19 17:12 bt-thiago

To reproduce:

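# In a shell: create the repository
nexus3 repository create raw raw

# In Python: upload 10,001 empty files
from nexuscli import nexus_client, nexus_config
config = nexus_config.NexusConfig()
config.load()
c = nexus_client.NexusClient(config)
for i in range(10001):
    c.upload('/dev/null', f'raw/a{i}')

# Back in the shell: download the whole repository
nexus3 dl raw/ .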

bt-thiago avatar Dec 06 '19 18:12 bt-thiago

@pbsladek the error can happen on a single directory with more than 10k files, so the strategy of breaking up downloads per directory won't always work, although it would probably cover most cases.
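
To check whether a given repository would hit that case, one option is to count files per directory from a list of paths. This is only a sketch: it assumes you already have the full list of paths from somewhere, it counts only files directly inside each directory, and `oversized_directories` is a hypothetical helper, not part of nexus3-cli.

```python
from collections import Counter
import posixpath

LIMIT = 10000  # result window from the error message


def oversized_directories(paths, limit=LIMIT):
    """Return {directory: file_count} for directories holding more than `limit` files."""
    counts = Counter(posixpath.dirname(path) for path in paths)
    return {directory: n for directory, n in counts.items() if n > limit}


# Made-up paths: one directory over the limit, one well under it.
paths = [f"big/a{i}" for i in range(10001)] + ["small/x", "small/y"]
print(oversized_directories(paths))
```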

I might implement your suggestion but I'd like to think about this for a bit.

bt-thiago avatar Dec 06 '19 18:12 bt-thiago

Hey, no problem. Thanks for taking a look.

I ran into the same issue you mentioned on our repos. I generated a list of file names and downloaded them individually. It probably wouldn't make sense for the cli to manage that, though.

pbsladek avatar Dec 06 '19 20:12 pbsladek

Hi, could this issue also come up when migrating repositories from server A to server B, for example when doing a full download of all your components for a migration? Thanks.

forgondolin avatar Apr 17 '20 15:04 forgondolin

Hi, @forgondolin. Yes, you would probably see this in an operation like the one you describe. I just looked at the [upstream issue](https://issues.sonatype.org/browse/NEXUS-16917) and Sonatype won't be fixing this any time soon. It might help if everyone who sees this issue goes there and upvotes the bug.

Meanwhile, use the workaround that @pbsladek suggested: break up your downloads into chunks of up to 10,000 files.
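
A minimal sketch of the chunking step, assuming you already have a list of file names to work from. `chunked` is a hypothetical helper, and how each chunk is then fetched is left out.

```python
def chunked(items, size=10000):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up file names matching the reproduction above.
file_names = [f"raw/a{i}" for i in range(10001)]
print([len(chunk) for chunk in chunked(file_names)])
```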

I'm happy to review PR contributions for a workaround, but I'm unlikely to implement one myself.

thiagofigueiro avatar Apr 17 '20 21:04 thiagofigueiro