conda-mirror
sync only new packages
All,
From a quick look at the code, it looks like conda-mirror copies all the repository (aka channel) files every time it is launched. Is this correct?
It would be useful to download only missing/new packages in order to save bandwidth.
Thanks GP
> Is this correct?
No, this is not correct. conda-mirror computes the packages that it is missing from the upstream channel, based on the configured package whitelist/blacklist, and then downloads only those.
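The selection described above can be pictured as a filtered set difference. This is only a minimal sketch of the idea, not conda-mirror's actual code: `packages_to_download`, the `blacklisted` predicate, and the sample file names are all hypothetical.

```python
def packages_to_download(upstream, local, blacklisted=lambda name: False):
    """Return the upstream file names that are wanted but not yet mirrored.

    upstream    -- file names listed by the upstream channel
    local       -- file names already present in the local mirror
    blacklisted -- hypothetical predicate standing in for the
                   whitelist/blacklist configuration
    """
    wanted = {name for name in upstream if not blacklisted(name)}
    # Only what is wanted and not already present needs to be fetched.
    return sorted(wanted - set(local))


# Hypothetical sample data for illustration only.
upstream = [
    "numpy-1.13.0-py36_0.tar.bz2",
    "scipy-0.19.1-py36_0.tar.bz2",
    "zlib-1.2.11-0.tar.bz2",
]
local = ["numpy-1.13.0-py36_0.tar.bz2"]

missing = packages_to_download(
    upstream, local, blacklisted=lambda name: name.startswith("zlib")
)
print(missing)
```

With this sample data only the scipy file is selected: numpy is already mirrored and zlib is excluded by the predicate.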
> It would be useful to download only missing/new packages in order to save bandwidth.
This is indeed what is happening.
OK,
this is good news! I am having some issues with the download, though.
During my download I am experiencing some network instability, so the conda-mirror process crashes. What I noticed is that the packages downloaded so far in the tmp dir are deleted, and on the next run my download has to restart from the beginning. It would be very nice to save the work already done. Is this currently possible?
To make the download more resilient I modified the code in _download() as follows:
(NOTE: this code is not tested yet; I am testing it right now!)
import os
import time

import requests

# `logger` is assumed to be conda-mirror's existing module-level logger


def _download(url, target_directory):
    # seconds to wait after each failed attempt, growing with every retry
    pause_seconds = [3, 15, 30, 60, 120, 300, 900]
    for secs in pause_seconds:
        try:
            chunk_size = 1024  # 1KB chunks
            logger.info("download_url=%s", url)
            # write into target_directory under the file name taken from the URL
            target_filename = url.split('/')[-1]
            download_filename = os.path.join(target_directory, target_filename)
            logger.debug('downloading to %s', download_filename)
            with open(download_filename, 'w+b') as tf:
                ret = requests.get(url, stream=True)
                for data in ret.iter_content(chunk_size):
                    tf.write(data)
            logger.info('File {} successfully downloaded'.format(download_filename))
            break
        except Exception:
            logger.exception("Failure in network connection")
            logger.info("Retry in {} seconds".format(secs))
            time.sleep(secs)
            logger.info("Try again to download")
    # note: if every attempt fails, the function returns without raising
If the community is interested I can improve this code (e.g. take pause_seconds as a command-line parameter, catch more specific exceptions) and submit a PR.
Thanks GP
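One way the pause schedule could be exposed on the command line is sketched below with argparse. The `--retry-pauses` flag is hypothetical and is not an existing conda-mirror option; `parse_pauses` and `build_parser` are likewise illustrative names.

```python
import argparse


def parse_pauses(text):
    """Turn a comma-separated string such as '3,15,30' into a list of ints."""
    try:
        pauses = [int(part) for part in text.split(",")]
    except ValueError:
        raise argparse.ArgumentTypeError(
            "expected comma-separated integers, got %r" % text
        )
    if any(p < 0 for p in pauses):
        raise argparse.ArgumentTypeError("pauses must be non-negative")
    return pauses


def build_parser():
    parser = argparse.ArgumentParser(description="retry-pause option sketch")
    # --retry-pauses is a hypothetical flag, not a real conda-mirror option
    parser.add_argument(
        "--retry-pauses",
        type=parse_pauses,
        default=[3, 15, 30, 60, 120, 300, 900],
        help="comma-separated seconds to wait before each retry",
    )
    return parser


args = build_parser().parse_args(["--retry-pauses", "1,5,10"])
print(args.retry_pauses)
```

The parsed list could then be passed to the download function in place of the hard-coded `pause_seconds`; omitting the flag keeps the schedule from the snippet above.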
What is the stack trace that you're seeing from conda-mirror when it crashes?
Generally speaking, PRs are welcome :)
Eric,
unfortunately I lost the stack trace with the error from my shell :-( I will run conda-mirror overnight, and if I experience the error again I will certainly report it to you.
BTW, is my assumption correct that the files in the tmp dir are not copied to the destination dir in case of a system crash?
For the PR I am definitely happy to contribute, but I want to test it a bit more.
Thanks for your help
GP
> BTW, is my assumption correct that the files in the tmp dir are not copied to the destination dir in case of a system crash?
Correct. They are not automatically copied to the destination dir in case of a crash.
I am working through similar issues as gpcimino, only I cannot complete a first run. When running:
conda-mirror --upstream-channel conda-forge --target-directory local_mirror --platform linux-64 -vvv
I can see packages being downloaded to /tmp, but ultimately the process blows up with an error stating: Remote end closed connection without response
The full stack trace is attached: stacktrace.txt
Not sure if Anaconda.org is misbehaving. Any ideas?
101glover, change #71 altered the code so that the packages already downloaded will still be processed if the download fails.
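The behaviour described here, keeping what was already downloaded when a later download fails, can be sketched roughly as follows. This is not the code from change #71; `mirror`, `flaky_fetch`, and the file names are hypothetical and only illustrate the idea.

```python
import os
import shutil
import tempfile


def mirror(names, fetch, target_directory):
    """Download names into a temp dir and keep every file that completed.

    fetch(name, directory) is a hypothetical callable that writes the file
    into directory or raises on a network error.
    """
    completed = []
    with tempfile.TemporaryDirectory() as download_dir:
        try:
            for name in names:
                fetch(name, download_dir)
                completed.append(name)  # only reached if fetch did not raise
        except Exception as ex:
            print("download interrupted: %s" % ex)
        # Runs whether or not the loop finished: move the completed files
        # out before the temp dir is removed.
        os.makedirs(target_directory, exist_ok=True)
        for name in completed:
            shutil.move(
                os.path.join(download_dir, name),
                os.path.join(target_directory, name),
            )
    return completed


def flaky_fetch(name, directory):
    """Stand-in downloader that fails on the third file."""
    if name == "c.tar.bz2":
        raise ConnectionError("Remote end closed connection without response")
    with open(os.path.join(directory, name), "wb") as fh:
        fh.write(b"payload")


target = tempfile.mkdtemp()
done = mirror(["a.tar.bz2", "b.tar.bz2", "c.tar.bz2", "d.tar.bz2"], flaky_fetch, target)
print(done, sorted(os.listdir(target)))
```

In this sketch the two files fetched before the failure end up in the target directory, so a later run would only need the remaining ones.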
hey magnuhho, thanks for the notification. That solved my problems!