B2_Command_Line_Tool icon indicating copy to clipboard operation
B2_Command_Line_Tool copied to clipboard

sync of a lot of small files fails

Open usbpc opened this issue 8 years ago • 15 comments

I tried to sync about 1 million small files (Combined size of ~40GB) to backblaze and got a lot of 408 request_timeout errors:

b2_upload(/local/path/to/file, b2/path/to/file, 1499162591000): UnknownError('408 request_timeout Error reading uploaded data: SocketTimeoutException(Read timed out)',) Unknown error: 408 request_timeout Error reading uploaded data: SocketTimeoutException(Read timed out)

And also some other errors:

b2_upload(/local/path/to/file, b2/path/to/file, 1499161555000): IndexError('tuple index out of range',) tuple index out of range

The command I used is: b2 sync --delete --threads 100 /path/to/local/directory b2://bucketname/foldername

So all of the old file versions were deleted, while all the files that got an error were not uploaded. I would expect that it would retry all the failed uploads.

usbpc avatar Jul 04 '17 20:07 usbpc

Did you rerun the command after the failed uploads or was this from a single run? If it was from a single run, does executing the same command again upload the missing files? The B2 CLI does retry uploading after some errors which indicate temporary problems, which should include the timeout errors. However, the IndexError seems strange and probably shoudn't happen.

svonohr avatar Jul 04 '17 20:07 svonohr

Everything in my last commant was from just a singel run.

Running it two more times got it to upload all files, on the first rerun it exited wit some strange error, that I unfortunatly only have the last part of:

  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 992, in run_command
    return command.run(args)
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 781, in run
    dry_run=args.dryRun,
  File "/usr/local/lib/python3.5/dist-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 263, in sync_folders
    raise CommandError('sync is incomplete')
b2.exception.CommandError: sync is incomplete
ERROR: sync is incomplete

The second time it ran perfectly.

usbpc avatar Jul 04 '17 21:07 usbpc

If you can get the rest of that error, it would be very helpful.

When I try with 100 threads, I exceed my ulimit and get this error:

B2ConnectionError: Connection error: [Errno 24] Too many open files

With 10 threads, things work for me. Are you set up to allow lots of file descriptors?

bwbeach avatar Jul 05 '17 18:07 bwbeach

Apperently I am set up to use a lot of file descriptors. (I'm using the windows subsystem for linux, and didn't get that error ever). Unfortunatly I don't have the rest of the error message, but that was a second diffrent error. Sorry for that :(

usbpc avatar Jul 06 '17 07:07 usbpc

I actually got the same error again, this time I got all of it:

ERROR:b2.console_tool:ConsoleTool command error
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 992, in run_command
    return command.run(args)
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 781, in run
    dry_run=args.dryRun,
  File "/usr/local/lib/python3.5/dist-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 263, in sync_folders
    raise CommandError('sync is incomplete')
b2.exception.CommandError: sync is incomplete
ERROR: sync is incomplete

I also only used 25 Threads not 100 for this sync.

I also got a few warnings:

WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: api001.backblazeb2.com

usbpc avatar Jul 06 '17 19:07 usbpc

The "sync is incomplete" is just to signal that not all files from the sync finished successfully. It would be interesting so know why some files did not finish. Do you see any more errors before this last one?

svonohr avatar Jul 06 '17 19:07 svonohr

I got this:

ERROR:b2.bucket:error when uploading, upload_url was https://pod-000-1041-10.backblaze.com/b2api/v1/b2_upload_file/<file_id>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/b2/bucket.py", line 392, in _upload_small_file
    sha1_sum, file_info, input_stream
  File "/usr/local/lib/python3.5/dist-packages/b2/raw_api.py", line 455, in upload_file
    return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 259, in post_content_return_json
    response = _translate_and_retry(do_post, try_count, post_params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 103, in _translate_and_retry
    return _translate_errors(fcn, post_params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 52, in _translate_errors
    int(error['status']), error['code'], error['message'], post_params
b2.exception.ServiceError: 503 service_unavailable c001_v0001041_t0037 is too busy

with diffrent files and diffrent upload urls at least 17 times, but thats all the Errors I can see. I will try to get all of the errors into a file next time I run it.

Edit: Also sorry for my bad Englisch :(

usbpc avatar Jul 07 '17 09:07 usbpc

I got another error just now, I think this time my pipe broke if I understand it correctly:

ERROR:b2.console_tool:ConsoleTool unexpected exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 47, in _translate_errors
    response = fcn()
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 255, in do_post
    response = self.session.post(url, headers=headers, data=data)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 549, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 502, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 612, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 992, in run_command
    return command.run(args)
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 781, in run
    dry_run=args.dryRun,
  File "/usr/local/lib/python3.5/dist-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 251, in sync_folders
    source_folder, dest_folder, args, now_millis, reporter
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 136, in make_folder_sync_actions
    dest_file) in zip_folders(source_folder, dest_folder, reporter, exclusions, inclusions):
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 93, in zip_folders
    current_b = next_or_none(iter_b)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 36, in next_or_none
    return six.advance_iterator(iterator)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/folder.py", line 192, in all_files
    self.folder_name, show_versions=True, recursive=True, fetch_count=1000
  File "/usr/local/lib/python3.5/dist-packages/b2/bucket.py", line 207, in ls
    self.id_, start_file_name, start_file_id, fetch_count
  File "/usr/local/lib/python3.5/dist-packages/b2/session.py", line 38, in wrapper
    return f(api_url, account_auth_token, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/b2/raw_api.py", line 358, in list_file_versions
    maxFileCount=max_file_count
  File "/usr/local/lib/python3.5/dist-packages/b2/raw_api.py", line 135, in _post_json
    return self.b2_http.post_json_return_json(url, headers, params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 286, in post_json_return_json
    return self.post_content_return_json(url, headers, data, try_count, params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 259, in post_content_return_json
    response = _translate_and_retry(do_post, try_count, post_params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 103, in _translate_and_retry
    return _translate_errors(fcn, post_params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 71, in _translate_errors
    if e2.args[1] == 'Broken pipe':
IndexError: tuple index out of range
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 47, in _translate_errors
    response = fcn()
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 255, in do_post
    response = self.session.post(url, headers=headers, data=data)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 549, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 502, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 612, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/b2", line 9, in <module>
    load_entry_point('b2==0.7.2', 'console_scripts', 'b2')()
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 1104, in main
    exit_status = ct.run_command(decoded_argv)
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 992, in run_command
    return command.run(args)
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 781, in run
    dry_run=args.dryRun,
  File "/usr/local/lib/python3.5/dist-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 251, in sync_folders
    source_folder, dest_folder, args, now_millis, reporter
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 136, in make_folder_sync_actions
    dest_file) in zip_folders(source_folder, dest_folder, reporter, exclusions, inclusions):
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 93, in zip_folders
    current_b = next_or_none(iter_b)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/sync.py", line 36, in next_or_none
    return six.advance_iterator(iterator)
  File "/usr/local/lib/python3.5/dist-packages/b2/sync/folder.py", line 192, in all_files
    self.folder_name, show_versions=True, recursive=True, fetch_count=1000
  File "/usr/local/lib/python3.5/dist-packages/b2/bucket.py", line 207, in ls
    self.id_, start_file_name, start_file_id, fetch_count
  File "/usr/local/lib/python3.5/dist-packages/b2/session.py", line 38, in wrapper
    return f(api_url, account_auth_token, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/b2/raw_api.py", line 358, in list_file_versions
    maxFileCount=max_file_count
  File "/usr/local/lib/python3.5/dist-packages/b2/raw_api.py", line 135, in _post_json
    return self.b2_http.post_json_return_json(url, headers, params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 286, in post_json_return_json
    return self.post_content_return_json(url, headers, data, try_count, params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 259, in post_content_return_json
    response = _translate_and_retry(do_post, try_count, post_params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 103, in _translate_and_retry
    return _translate_errors(fcn, post_params)
  File "/usr/local/lib/python3.5/dist-packages/b2/b2http.py", line 71, in _translate_errors
    if e2.args[1] == 'Broken pipe':
IndexError: tuple index out of range

usbpc avatar Jul 08 '17 10:07 usbpc

Hello, I'm having issues too with uploading LOTS of small files. Right now we would like to use B2 to backup around 1TB of really small files. Given the amount of data we'll have to use a lot of threads to speed-up the upload (we're testing around 200 threads).

With this configuration, we get 2 kind of errors that I think are related to this issue.

One is "Pool is closed" and the other is that " is too busy"

Traceback (most recent call last): File "/usr/lib/python3.4/site-packages/b2/bucket.py", line 415, in _upload_small_file content_type, HEX_DIGITS_AT_END, file_info, hashing_stream File "/usr/lib/python3.4/site-packages/b2/raw_api.py", line 497, in upload_file return self.b2_http.post_content_return_json(upload_url, headers, data_stream) File "/usr/lib/python3.4/site-packages/b2/b2http.py", line 275, in post_content_return_json response = _translate_and_retry(do_post, try_count, post_params) File "/usr/lib/python3.4/site-packages/b2/b2http.py", line 119, in _translate_and_retry return _translate_errors(fcn, post_params) File "/usr/lib/python3.4/site-packages/b2/b2http.py", line 79, in _translate_errors raise B2ConnectionError(str(e0)) b2.exception.B2ConnectionError: Connection error: HTTPSConnectionPool(host='xxxx.backblaze.com', port=443): Pool is closed.

and the other one ERROR:b2.bucket:error when uploading, upload_url was https://XXXXX Traceback (most recent call last): File "/usr/lib/python3.4/site-packages/b2/bucket.py", line 415, in _upload_small_file content_type, HEX_DIGITS_AT_END, file_info, hashing_stream File "/usr/lib/python3.4/site-packages/b2/raw_api.py", line 497, in upload_file return self.b2_http.post_content_return_json(upload_url, headers, data_stream) File "/usr/lib/python3.4/site-packages/b2/b2http.py", line 275, in post_content_return_json response = _translate_and_retry(do_post, try_count, post_params) File "/usr/lib/python3.4/site-packages/b2/b2http.py", line 119, in _translate_and_retry return _translate_errors(fcn, post_params) File "/usr/lib/python3.4/site-packages/b2/b2http.py", line 55, in _translate_errors int(error['status']), error['code'], error['message'], post_params b2.exception.ServiceError: 503 service_unavailable XXXXXX is too busy

Dunno if this helps debugging the problem, I'm available to do tests of any kind if it helps fixing the problem ;)

andrea-sdl avatar Nov 15 '18 11:11 andrea-sdl

I don't have any obvious answers. I do have some questions...

How long does it run before failing?

Have you tried with different numbers of threads?

What's the XXX in 503 service_unavailable XXXXXX is too busy? (That might help track down a problem on the service side.)

bwbeach avatar Nov 15 '18 21:11 bwbeach

Hi @bwbeach thanks for the reply! I redid some test, so here are the answers to your question

  1. Before failing it waits a couple of minutes
  2. Tested with 200 thread first, then tried with 100 threads, the error appeared around minute 2'40s, with 50 threads it appears around 2'40' again ... with 25 threads the first error come up around 2'50s, (before the error a warning I copied below gets printed)
  3. the full error is "503 service_unavailable c002_v0001111_t0056 is too busy"

Some extra info Before the errors what gets printed is "WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: api002.backblazeb2.com"

in case of the 200/100 threads it gets printed lots of times (20+), in case of the 50 only once before starting outputting some errros.

Another error printed is b2.exception.B2ConnectionError: Connection error: HTTPSConnectionPool(host='pod-000-1111-11.backblaze.com', port=443): Pool is closed.

Lemme know if you need some extra data.

andrea-sdl avatar Nov 17 '18 20:11 andrea-sdl

WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: api002.backblazeb2.com is mostly harmless, it just means you are using lots of connections. Might affect performance a bit, but I doubt it will.

Pool is closed looks like a consequence of the 503 errors. I think we need a better retry policy or something.

ppolewicz avatar Nov 18 '18 21:11 ppolewicz

@ppolewicz got it, but isn't the 503 error also something server side? A better retry policy would make it more resistant to failure, but if the server doesn't stop throwing the errors we'll probably need something else, or am I missing something?

andrea-sdl avatar Nov 20 '18 10:11 andrea-sdl

if you have lots of small files to upload, you'd expect sync to take reasonable measures to complete the operation even if something fails for a few minutes. As it is now, the retries are greedy and per-file, so if one file cannot be uploaded a few times (5 iirc), then no further attempts at uploading it are made.

I think the retry policy should be more global. If we are uploading thousands of files and one of them would have to be retried 6 times, it should still be ok. To be blunt, the current code accepts retrying every file a few times, for thousands of retries total, but does not accept one file that needs to be retried more than 5 times.

An elegant retry policy is difficult to design, but I feel the current state has some room for improvement.

ppolewicz avatar Nov 20 '18 11:11 ppolewicz

for future references: With Restic the problem doesn't seem to come up. I suppose it's because it's packaging the files so therefore we're putting less stress on the B2 services.

Right now I'm going with that, but I would really hope to see the B2 client have some improvements to allow a classic copy of files for those users that don't need a full fledged backup/snapshot solution like restic/etc.

Edit: I couldn't uses rclone because it uses too much Class C operations. (adding just in case someone is searching for alternatives to fix the upload issue)

andrea-sdl avatar Nov 20 '18 14:11 andrea-sdl