B2 Sync Connection Pool Is Full

Open · Myflag opened this issue 8 years ago • 31 comments

Hello,

I hope you are having a wonderful day.

I am having a problem with the B2 Sync program. I have been encountering this error with increasing frequency, especially when uploading or downloading millions of files.

Message:

WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: f001.backblazeb2.com
dnload working3/999/1030.jpg

I have tried reducing my connection threads down to as few as 30, but I still encounter the issue. Additionally, I have noticed that the more files being synced, the worse the issue becomes. I eventually begin encountering these errors to the point that they hold up the sync process for tens of minutes at a time. The errors will appear, then the sync will continue... then the errors appear again.

If you need to sync large numbers of files, this is making the sync command borderline useless.

I will appreciate any and all assistance you can offer.

Blessings, Steven Dunn

PS: I have submitted this support request here as per BackBlaze support's recommendation.

Myflag avatar Feb 13 '17 20:02 Myflag

It may be that the message you are seeing is an indirect indicator of another issue.

Please run the sync with logs enabled. If you know how to do it, please collect data about the network transfer during an upload using tcpdump or Wireshark, and then use Wireshark to plot throughput over time. Please also keep a ping running to a host in the LA area while collecting the network trace.

ppolewicz avatar Feb 13 '17 23:02 ppolewicz

Ppolewicz,

I have enabled logging and started a new sync. I will update you when it is complete.

Blessings, Steven Dunn

Myflag avatar Feb 14 '17 19:02 Myflag

Ppolewicz,

I reviewed the log and noticed that these are the errors that are stopping the sync process. It happens more often the longer the sync progresses and more quickly if I use more than the default 10 threads.

2017-02-14 21:17:54,922 20766 139658897905408 b2.bucket ERROR error when uploading, upload_url was https://pod-000-1033-18.backblaze.com/b2api/v1/b2_upload_file/(censored)
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/b2/bucket.py", line 392, in _upload_small_file
    sha1_sum, file_info, input_stream
  File "build/bdist.linux-x86_64/egg/b2/raw_api.py", line 455, in upload_file
    return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 259, in post_content_return_json
    response = _translate_and_retry(do_post, try_count, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 103, in _translate_and_retry
    return _translate_errors(fcn, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 52, in _translate_errors
    int(error['status']), error['code'], error['message'], post_params
ServiceError: 503 service_unavailable c001_v0001033_t0020 is too busy

Another error I discovered...

2017-02-14 21:17:49,414 20766 139658864334592 b2.bucket ERROR error when uploading, upload_url was https://pod-000-1032-07.backblaze.com/b2api/v1/b2_upload_file/(censored)
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/b2/bucket.py", line 392, in _upload_small_file
    sha1_sum, file_info, input_stream
  File "build/bdist.linux-x86_64/egg/b2/raw_api.py", line 455, in upload_file
    return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 259, in post_content_return_json
    response = _translate_and_retry(do_post, try_count, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 103, in _translate_and_retry
    return _translate_errors(fcn, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 75, in _translate_errors
    raise BrokenPipe()
BrokenPipe: Broken pipe: unable to send entire request

Please advise. I really want to use this service to quickly backup and, if need be, restore millions of files.

Blessings, Steven Dunn

Myflag avatar Feb 14 '17 21:02 Myflag

These errors are both normal, and the command-line tool should be retrying. If it's not retrying enough, we can fix it.

Did the sync give up on those files?

bwbeach avatar Feb 14 '17 21:02 bwbeach

Hello,

Thank you for the reply.

It appears to retry the files, but eventually it jams the sync process until I have to restart it. It will go dormant for tens of minutes at a time. The issue gets worse the longer it runs.

Although, if you run with only 10 threads, the default, it does not occur nearly as often... though that few threads slows down the sync significantly when transferring large numbers of small files.

I will appreciate your continued assistance.

Blessings, Steven Dunn

Myflag avatar Feb 14 '17 22:02 Myflag

Looking through the code, I think that the retry code could be improved. I'll try to get to it today or tomorrow.

Some errors can be retried by getting a new upload URL to upload to a different host, and then uploading there. In those cases, there's no point in sleeping and retrying with the same URL. The 503 error above is one such case.
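
Roughly, the idea looks like this (a minimal sketch, not the CLI's actual code; get_upload_url and post_file are hypothetical callables standing in for the real b2_get_upload_url and upload POST calls):

class ServiceError(Exception):
    # stand-in for the b2 ServiceError shown in the tracebacks above
    def __init__(self, status, message):
        super().__init__(message)
        self.status = status

def upload_with_retry(get_upload_url, post_file, local_path, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        upload_url, auth_token = get_upload_url()  # each call may hand out a different pod
        try:
            return post_file(upload_url, auth_token, local_path)
        except ServiceError as e:
            # 503 "too busy": no point sleeping and reusing the same URL;
            # fetch a new upload URL and try a different pod right away
            if e.status == 503 and attempt < max_attempts:
                continue
            raise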

The other factor that probably hit you today is that Tuesday is the day we deploy updated software to our storage pods, which causes a slightly higher number of 503 status codes. The deployment is all done for today, so it may work better if you try now.

bwbeach avatar Feb 14 '17 23:02 bwbeach

Steven, I think you have found a bug, but in order to confirm, I need to see the network trace. The data will be encrypted - but it is ok, I just need to see the flow of packets in time to understand why the sync jams up. Can you please collect the network trace?

ppolewicz avatar Feb 14 '17 23:02 ppolewicz

Hello,

Thank you for reaching out.

I am running CentOS on this server. If you can send me the command you need me to run, I will be glad to help.

Blessings, Steven Dunn

Myflag avatar Feb 15 '17 00:02 Myflag

Brian,

Thank you for the reply.

Also, it might help to know that these issues have been occurring for the last several days.

Any resolutions that can be applied will be sincerely appreciated.

Blessings, Steven Dunn

Myflag avatar Feb 15 '17 00:02 Myflag

tcpdump -n -C20 -s 68 -W 10 -w issue_323.pcap net 162.244.56.0/21

It needs to be run as root and it will consume about 200 MB of disk space. Please stop the tcpdump with Ctrl+C at least a few minutes after the sync has hung. If tcpdump fails with an error immediately after start, please remove the -s 68 parameter and try again.

ppolewicz avatar Feb 15 '17 01:02 ppolewicz

Pawel,

Thank you for the reply.

Will that command only log the traffic for B2 or all server traffic?

Blessings, Steven Dunn

Myflag avatar Feb 15 '17 01:02 Myflag

The net 162.244.56.0/21 part makes it log only the traffic to the IP address space of Backblaze. B2 traffic is all encrypted. Even if it wasn't, -s 68 cuts the packet just enough for the analysis to be done, while not saving the packet payload. I really, really don't want to see your stuff.

ppolewicz avatar Feb 15 '17 01:02 ppolewicz

Ppolewicz,

I appreciate the clarification.

The command failed with the following errors.

[root@localhost ~]# tcpdump -n -C20 -s 68 -W 10 -w issue_323.pcap net 162.244.56.0/21
tcpdump: WARNING: SIOCGIFADDR: nflog: No such device
tcpdump: NFLOG link-layer type filtering not implemented

Please advise.

Blessings, Steven Dunn

Myflag avatar Feb 15 '17 02:02 Myflag

Try to change the command to

tcpdump -n -i any -C20 -s 68 -W 10 -w issue_323.pcap net 162.244.56.0/21

ppolewicz avatar Feb 15 '17 02:02 ppolewicz

Pawel,

I entered the command and received the following error:

tcpdump: issue_323.pcap0: Permission denied

Please advise.

Blessings,

Steven Dunn

Myflag avatar Feb 15 '17 15:02 Myflag

It looks like you are running tcpdump from a directory to which you cannot write, or you are not running it as root.

ppolewicz avatar Feb 15 '17 16:02 ppolewicz

@Myflag, is there anything I could do to help you collect the trace?

ppolewicz avatar Feb 17 '17 11:02 ppolewicz

Hello,

Thank you for reaching out.

The last time I ran the sync at 30 threads it did not jam up with the errors. It has now finished syncing all 3.8 million files and I don't have any others I need to push to B2 at the moment.

If I discover the issue occurring again, I will update this thread.

I appreciate all of your assistance.

Blessings, Steven Dunn

Myflag avatar Feb 17 '17 14:02 Myflag

Hi!

I'm running an AWS EC2 instance doing some backups to B2.

Last night I got:

ERROR:b2.bucket:error when uploading, upload_url was https://pod-000-1042-07.backblaze.com/b2api/v1/b2_upload_file/************************/c001_v0001042_t0049
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/b2/bucket.py", line 394, in _upload_small_file
    HEX_DIGITS_AT_END, file_info, hashing_stream
  File "build/bdist.linux-x86_64/egg/b2/raw_api.py", line 458, in upload_file
    return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 259, in post_content_return_json
    response = _translate_and_retry(do_post, try_count, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 103, in _translate_and_retry
    return _translate_errors(fcn, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 52, in _translate_errors
    int(error['status']), error['code'], error['message'], post_params
ServiceError: 503 service_unavailable c001_v0001042_t0049 is too busy
ERROR:b2.bucket:error when uploading, upload_url was https://pod-000-1038-14.backblaze.com/b2api/v1/b2_upload_file/*********************/c001_v0001038_t0001
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/b2/bucket.py", line 394, in _upload_small_file
    HEX_DIGITS_AT_END, file_info, hashing_stream
  File "build/bdist.linux-x86_64/egg/b2/raw_api.py", line 458, in upload_file
    return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 259, in post_content_return_json
    response = _translate_and_retry(do_post, try_count, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 103, in _translate_and_retry
    return _translate_errors(fcn, post_params)
  File "build/bdist.linux-x86_64/egg/b2/b2http.py", line 52, in _translate_errors
    int(error['status']), error['code'], error['message'], post_params
ServiceError: 503 service_unavailable Tome is overly busy

There are a lot more errors like this; some mention "Tome is overly busy", others c001_v0001042_t0049, c001_v0001041_t0043, c001_v0001042_t0033, etc...

I believe this issue applies to my case. Is there anything we can do to fix this? I believe that this simply shouldn't happen.

Command line:

b2 sync --noProgress --delete --threads 1 /etc/nginx/ b2://MY_BUCKET_NAME/backup-nginx/etc/

Uname:

Linux cloud 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u1 (2017-06-18) x86_64 GNU/Linux

Thanks.

0x3333 avatar Jun 27 '17 11:06 0x3333

The "Tome is overly busy" message is a sign that the load balancing among the B2 servers at Backblaze is not working right. We'll do our best to get it fixed.

bwbeach avatar Jun 27 '17 12:06 bwbeach

This issue is still a problem when deleting a large number of files in the bucket. See output:

delete <file 1>
delete <file 2>
...
delete <file n>
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: api001.backblazeb2.com
delete <more files>                
upload <some file>                                                              
ERROR:b2.console_tool:ConsoleTool command error    
Traceback (most recent call last):                 
  File "/usr/local/Cellar/b2-tools/1.1.0/libexec/lib/python2.7/site-packages/b2/console_tool.py", line 1074, in run_command
    return command.run(args)                          
  File "/usr/local/Cellar/b2-tools/1.1.0/libexec/lib/python2.7/site-packages/b2/console_tool.py", line 857, in run
    allow_empty_source=allow_empty_source                                      
  File "/usr/local/Cellar/b2-tools/1.1.0/libexec/lib/python2.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs) 
  File "/usr/local/Cellar/b2-tools/1.1.0/libexec/lib/python2.7/site-packages/b2/sync/sync.py", line 273, in sync_folders
    raise CommandError('sync is incomplete')       
CommandError: sync is incomplete                   
ERROR: sync is incomplete    

Looks like an issue with that library - perhaps a handle isn't being released?

dschuler avatar Jun 02 '18 00:06 dschuler

Perhaps we should increase the size of the connection pool so that it is as large as our thread pool. @bwbeach, what do you think?
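
Roughly, that would mean sizing the urllib3 pool to the thread count, e.g. (a sketch, assuming a requests Session is used under the hood; 30 is just an example thread count):

import requests
from requests.adapters import HTTPAdapter

thread_count = 30  # whatever --threads is set to

session = requests.Session()
# pool_maxsize is the number of connections kept alive per host; matching it
# to the thread count avoids the "Connection pool is full" discards
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=thread_count)
session.mount('https://', adapter)
session.mount('http://', adapter)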

ppolewicz avatar Jun 02 '18 04:06 ppolewicz

I think that would make sense.

It's not clear to me that the connection pool warning is related to the sync failure. I think it's just a warning that the connection could not be returned to the pool because the pool was full.
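
For illustration, the warning can be reproduced with urllib3 alone (a standalone demo, not B2 code): with block=False (the default), requests beyond maxsize open extra connections, and when those are handed back, the already-full pool discards them and logs the warning - the requests themselves still succeed.

import logging
from concurrent.futures import ThreadPoolExecutor
import urllib3

logging.basicConfig(level=logging.WARNING)

# a pool that keeps at most 1 idle connection per host
pool = urllib3.HTTPSConnectionPool('example.com', maxsize=1, block=False)

# 5 concurrent requests -> extra connections are created, and the ones that
# don't fit back into the pool are discarded with the familiar warning
with ThreadPoolExecutor(max_workers=5) as executor:
    for _ in range(5):
        executor.submit(pool.request, 'GET', '/')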

bwbeach avatar Jun 02 '18 20:06 bwbeach

This happens after deleting more than 100k files, and this is the only warning just before a failure that terminates the process.

dschuler avatar Jun 04 '18 02:06 dschuler

I'm also seeing this issue.

WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f000.backblazeb2.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f000.backblazeb2.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f000.backblazeb2.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f000.backblazeb2.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f000.backblazeb2.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f000.backblazeb2.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f000.backblazeb2.com
dnload myfile.sql.gz
ERROR:b2.console_tool:ConsoleTool command error
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 1476, in run_command
    return command.run(args)
  File "/usr/local/lib/python3.5/dist-packages/b2/console_tool.py", line 1304, in run
    allow_empty_source=allow_empty_source
  File "/home/debian/.local/lib/python3.5/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/home/debian/.local/lib/python3.5/site-packages/b2sdk/sync/sync.py", line 273, in sync_folders
    raise CommandError('sync is incomplete')
b2sdk.exception.CommandError: sync is incomplete
ERROR: sync is incomplete

rafatwork avatar Apr 05 '20 06:04 rafatwork

@rafatwork is that the full output? It looks like there was an error before the warnings that you didn't show. Can you run the sync with logs enabled?

ppolewicz avatar Apr 06 '20 23:04 ppolewicz

@ppolewicz @bwbeach Hello!

I am having the same problem with Connection Pools. I did not want to open up a new issue for this, so I am adding my comment here.

When I run b2 sync --threads 10 <bucket> <destination>, I also receive the connection pool error.

I then followed @ppolewicz's instructions:

Can you run the sync with logs enabled?

When I run the sync with the --debugLogs flag, I receive these errors, which are 'new' compared to running the sync without the --debugLogs flag.

[screenshot of the errors]

It will not let me attach the file here, so here is a link to the log file with the errors which occur during indexing.

Here is a link to the log file (this is the same 'log' as the other shared file, it is larger because it ran longer) that has the errors which occur only when I include the flag --debugLogs. It does not include the error Connection Pool is full.

Error image that corresponds to the log file: [screenshot]


I have just run the sync command again without the --debugLogs flag, and as you can see, the 'connection pool is full' error now appears:

[screenshot]

Conclusion

I do not think that I can generate the log file for the 'Connection Pool is Full' warning until the errors above are fixed...

What do you think?


PS: All sync runs above were run with 10 threads active, using the --threads flag.

I am currently running this with 1 thread active and it is very, very slow (a screenshot of the current process with 1 thread is below). I will report the results if/when it finishes. I say 'if' because it is very slow. See screenshot:

[screenshot]

jeffward01 avatar Feb 13 '22 07:02 jeffward01

@jeffward01 the connection pool being full warning (not error) doesn't have anything to do with your sync failure, which I think is actually caused by a filename decoding error. You have a few files with really weird names. The operating system told Python those filenames can be decoded using the cp1252 encoding, but that didn't really work, and before anything is sent to B2, its filename needs to be translated to UTF-8 (something the CLI could not do). I think the number of files in question is very limited; you might want to just try renaming them.
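
A quick way to spot such files locally is something like this (a sketch; it only reports names that won't round-trip between the filesystem encoding and UTF-8, and it doesn't rename or transfer anything):

import os
import sys

def problem_names(root, fs_encoding='cp1252'):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                name.encode('utf-8')       # fails on surrogates left by undecodable bytes
                name.encode(fs_encoding)   # fails on characters cp1252 cannot represent
            except UnicodeEncodeError:
                yield os.path.join(dirpath, name)

if __name__ == '__main__':
    for path in problem_names(sys.argv[1] if len(sys.argv) > 1 else '.'):
        print(repr(path))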

ppolewicz avatar Feb 13 '22 20:02 ppolewicz

Thanks for getting back to me so quickly! The files are already hosted on B2 and I am pulling them down.

Is it possible to rename them remotely while they are on B2, so that I can then pull down the 'cleaned' files?

Thanks for the info!

Also, it would be nice to have a flag so that when a file is named strangely, Backblaze replaces it with some temporary name and then creates a file called renamed.txt with a legend mapping each temporary name to the original file name.

Something like this would be great

jeffward01 avatar Feb 14 '22 06:02 jeffward01

OK, then it looks like you have a UTF-8 encoded filename in B2 that cannot be translated to cp1252.

There is no atomic remote rename operation, but you can copy an object and remove the old one. Please see the manual for b2 copy-file-by-id and delete-file-version.
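
For example, a remote "rename" could be scripted as copy-then-delete around those two commands (a sketch; the file ID and names are placeholders, and the exact arguments are in b2 copy-file-by-id --help and b2 delete-file-version --help):

import subprocess

def remote_rename(file_id, old_name, bucket_name, new_name):
    # copy the existing file version to the new name in the same bucket...
    subprocess.run(['b2', 'copy-file-by-id', file_id, bucket_name, new_name], check=True)
    # ...then remove the old version
    subprocess.run(['b2', 'delete-file-version', old_name, file_id], check=True)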

As for the feature you suggested, I agree that some sort of a solution would be nice, but I'm not sure if the one you suggested is right. renamed.txt would need to be at least a csv file, but then processing it with a script would be hard, so you'd need to do some programming to fix the situation. Ideally we'd have something that would allow the user to recover from such situations without any programming.

If you'd like to contribute a solution for this problem, please create a new github issue so that we can discuss it without polluting the "connection is full" issue any further :)

I think at the very least a detector that doesn't actually do anything would be nice - something that looks at your current filesystem encoding, looks at the other end, and figures out which files and file versions will cause trouble when we try to transfer them. It wouldn't actually transfer anything and would require a scan, but it would allow the user to troubleshoot the situation with a script. Actually, it could return a CSV that could then be parsed reliably with csvformat, considering that B2 doesn't allow BELL or NEWLINE characters, but bash allows them in $IFS. It's a bit of a stretch though; a Python solution would be way cleaner.

ppolewicz avatar Feb 14 '22 16:02 ppolewicz