blastn 2.11.0 in Docker hangs phoning home

Open zwets opened this issue 4 years ago • 7 comments

My apologies if this is not the correct place to report this, but I would expect the issue to show up in this project too.

I am running blastn in a docker container, copying it in from the binary tarball from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+.

Since upgrading to 2.11.0, blastn calls take excessively long to finish, or do not complete at all.

Setting BLAST_USAGE_REPORT=false resolves the issue. This raises the strong suspicion that the new usage reporting feature is the culprit.

The issue can be reproduced by creating a docker image with a simple Dockerfile:

FROM ubuntu  # or your preferred starting image
COPY blastn /usr/local/bin
USER nobody:nogroup

After building the container with docker build -t test-bug ".", observe the difference between:

docker run -ti --rm --read-only -e BLAST_USAGE_REPORT=false test-bug blastn -help

and

docker run -ti --rm --read-only test-bug blastn -help

The hiccup is sub-second but already noticeable. Start a longer-running local blastn call, and runtimes that are normally around 15s go up to many minutes, while top shows the processes as mostly sleeping.

Update for the record: the excessively long run times were not for single runs of blastn. They happened in our pipeline, where we do a few dozen calls in series. These should take ~20s altogether, but their cumulative "hang time" made the job time out after 20 minutes.
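For a pipeline like this, the workaround from above can be applied once at the top of the driver script rather than on every call, since exported variables are inherited by child processes. The following is a minimal sketch of that idea, not the reporter's actual pipeline: the child `sh -c` process stands in for a blastn call, and the `queries/` directory and `mydb` database in the commented loop are hypothetical names.

```shell
#!/bin/sh
# Opt out of BLAST usage reporting once; every process started
# from this shell inherits the exported setting.
export BLAST_USAGE_REPORT=false

# Stand-in for a blastn call: a child process that prints the value it inherited.
inherited=$(sh -c 'printf %s "$BLAST_USAGE_REPORT"')
echo "child sees BLAST_USAGE_REPORT=$inherited"

# In a real pipeline the serial calls would follow here, for example
# (queries/ and mydb are hypothetical):
# for q in queries/*.fna; do
#     blastn -query "$q" -db mydb -out "$q.out"
# done
```

With docker, the same setting is passed per container with `-e BLAST_USAGE_REPORT=false`, as in the commands above.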

zwets, Dec 16 '20 00:12

Hello, Thank you for your report. We will try to reproduce this issue. From your description, I assume you are not on a cloud provider but rather on your own hardware.

Tom

tom6931, Dec 16 '20 13:12

@zwets - I was not able to reproduce this issue using the GCP Cloud Shell as described in this tutorial. After initiating the Cloud Shell, I was able to run the following commands successfully -

docker run --rm -e BLAST_USAGE_REPORT=false ncbi/blast blastn -help
docker run --rm ncbi/blast blastn -help

If you simply copy the executable into the Docker image, dependencies may be missing or other issues may arise. I would either 1) use the official NCBI BLAST image or 2) ftp the entire tarball into the Docker container and unzip/build it inside the container. Hope this helps.

stevetsa, Dec 16 '20 15:12

Thank you for getting back on this. I am running this on a laptop, so the issue really is with blastn rather than the BLAST docker image.

I wasn't initially able to reproduce the issue, until we had a network interruption (this is Africa). Sure enough: blastn -help under docker took over 20s. Running it straight on Linux wasn't quite as bad, but still close to 6s. With BLAST_USAGE_REPORT=false this goes down to the expected 0.00s.

I am currently pulling the ncbi/blast container to see if it has the same issue (but at 1GB that takes a while here). FTR, while it is pulling, blastn -help takes ~7s both inside and outside the container ...

Clearly, the default "on" setting for the usage reporting isn't great for parts of the world outside of the well-connected north, or indeed anyone running blastn on a disconnected machine, especially in docker.

I will report on what happens with the ncbi/blast image when I have it on my machine.

zwets, Dec 16 '20 21:12

Thanks for the additional information. I had tried it on my laptop with the wifi turned off (months ago, pre-release) and didn't see a problem, but I didn't try it on a slower network (or one half a world away). I'll try the latest version on my laptop (with wifi off) again to see what happens, in case something changed. I don't think that docker will be better, since it just wraps the BLAST+ executables. I'll speak to our developers about whether we can do better in upcoming releases.

tom6931, Dec 16 '20 22:12

Hi Tom, here are the results for the ncbi/blast image on my laptop - and they don't look great :-/

$ time docker run --rm -e BLAST_USAGE_REPORT=false ncbi/blast blastn -help >/dev/null
real    0m0.870s

# With 'normal' network (~4Mbps, but DNS has high latency here) 
$ time docker run --rm ncbi/blast blastn -help >/dev/null
real    0m5.600s

# With network disconnected ... oops!
$ time docker run --rm ncbi/blast blastn -help >/dev/null
real    1m0.914s

I reran the last one a few times. It is close to a 1 minute wait every time.

zwets, Dec 16 '20 22:12

Thanks. We'll need to look at that.

tom6931, Dec 16 '20 22:12

I suppose under docker the issue is worse because the containers sit on their own network, and won't see their link go down when the outside link is down (unless Docker would do this).

On top of that, resolving www.ncbi.nlm.nih.gov is very slow here (and occasionally fails with a timeout), apparently due to the number of recursions and high latency. And with a TTL of only 30s it won't stay cached, so this adds a few seconds to every blastn call.
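To see how much of the delay comes from name resolution alone, the lookup can be timed separately from blastn. This is a rough sketch under a few assumptions: `elapsed` is a made-up helper name, `date +%s` is available (it is on GNU and BSD systems, though not required by POSIX), and the resolution is whole seconds only.

```shell
#!/bin/sh
# elapsed CMD [ARGS...]: run CMD with its output discarded and
# print the wall-clock time it took, in whole seconds.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Harmless self-check: a one-second sleep should report at least 1.
echo "sleep 1 took $(elapsed sleep 1)s"

# On a Linux host with getent, the lookup mentioned above could be timed with:
# elapsed getent hosts www.ncbi.nlm.nih.gov
```

Repeating the lookup more than 30s apart would show whether each run pays the full resolution cost again, which is what the short TTL suggests.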

Anyway, thanks for looking into this!

zwets, Dec 16 '20 23:12