collectfast icon indicating copy to clipboard operation
collectfast copied to clipboard

Don't suppress errors when using threads

Open lepilepi opened this issue 4 years ago • 3 comments

When I was using collectfast in a CI system, some of the assets were not copied. Sometimes it copied all, sometimes just a subset without any errors.

This is what I experienced:

~ $ python manage.py collectstatic --noinput --disable-collectfast
1249 static files copied, 59 unmodified.
~ $
~ $ echo 'Deleting the files from AWS now'
Deleting the files from AWS now
~ $
~ $ python manage.py collectstatic --noinput
790 static files copied.
~ $ python manage.py collectstatic --noinput
518 static files copied.
~ $ python manage.py collectstatic --noinput
0 static files copied.
~ $ python manage.py collectstatic --noinput
0 static files copied.
~ $
~ $ echo 'Deleting the files from AWS now'
Deleting the files from AWS now
~ $
~ $ python manage.py collectstatic --noinput
697 static files copied.
~ $ python manage.py collectstatic --noinput
611 static files copied.
~ $ python manage.py collectstatic --noinput
0 static files copied.
~ $ python manage.py collectstatic --noinput
0 static files copied.
~ $
~ $

It turned out, that the redis cache I used has some connection limits, so in the background I was having redis.exceptions.ConnectionError: max number of clients reached errors without being notified.

The collectstatic build step succeeded all times while having these errors and not copying some files.

The proposed fix simply waits for all the results and re-raises any errors, so the collectstatic command will fail loudly when an error happens in at least one thread.

(I wanted to write a test for this, but wasn't able to get the test suite running so I gave it up.)

lepilepi avatar Oct 14 '21 10:10 lepilepi

@lepilepi Hmm, yeah looks like django-storages has removed .gzip from the S3 class. So that would have to be fixed in a separate PR first.

The expected behavior for failures should be to always fall back to the builtin collectstatic command instead of falling over. It sounds like something else has happened since your files weren't copied. Step 1 would probably be to reproduce that with a test.

I have limited time to work on this project at the moment and probably won't be able to help out much, but will review PRs with passing tests.

antonagestam avatar Oct 14 '21 12:10 antonagestam

I don't think anything else happened: I simply had redis.exceptions.ConnectionError errors in some of the threads, which was entirely swallowed by the ThreadPoolExecutor().map() so in the end there were no errors displayed, marked as success while missing several files. If any other error might happen in those threads. those will be masked too, resulting as success.

At the moment, I simply reduced the number of the threads in COLLECTFAST_THREADS, hoping with crossed fingers that we won't hit it again. If so, we won't even notice it during the build.

I don't have much time to work on this either unfortunately. If I could run the test in local properly, I could add one to reproduce this issue easily but I don't have the time to figure out and fix the already failing tests in master, neither to figure out the why I can't run the tests in local based on the README (and update that as well if needed). (Something might be with the dependency versions).

lepilepi avatar Oct 14 '21 14:10 lepilepi

which was entirely swallowed by the ThreadPoolExecutor().map() so in the end there were no errors displayed, marked as success while missing several files.

Yeah no I agree, I'm saying that's not to be expected. What would be expected is for those to have been retried using builtin collectstatic and whatever behavior is implemented there (which I assume is to fail loudly, but it's honestly too long ago for me to remember).

antonagestam avatar Oct 14 '21 14:10 antonagestam