collectfast icon indicating copy to clipboard operation
collectfast copied to clipboard

Fix boto3 pre_should_copy_hook closing the connection too soon

Open vschoener opened this issue 3 years ago • 5 comments

I've been testing collecfast for a while and I've spotted an issue with the connection being closed too soon.

After digging around and trying to understand conversations about thread-safe, I've checked about S3Boto3Storage and I can confirm it's thread-safe as well.

You can check the log here https://pypi.org/project/django-storages/ : 1.6.4 (2017-07-27) The conversation I found about it https://github.com/antonagestam/collectfast/issues/107

As S3Boto3Storage is safe to use with thread, we can simply drop the logic about resetting the connection. If you keep this logic, it closes the main connection of s3 client and some files are not getting synced at all. You have to run it multiple time to make it works.

I've tested locally by commenting off the code there, and it was working as expected https://github.com/antonagestam/collectfast/blob/master/collectfast/management/commands/collectstatic.py#L106

Here are my settings:

COLLECTFAST_ENABLED = os.getenv("COLLECTFAST_ENABLED") == "true"

# Default strategy for collectfast
COLLECTFAST_STRATEGY = os.getenv("COLLECTFAST_STRATEGY", "collectfast.strategies.filesystem.FileSystemStrategy")

# Define how much collectfast should store entries in the cache. We assume there are 2400 static file in the project
MAX_ENTRIES = int(os.getenv("MAX_ENTRIES", 300))

# Set which cache to use.
COLLECTFAST_CACHE = os.getenv("COLLECTFAST_CACHE", "default")

# Enable Parallel Uploads. Do not try to exceed this value with AWS. It will close the connection
COLLECTFAST_THREADS = int(os.getenv("COLLECTFAST_THREADS", 10))

COLLECTFAST_DEBUG = os.getenv("COLLECTFAST_DEBUG") == "true"

COLLECTFAST_REDIS_CACHE = {
    "BACKEND": "django_redis.cache.RedisCache",
    "LOCATION": os.environ.get("COLLECTFAST_REDIS_URL", "redis://redis/0"),
    "KEY_PREFIX": "collectfast",
    "OPTIONS": {
        "SOCKET_CONNECT_TIMEOUT": int(os.getenv("REDIS_SOCKET_CONNECT_TIMEOUT", default=2)),
        "SOCKET_TIMEOUT": int(os.getenv("REDIS_SOCKET_TIMEOUT", default=2)),
        "MAX_ENTRIES": MAX_ENTRIES,
        "IGNORE_EXCEPTIONS": os.getenv("COLLECTFAST_IGNORE_EXCEPTIONS") == "true",
    },
}

And my .env

STATICFILES_STORAGE="storages.backends.s3boto3.S3Boto3Storage"
COLLECTFAST_STRATEGY="collectfast.strategies.boto3.Boto3Strategy"
MAX_ENTRIES=3000
COLLECTFAST_ENABLED=true
COLLECTFAST_THREAD=20
COLLECTFAST_DEBUG=true
AWS_DEFAULT_ACL=private

COLLECTFAST_CACHE=collectfast_redis_cache
COLLECTFAST_REDIS_URL="redis://0.0.0.0:6379/0"
COLLECTFAST_IGNORE_EXCEPTIONS="true"

WDYT about that?

vschoener avatar Apr 21 '22 14:04 vschoener

Code Climate has analyzed commit 469da6c2 and detected 0 issues on this pull request.

View more on Code Climate.

qlty-cloud-legacy[bot] avatar Apr 21 '22 14:04 qlty-cloud-legacy[bot]

@antonagestam is it possible for you to take a look at this even if you don't maintain the repo anymore? :)

vschoener avatar Apr 22 '22 13:04 vschoener

Hi @vschoener and thanks for your contribution. This looks sound, I'm guessing boto3 has gained thread safety since this was first implemented.

However, I'm not really willing to merge anything until the repository is in a maintainable state again. That means having a passing test suite separated from my personal AWS and GCP accounts specifically (the current ones have expired). I'd be happy to guide anyone willing to step and make that happen along the way, and to review PRs that make that happen.

There's a previous attempt at that here, that is unfortunately to be regarded as abandoned at this point, probably: https://github.com/antonagestam/collectfast/pull/218

antonagestam avatar Apr 22 '22 13:04 antonagestam

Hey @antonagestam,

Thanks for replying to this PR. Really appreciate. I'm a bit running out of time to improve the test part, I would love to contribute deeper about that and make it easier to maintain but I also have some requirements to make threads work in a short amount of time.

Is that something really important? Maybe because the tests will fail as you don't have AWS/GCP account anymore right?

I've seen the PR in progress trying to mock things around, it seems abandoned indeed.

vschoener avatar Apr 22 '22 15:04 vschoener

Is that something really important?

Yes, I consider it fundamental.

but I also have some requirements to make threads work in a short amount of time.

I appreciate that, and your effort to contribute. I guess nothing should be stopping you from using your fork until things improve though, right?

antonagestam avatar Apr 22 '22 17:04 antonagestam