Allow cache_file argument to also accept list
Problem: We have an application that performs TLD extraction using Celery workers. Since we run several Celery workers, whenever a cache_file update is triggered, it only updates the file on the instance of the worker that picked up that task. As a result, the cached content differs across the Celery instances.
If tldextract could accept a list as the cache_file argument, that list could be stored in Redis, and any worker could pick it up easily.
Relatedly, #144 changes the cache file name to a directory name.
Can you say more about why the list fixes your issue?
Is this a worker ops issue? Does this pseudocode work?
```
# Instead of …
all_celery_workers.enqueue("tldextract --update")

# Do this …
for worker in all_celery_workers:
    worker.exec("tldextract --update")
```
This will not be efficient if a worker disconnects temporarily, or if a new worker is added during the gap between updates.
OK, can you say more about why the list fixes your issue? What would it look like?
Assume there are 5 Celery workers and we update the list independently of this module. This provides the flexibility to cache the latest PSL data in a store such as Redis. We could pass the list from the cache to the module without worrying about whether a worker has the update.
To be clear, what would it look like?
In the meantime, suffix_list_urls can point at a local file. You could dump the raw PSL text content from your Redis into a tempfile and pass that tempfile's path in suffix_list_urls.
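A minimal sketch of that workaround, assuming the raw PSL text is stored in Redis under a key named `"psl"` (the key name and connection details here are placeholders, not anything tldextract defines):

```python
import tempfile

import redis
import tldextract

r = redis.Redis(host="localhost", port=6379)
psl_text = r.get("psl")  # raw Public Suffix List content, as bytes (key name assumed)

# Dump the shared PSL copy into a temp file this worker can read.
with tempfile.NamedTemporaryFile(mode="wb", suffix=".dat", delete=False) as f:
    f.write(psl_text)
    psl_path = f.name

# Point tldextract at the local file via a file:// URL instead of the live PSL URL.
extract = tldextract.TLDExtract(suffix_list_urls=["file://" + psl_path])
print(extract("forums.news.cnn.com").registered_domain)  # cnn.com
```

Because every worker reads the same Redis key, they all end up with an identical suffix list, which sidesteps the per-worker cache drift described above.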
Yeah, that's my current method.
Closing due to lack of a response on why the list fixes the issue / what it would look like.