tldextract

Allow cache_file argument to also accept list

Open hibare opened this issue 5 years ago • 8 comments

Problem: We have an application that performs TLD operations using celery workers. Because we run several celery workers, whenever a cache_file update is triggered, it only updates the file on the celery worker instance that picked up that task. As a result, the cached content differs across the celery instances.

If tldextract could accept a list as the cache_file argument, that list could be stored in redis, and any worker could pick it up easily.

hibare avatar Jul 23 '20 05:07 hibare

Relatedly, #144 changes the cache file name to a directory name.

john-kurkowski avatar Aug 04 '20 17:08 john-kurkowski

Can you say more about why the list fixes your issue?

Is this a worker ops issue? Does this pseudocode work?

# Instead of …
all_celery_workers.enqueue(`tldextract --update`)

# Do this …
for worker in all_celery_workers:
  worker.exec(`tldextract --update`)

john-kurkowski avatar Aug 04 '20 17:08 john-kurkowski

This will not work well if a worker disconnects temporarily or a new worker is added between updates.

hibare avatar Nov 06 '20 14:11 hibare

Ok, can you say more about why the list fixes your issue? What would it look like?

john-kurkowski avatar Nov 06 '20 19:11 john-kurkowski

Assume there are 5 celery workers and we update the list independently of this module. This gives us the flexibility to store the latest PSL data in a cache such as redis. We can then pass the list from the cache to the module without worrying about whether a given worker has the update.

hibare avatar Nov 08 '20 07:11 hibare

To be clear, what would it look like?

john-kurkowski avatar Nov 09 '20 23:11 john-kurkowski

In the meantime, suffix_list_urls can be a local file. You could dump raw PSL text content from your Redis into a tempfile. Pass that tempfile path in suffix_list_urls.
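
A minimal sketch of that workaround, assuming a tldextract version where `cache_dir` has replaced `cache_file`, and assuming the raw PSL text has already been read from Redis into a string. The helper names `write_psl_to_tempfile` and `build_extractor` and the Redis key in the usage comment are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path


def write_psl_to_tempfile(psl_text: str) -> Path:
    """Write raw PSL text to a temp file and return its absolute path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so tldextract can read it later. The caller should remove it when done.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".dat", encoding="utf-8", delete=False
    ) as handle:
        handle.write(psl_text)
    return Path(handle.name)


def build_extractor(psl_path: Path):
    """Build an extractor that reads its suffix list from a local file."""
    import tldextract  # third-party dependency, imported lazily

    return tldextract.TLDExtract(
        # suffix_list_urls accepts file:// URLs as well as http(s) ones.
        suffix_list_urls=[psl_path.as_uri()],
        # Assumption: skip tldextract's own on-disk cache so the extractor
        # always reflects the file that was just written.
        cache_dir=None,
        # Fail instead of silently using the bundled snapshot.
        fallback_to_snapshot=False,
    )


# Hypothetical usage inside a worker (redis client and key are assumptions):
#   psl_text = redis_client.get("psl:raw").decode("utf-8")
#   extract = build_extractor(write_psl_to_tempfile(psl_text))
#   extract("forums.bbc.co.uk")
```

Each worker would rebuild its extractor whenever the Redis copy changes, so no worker depends on having received a particular update task.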

john-kurkowski avatar Nov 09 '20 23:11 john-kurkowski

Yeah, this is my current method.

hibare avatar Nov 16 '20 05:11 hibare

Closing due to lack of response on why the list fixes the issue and what it would look like.

john-kurkowski avatar Dec 17 '22 06:12 john-kurkowski