Where to find and delete all articles?

Open steeljardas opened this issue 1 year ago • 7 comments

I am using Newspaper3k on around 20k articles. Where would I need to go to delete all the articles that Newspaper3k is downloading?

steeljardas avatar Jan 06 '23 08:01 steeljardas

If memoize_articles is not set to False, Newspaper will cache article URLs and associated data in your system's temp directory. There are some details on this cache in my Newspaper3k Overview Document.

johnbumgarner avatar Jan 14 '23 17:01 johnbumgarner

I believe what @steeljardas is asking is how to delete the cache?

NiravJoshi33 avatar Dec 29 '23 16:12 NiravJoshi33

The cache folder is ANCHOR_DIRECTORY, defined in newspaper/settings.py: https://github.com/codelucas/newspaper/blob/f622011177f6c2e95e48d6076561e21c016f08c3/newspaper/settings.py#L48

Normally it would be:

/tmp/.newspaper_scraper/feed_category_cache
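
For anyone landing here who wants to clear the cache from a script, here is a minimal sketch using only the standard library. It assumes the cache lives in a `.newspaper_scraper` folder under the system temp directory, as the path above suggests. The function name `clear_newspaper_cache` and its `cache_root` parameter are made up for illustration and are not part of Newspaper3k.

```python
import shutil
import tempfile
from pathlib import Path


def clear_newspaper_cache(cache_root=None):
    """Delete the cache folder; return True if something was removed."""
    if cache_root is None:
        # Assumption: the cache sits in `.newspaper_scraper` under the
        # system temp directory (e.g. /tmp on Linux), per the path above.
        cache_root = Path(tempfile.gettempdir()) / ".newspaper_scraper"
    cache_root = Path(cache_root)
    if not cache_root.is_dir():
        # Nothing cached yet, or the cache is stored somewhere else.
        return False
    # Removes the folder and everything under it, including
    # subfolders such as feed_category_cache.
    shutil.rmtree(cache_root)
    return True
```

Calling `clear_newspaper_cache()` with no arguments targets the default location. Pass a path explicitly if your temp directory or cache folder differs. Deleting the folder by hand should have the same effect.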

AndyTheFactory avatar Dec 29 '23 18:12 AndyTheFactory

Thanks @AndyTheFactory

NiravJoshi33 avatar Dec 30 '23 01:12 NiravJoshi33

@AndyTheFactory Yes, I agree that @steeljardas was looking for a way to delete all the memoized articles. The document that I mentioned contains information on the cache's location.

johnbumgarner avatar Dec 30 '23 14:12 johnbumgarner

@johnbumgarner I have read your very good documentation!

Your great work inspired me to keep this software alive as a new package: https://github.com/AndyTheFactory/newspaper4k

There were a lot of problems and bugs, but I have the sense it's moving in the right direction. I will release a new version pretty soon with a lot of fixes and improvements.

Have a very good new year, and many thanks for your great work!

AndyTheFactory avatar Dec 30 '23 21:12 AndyTheFactory

@AndyTheFactory Thanks. I will reference your fork in my document. You mention that newspaper3k was last updated in September 2020. The correct date is September 2018, which is the date of the last code push to PyPI. And you are correct that there are a lot of bugs in the current code base. I started a new project called NewsHound, but never pushed the code, because someone wanted to use it commercially. They lost their funding and now I have to revisit the code. The issue that I have found with open source projects is that everyone wants to use them, but few people will put in the effort to help someone maintain a project. Good luck with your fork...

johnbumgarner avatar Dec 31 '23 14:12 johnbumgarner