awesome-django-blog
NLTK data is re-downloaded on each new post
Context
It appears that the NLTK data is re-downloaded (or at least a download is attempted) each time a new post is created. I'm not sure why it tries to download the data so many times.
INFO 2024-05-23 04:58:13,593 django.server basehttp basehttp.py 212 log_message "GET /post/new HTTP/1.1" 200 4431
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package stopwords to /home/david/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
INFO 2024-05-23 04:58:56,934 django.server basehttp basehttp.py 212 log_message "POST /post/new HTTP/1.1" 302 0
This happens because I call the download function inside the function that preprocesses text, so the download check runs on every call. That's not desirable, so if you'd like to open a PR with a fix, go ahead!
```python
import re
import string

import nltk
from nltk.corpus import stopwords


def preprocess_text(text: str) -> str:
    """
    Preprocess the input text by applying text preprocessing steps
    """
    # Convert the text to lowercase
    text = text.lower()
    # Remove any leading or trailing whitespace
    text = text.strip()
    # Remove punctuation marks
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove numbers
    text = re.sub(r'\d+', '', text)
    # Remove extra whitespaces
    text = re.sub(r'\s+', ' ', text)
    # Remove stopwords
    nltk.download('stopwords')  # This is triggering the download on every call
    stop_words = set(stopwords.words('english'))
    text = ' '.join([word for word in text.split() if word not in stop_words])
    return text
```
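One possible direction for a fix, as a rough sketch rather than the project's actual patch: check whether the corpus is already on disk before downloading, and cache the stopword set so it is built only once per process. The helper names `load_nltk_stop_words` and `get_stop_words`, and the optional `stop_words` parameter, are hypothetical and not part of the existing code.

```python
import re
import string
from functools import lru_cache
from typing import FrozenSet, Optional


def load_nltk_stop_words() -> FrozenSet[str]:
    """Hypothetical helper: load English stopwords, downloading only if missing."""
    # Imported lazily so NLTK is only needed when the real loader runs.
    import nltk
    from nltk.corpus import stopwords

    try:
        # Raises LookupError when the corpus is not installed locally.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords", quiet=True)
    return frozenset(stopwords.words("english"))


@lru_cache(maxsize=1)
def get_stop_words() -> FrozenSet[str]:
    """Hypothetical helper: build the stopword set once per process."""
    return load_nltk_stop_words()


def preprocess_text(text: str, stop_words: Optional[FrozenSet[str]] = None) -> str:
    """Same preprocessing steps as above, without a download on every call."""
    if stop_words is None:
        # Assumption: callers normally rely on the cached NLTK set.
        stop_words = get_stop_words()
    text = text.lower().strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    text = re.sub(r"\s+", " ", text)
    return " ".join(word for word in text.split() if word not in stop_words)
```

With this shape, the first call to `preprocess_text` pays for the lookup (and a download only if the corpus is missing), and later calls reuse the cached set. Passing `stop_words` explicitly is mainly a convenience for tests that should not touch NLTK at all.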