twitter-sentiment-analysis Tidy up preprocess.py with pandas

Open abdulfatir opened this issue 6 years ago • 8 comments

In the preprocess_csv function in preprocess.py, pandas could be used to parse the csv more efficiently and with far less code. The machine I was using while developing the project did not have pandas installed.
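A rough sketch of what a pandas-based version might look like. This is not the repo's actual code: `preprocess_tweet` here is a hypothetical stand-in for the real cleaning logic, the `test_file` flag and the function signature are assumptions, and the column names are taken from the headerless layout described in the Readme.

```python
import io

import pandas as pd


def preprocess_tweet(tweet: str) -> str:
    # Hypothetical placeholder for the project's real cleaning steps:
    # lowercases the text and collapses runs of whitespace.
    return " ".join(tweet.lower().split())


def preprocess_csv(csv_in, csv_out, test_file=False):
    # Assumed layout (no header row): tweet_id,sentiment,tweet for training
    # data and tweet_id,tweet for test data.
    names = ["tweet_id", "tweet"] if test_file else ["tweet_id", "sentiment", "tweet"]
    df = pd.read_csv(csv_in, header=None, names=names)
    df["tweet"] = df["tweet"].astype(str).map(preprocess_tweet)
    # Write the result back out without a header or index column.
    df.to_csv(csv_out, header=False, index=False)
    return df


# Small in-memory example; real usage would pass file paths instead.
source = io.StringIO('1,1,"Hello   World"\n2,0,"Bad, day"\n')
output = io.StringIO()
processed = preprocess_csv(source, output)
```

`pd.read_csv` handles the quoted tweet field (including embedded commas), so the manual line splitting is no longer needed.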

Dec 29 '17 21:12 abdulfatir

I ran your code and got this error: [Errno 2] No such file or directory: '../train-processed-freqdist.pkl'. Can you help me solve this problem? Thank you.

Mar 29 '18 08:03 GongQin721

@GongQin721 This is off-topic. Please read the Readme properly.

Mar 29 '18 12:03 abdulfatir

OK, thank you very much!

Apr 29 '18 07:04 GongQin721

Can you help me with the headers of the csv, if any? If there are none, some idea of the structure of the csv would be a great help.

Oct 18 '18 11:10 chaiitanyasangani88

Hi @chaiitanyasangani88

The csv structure is in the Dataset Information section:

We use and compare various different methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a csv file of type tweet_id,sentiment,tweet where the tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet enclosed in "". Similarly, the test dataset is a csv file of type tweet_id,tweet. Please note that csv headers are not expected and should be removed from the training and test datasets.
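To make that layout concrete, here is a minimal sketch of loading and sanity-checking such files with pandas. The `load_dataset` helper and the constant names are made up for illustration and are not part of the repo.

```python
import io

import pandas as pd

# Column layouts as described in the Dataset Information section.
TRAIN_COLUMNS = ["tweet_id", "sentiment", "tweet"]
TEST_COLUMNS = ["tweet_id", "tweet"]


def load_dataset(source, is_test=False):
    # Hypothetical helper: reads a headerless csv and checks its structure.
    columns = TEST_COLUMNS if is_test else TRAIN_COLUMNS
    df = pd.read_csv(source, header=None, names=columns)
    if not df["tweet_id"].is_unique:
        raise ValueError("tweet_id values must be unique")
    if not is_test and not df["sentiment"].isin([0, 1]).all():
        raise ValueError("sentiment must be 0 (negative) or 1 (positive)")
    return df


# In-memory samples; real usage would pass the paths of the csv files.
train = load_dataset(io.StringIO('1,1,"I love this"\n2,0,"This is awful, really"\n'))
test = load_dataset(io.StringIO('3,"Not sure yet"\n'), is_test=True)
```

In this sketch, a header row left in the training file would be read as data and fail the sentiment check, which is one way to catch a header that was not removed.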

Oct 18 '18 14:10 abdulfatir

In your lstm.py code, these '.csv' and '.pkl' files are defined as FREQ_DIST_FILE = '../train-processed-freqdist.pkl', TRAIN_PROCESSED_FILE = '../train-processed.csv', and so on. I wonder how I can generate these files from 'positive-words.txt' and 'negative-words.txt' in the dataset. Could you please help me with the problems above?

Nov 27 '18 08:11 Carolinecrl

@Carolinecrl

'positive-words.txt' and 'negative-words.txt' are not the dataset. They're just for the baseline. The dataset is not included in the repo.

Nov 27 '18 09:11 abdulfatir

In stats.py, which csv file should be passed in: train, test, or any other (random) sample?

Apr 07 '20 05:04 16L31A0575n1
