twitter-sentiment-analysis

Tidy up preprocess.py with pandas

Open abdulfatir opened this issue 6 years ago • 8 comments

In the preprocess_csv function in preprocess.py (link), pandas can be used to parse the CSV more efficiently and with far less code. The machine I was using while developing the project did not have pandas installed.
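A rough sketch of what a pandas-based version might look like. The function signature, the column names, and the `preprocess_tweet` helper here are assumptions for illustration only; the real cleaning logic in preprocess.py would take the helper's place.

```python
import io

import pandas as pd


def preprocess_tweet(tweet: str) -> str:
    # Hypothetical stand-in for the project's real tweet-cleaning logic:
    # lowercases the text and collapses runs of whitespace.
    return " ".join(tweet.lower().split())


def preprocess_csv(csv_file, processed_file, test_file=False):
    # Assumed layout: tweet_id,sentiment,tweet for training data and
    # tweet_id,tweet for test data, with no header row in either case.
    columns = ["tweet_id", "tweet"] if test_file else ["tweet_id", "sentiment", "tweet"]
    df = pd.read_csv(csv_file, header=None, names=columns)
    df["tweet"] = df["tweet"].astype(str).map(preprocess_tweet)
    # Write the result back out without a header row or index column.
    df.to_csv(processed_file, header=False, index=False)
    return df


# Small in-memory demo; file paths work the same way as these buffers.
demo_input = io.StringIO('1,1,"Hello   WORLD"\n2,0,"Bad, day"\n')
demo_output = io.StringIO()
demo_df = preprocess_csv(demo_input, demo_output)
```

pandas takes care of the quoting here, so commas inside a quoted tweet do not need to be handled by hand.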

abdulfatir avatar Dec 29 '17 21:12 abdulfatir

I ran your code and got this error: [Errno 2] No such file or directory: '../train-processed-freqdist.pkl'. Can you help me solve this problem? Thank you.

GongQin721 avatar Mar 29 '18 08:03 GongQin721

@GongQin721 This is off-topic. Please read the Readme properly.

abdulfatir avatar Mar 29 '18 12:03 abdulfatir

OK, thank you very much!

GongQin721 avatar Apr 29 '18 07:04 GongQin721

Can you help me with the headers of the CSV, if any? If there are none, some idea of the structure of the CSV would be a great help.

chaiitanyasangani88 avatar Oct 18 '18 11:10 chaiitanyasangani88

Hi @chaiitanyasangani88

The csv structure is in the Dataset Information section:

We use and compare various different methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a csv file of type tweet_id,sentiment,tweet where the tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet enclosed in "". Similarly, the test dataset is a csv file of type tweet_id,tweet. Please note that csv headers are not expected and should be removed from the training and test datasets.
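As a quick way to check a file against that description, something like the following could be used. It is only a sketch: `load_dataset` and the column-name constants are made-up names, not part of the repo.

```python
import io

import pandas as pd

# Column names are illustrative; the files themselves carry no header row.
TRAIN_COLUMNS = ["tweet_id", "sentiment", "tweet"]
TEST_COLUMNS = ["tweet_id", "tweet"]


def load_dataset(source, is_test=False):
    # Hypothetical helper: loads a header-less CSV and checks it against
    # the layout described in the Dataset Information section.
    columns = TEST_COLUMNS if is_test else TRAIN_COLUMNS
    df = pd.read_csv(source, header=None, names=columns)
    # A leftover header row turns tweet_id into strings, so this check
    # also catches files whose headers were not removed.
    if not pd.api.types.is_integer_dtype(df["tweet_id"]):
        raise ValueError("tweet_id must be an integer (is there a header row?)")
    if not df["tweet_id"].is_unique:
        raise ValueError("tweet_id values must be unique")
    if not is_test and not df["sentiment"].isin([0, 1]).all():
        raise ValueError("sentiment must be 1 (positive) or 0 (negative)")
    return df


# In-memory example of a valid training file.
train_df = load_dataset(io.StringIO('1,1,"good day"\n2,0,"bad, day"\n'))
```

If the file still has its header row, the tweet_id check fails, which is the case the note about removing headers warns about.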

abdulfatir avatar Oct 18 '18 14:10 abdulfatir

In your lstm.py code, these '.csv' and '.pkl' files are referenced as FREQ_DIST_FILE = '../train-processed-freqdist.pkl', TRAIN_PROCESSED_FILE = '../train-processed.csv', and so on. I wonder how I can produce these files from 'positive-words.txt' and 'negative-words.txt' in the dataset. Could you please help me with this?

Carolinecrl avatar Nov 27 '18 08:11 Carolinecrl

@Carolinecrl

'positive-words.txt' and 'negative-words.txt' are not the dataset. They're just for the baseline. The dataset is not included in the repo.

abdulfatir avatar Nov 27 '18 09:11 abdulfatir

In stats.py, which CSV file should be passed in: train, test, or some other (random) sample?

16L31A0575n1 avatar Apr 07 '20 05:04 16L31A0575n1