OKR
Artifacts from cleaning?
Some sentences look very weird, for example:
Boy Scouts ' ` perversion ' files set to be released travel
@rachelvov, what's the original tweet? @kleinay, do you know the tweet ID?
This is the original tweet: 258960355655548930 20007 Boy Scouts' 'perversion' files set to be released #travel http://t.co/RrOpIc2Y
It seems that the cleaning process removed the "#".
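The actual cleaning code is not shown in this thread, but a minimal sketch of the behavior it appears to have, assuming it deletes URLs and strips only the leading "#" of each hashtag, reproduces the "released travel" ending. `strip_hash_symbols` is a hypothetical name, not the project's function.

```python
import re

# Hypothetical reconstruction of the observed behavior (not the project's code):
# URLs are dropped and the "#" is stripped from each hashtag, keeping the word.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def strip_hash_symbols(tweet: str) -> str:
    """Remove URLs, then replace every '#word' with 'word'."""
    no_urls = URL_RE.sub("", tweet)
    no_hash = HASHTAG_RE.sub(r"\1", no_urls)
    # Collapse the whitespace left behind by the removed URL.
    return " ".join(no_hash.split())


print(strip_hash_symbols(
    "Boy Scouts' 'perversion' files set to be released #travel http://t.co/RrOpIc2Y"
))
```

The extra spacing and backtick around the quotes in the cleaned sentence suggest a tokenizer also ran; this sketch does not model that part.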
On Tue, Aug 15, 2017 at 1:19 PM, Gabriel Stanovsky wrote:
Assigned #18 (https://github.com/vered1986/OKR/issues/18) to @rachelvov.
@OriShapira, maybe we can remove hashtags altogether.
I think the reason I removed only the "#" and not the word after it is that hashtags sometimes appear in the middle of a tweet on a content word, as in "#Baghdad: 25 killed as car bomb targets #Iraq army recruits". It's worth checking how often this happens.
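One way to check how often this happens is to label every hashtag as inline or trailing and count over the corpus. The sketch below assumes a hashtag is "trailing" when every token after it is another hashtag or a URL, and "inline" otherwise; `classify_hashtags` and `count_positions` are hypothetical helpers, and the two tweets are just the examples from this thread.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a hashtag is "trailing" if every token after it is another
# hashtag or a URL; any other hashtag is "inline" (part of the sentence).
URL_RE = re.compile(r"^https?://\S+$")
HASHTAG_RE = re.compile(r"^#\w+")


def classify_hashtags(tweet: str) -> List[Tuple[str, str]]:
    """Return (hashtag, 'inline' | 'trailing') pairs for one tweet."""
    tokens = tweet.split()
    result = []
    for i, tok in enumerate(tokens):
        m = HASHTAG_RE.match(tok)
        if not m:
            continue
        rest = tokens[i + 1:]
        trailing = all(HASHTAG_RE.match(t) or URL_RE.match(t) for t in rest)
        # m.group(0) drops trailing punctuation such as the ":" in "#Baghdad:".
        result.append((m.group(0), "trailing" if trailing else "inline"))
    return result


def count_positions(tweets: Iterable[str]) -> Counter:
    """Count inline vs. trailing hashtags across a collection of tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        for _, position in classify_hashtags(tweet):
            counts[position] += 1
    return counts


tweets = [
    "#Baghdad: 25 killed as car bomb targets #Iraq army recruits",
    "Boy Scouts' 'perversion' files set to be released #travel http://t.co/RrOpIc2Y",
]
print(count_positions(tweets))
```

Running this over the full tweet set would give the inline/trailing ratio needed to decide between the options discussed here.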
Maybe we could remove only the hashtags at the end of tweets?
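A minimal sketch of that compromise, assuming URLs are still dropped as before: the run of hashtags at the end of the tweet is removed entirely, while inline hashtags keep their word and lose only the "#". `clean_hashtags` is a hypothetical helper, not the project's preprocessing code.

```python
import re

# Assumptions (hypothetical helper): URLs are dropped as in the current
# cleaning; trailing hashtags are removed; inline hashtags keep their word.
URL_RE = re.compile(r"^https?://\S+$")
HASHTAG_RE = re.compile(r"^#(\w+)(.*)$")


def clean_hashtags(tweet: str) -> str:
    """Drop URLs and trailing hashtags; strip '#' from inline hashtags."""
    tokens = [t for t in tweet.split() if not URL_RE.match(t)]
    # Pop the run of hashtags at the end of the tweet.
    while tokens and HASHTAG_RE.match(tokens[-1]):
        tokens.pop()
    # Keep the word (and any trailing punctuation) of inline hashtags.
    return " ".join(HASHTAG_RE.sub(r"\1\2", t) for t in tokens)


print(clean_hashtags(
    "Boy Scouts' 'perversion' files set to be released #travel http://t.co/RrOpIc2Y"
))
print(clean_hashtags("#Baghdad: 25 killed as car bomb targets #Iraq army recruits"))
```

The trade-off is that a tweet ending in a content hashtag (e.g. a place name as the last word) would lose that word, so the counts from the previous check are worth looking at first.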
Who owns the preprocessing step (cleaning the tweets)? @OriShapira? Perhaps we should integrate the preprocessing code into the project.