Stop words not recognized

Open AnissaPierre opened this issue 7 years ago • 3 comments

dan - Indonesian for "and"
di - Italian for "of"
في - Arabic for "in"
به - Persian for "to"
در - Persian for "in"
از - Persian for "from"
pada - Indonesian for "on"
na - Bulgarian for "on"
و - Persian for "and"
é - Portuguese for "is"

AnissaPierre avatar Nov 07 '18 19:11 AnissaPierre

Note: these show up as some of the top words in our system if you search for everything since the beginning of time without using a language filter. Not a huge priority, but it does make us look bad.

rahulbot avatar Nov 07 '18 20:11 rahulbot

Could you post a sample API query for the issue? Is it word cloud generation?

Also noting that we don’t support Persian, Bulgarian, or Indonesian.

— You are receiving this because you were assigned.

Reply to this email directly, view it on GitHub https://github.com/berkmancenter/mediacloud/issues/513#issuecomment-436773747, or mute the thread https://github.com/notifications/unsubscribe-auth/AALGvaKvhvziEJGXceDLV02E8mNLI2xLks5us0ijgaJpZM4YTH5Q .

-- Linas Valiukas, Media Cloud

pypt avatar Nov 08 '18 02:11 pypt

If we can detect Persian, Bulgarian, and Indonesian, it's worth quickly adding some basic stop words for those languages.

-hal

-- Hal Roberts, Fellow, Berkman Klein Center for Internet & Society, Harvard University

hroberts avatar Nov 08 '18 13:11 hroberts
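
For anyone picking this up, here is a rough sketch of what "basic stop words per language" could look like when counting top words. It is not Media Cloud's actual implementation: `STOP_WORDS` and `top_words` are hypothetical names, the language codes are assumed to be ISO 639-1, and the lists contain only a handful of the words reported in this issue. The assumption is that when no language filter is given, the union of all known lists is applied.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop word lists keyed by ISO 639-1 code.
# They hold only words reported in this issue; real lists would be much longer.
STOP_WORDS = {
    "id": {"dan", "pada"},          # Indonesian
    "it": {"di"},                   # Italian
    "fa": {"به", "در", "از", "و"},  # Persian
    "bg": {"na"},                   # Bulgarian, as reported in the issue
    "pt": {"é"},                    # Portuguese
}


def top_words(tokens, language=None, n=10):
    """Return the n most common tokens that are not stop words.

    If `language` is None (no language filter), the union of every known
    stop word list is applied; an unknown language code filters nothing.
    """
    if language is None:
        stop = set().union(*STOP_WORDS.values())
    else:
        stop = STOP_WORDS.get(language, set())
    counts = Counter(
        token for token in (t.lower() for t in tokens) if token not in stop
    )
    return counts.most_common(n)


# Example: a mixed-language token stream with no language filter.
tokens = ["dan", "pemilu", "di", "dan", "pemilu", "é", "و", "Pemilu"]
print(top_words(tokens))
```

Applying the union of all lists when no language is given is the simplest option, but it can drop real words that happen to collide with a stop word in another language, so filtering per detected story language is probably the safer choice.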