tidytext issues

Error in qr.lm(thetasims[, k], qx)

5

Hello Julia, I am learning a lot from your book and videos. Thank you. I am conducting a text mining analysis with topic modeling. While following your instructions, I have...

kangutsa

unnest_tokens on large corpus with limited RAM

1

Hello, Thank you for a wonderful tool. I have noticed that RAM consumption becomes the computational bottleneck when unnesting tokens from a large corpus, which gets exponentially worse as the...

steelcitysi

Some stop_words are not neutral

2

Some `stop_words` do not belong in the list. For example, some `stop_words` are also present in sentiment lexicons: - According to the `onix` lexicon, "good" is a stop_word. I...

aliaamiri

Any chance we can get parallel processing for n-grams for example?

2

Thanks for a great package by the way

jaymon0703

feature

Suggestion to add BM25 Score

8

I suggest adding a function to bind the BM25 score *(which is based on a probabilistic term weighting model)*. It is useful in some cases as it gives control over:...

OmaymaS

feature

add functions for lexical diversity (MTLD)

I'd love to be able to calculate lexical diversity using tidy principles. I know Quanteda and koRpus already have these functions, but I'd prefer to do it the tidy way....

fedormyskin

feature

Example needed for tidy approach for stm modeling with covariates

3

In the current `tidytext` documentation on [the tidy approach to `stm` objects](https://juliasilge.github.io/tidytext/reference/stm_tidiers.html#examples), there is no specific example of how to add covariates. I wanted to try that out with...

jooyoungseo

documentation

Tuning number of topics in LDA K

2

Hi Julia! I'm a big fan of the tidy text mining book, but it does not seem to put much emphasis on how to tune the number of topics (K)...

qiushiyan

feature

enabling existing international nrc lexicon in get_sentiments()

3

Hi, I love learning tidytext but was a bit surprised to see that the get_sentiments() function does not allow using the non-English translations included within the Nov 2017 nrc...

LeWaHe

feature

Adding support for latent semantic analysis

I think it would be fairly easy to add support for the lsa package to tidytext and broom. See example below. ```r # Put some docs in a vector library("dplyr")...

BobMuenchen

feature
