textmineR icon indicating copy to clipboard operation
textmineR copied to clipboard

Use tidyverse style and testthat 3e

Open Matt-Int opened this issue 3 years ago • 7 comments

I could have sworn there was a project or similar on the main repo that said about making this more in-line with the tidyverse style. I may have imagined it, but in case that is still relevant then these changes are primarily formatting changes to the code, using a combination of styler and manual work (there was probably a cleverer way of doing this).

Major changes (i.e. not just formatting) include the change of user-facing function names to be closer to the tidyverse style, argument names have not been changed to maintain backwards compatibility. Aliases with the old function names have also been added to make sure that previous code won't suddenly break.

There may be something to add as.character(sys.call(sys.parent()))[1L])as.character(sys.call(sys.parent()))[1L]) in the call to check if the old name is being used and displaying a warning or message if that's the case.

~I'll keep this as a draft for now, in case the whole tidyverse style isn't relevant anymore though.~ Looks like the checks are run anyway so might as well not have it as a draft...

Matt-Int avatar Jul 03 '21 22:07 Matt-Int

Ah. So it’s that moved into a different (forthcoming) package. https://tommyjones/tidylda

Not opposed to adding more tidy in textmineR, but I do want to preserve the API’s backwards compatibility.

Once tidylda hits CRAN, I’m going to have textmineR wrap it for LDA.

Tommy On Jul 3, 2021, 6:22 PM -0400, Mattias @.***>, wrote:

I could have sworn there was a project or similar on the main repo that said about making this more in-line with the tidyverse style. I may have imagined it, but in case that is still relevant then these changes are primarily formatting changes to the code, using a combination of styler and manual work (there was probably a cleverer way of doing this). Major changes (i.e. not just formatting) include the change of user-facing function names to be closer to the tidyverse style, argument names have not been changed to maintain backwards compatibility. Aliases with the old function names have also been added to make sure that previous code won't suddenly break. There may be something to add as.character(sys.call(sys.parent()))[1L])as.character(sys.call(sys.parent()))[1L]) in the call to check if the old name is being used and displaying a warning or message if that's the case. I'll keep this as a draft for now, in case the whole tidyverse style isn't relevant anymore though. You can view, comment on, or merge this pull request online at: https://github.com/TommyJones/textmineR/pull/87 Commit Summary

• Updated tests and DESCRIPTION to use testthat 3e • Adding markdown to suggests • Changes made from running styler::style_pkg() • Updated corpus_functions to tidyverse style line limit • Updated corpus_functions to be snake case instead of camel case • Rebuild documentation • Updated evaluation_metrics to tidyverse styleguide • Changed calls to CalcProbCoherence and CalcTopicModelR2 calls • Updated topic_modeling_utilities to conform to tidyverse style • Built documentation • Updated examples to use new snake case variables • Updated internal call to Dtm2Tcm from camel to snake version • Fixed spelling misstake • Added @param ... to old camel case alias versions. • Removed commented out code • Rebuilt documentation • Removed some unneeded whitespace • Made tropic_modeling_core not exceed 80 characters • Renamed topic_modeling_core functions to be snake_case • Some more 80 character finicking with the documentation • Removed some commented code and removed some superflous whitespaces • Reformatted Matrix call to fit in the 80 characters. • Silencing some lintr errors. • Renamed R calls to C functions to be snake_case • Updated other_utilites to closer follow the tidyverse style • Updated calls to tm_parallel_apply to the new name • Updated distance_functions to be tidyverse compliant • Revert "Renamed R calls to C functions to be snake_case" • Removed aliases as .Deprecated doesn't allow the function to be... • Rebuilt documentation • Fixing the C calls to be back to orignal, tests now pass again. • Rebuilt documentation • Added examples to seperate files and linked them in the roxygen docs • Included textmineR:: specification to be more specific about... • Exceeded the 80 character limit, so reformatted this call • Moved examples folder into the R directory • Fixed call to Hellinger_cpp in distance_functions • Use the snake_case in the example for calc_hellinger_dist

File Changes

• M DESCRIPTION (6) • M NAMESPACE (22) • M R/corpus_functions.R (595) • M R/distance_functions.R (148) • M R/evaluation_metrics.R (528) • A R/examples/calc_gamma.R (8) • A R/examples/calc_hellinger_dist.R (6) • A R/examples/calc_js_divergence.R (6) • A R/examples/calc_likelihood.R (12) • A R/examples/calc_prob_coherence.R (6) • A R/examples/calc_topic_model_r2.R (13) • A R/examples/cluster_2_topic_model.R (10) • A R/examples/create_dtm.R (17) • A R/examples/create_tcm.R (17) • A R/examples/dtm_2_docs.R (11) • A R/examples/dtm_2_lexicon.R (9) • A R/examples/dtm_2_tcm.R (3) • A R/examples/fit_ctm_model.R (31) • A R/examples/fit_lda_model.R (23) • A R/examples/fit_lsa_model.R (14) • A R/examples/get_probable_terms.R (15) • A R/examples/get_top_terms.R (6) • A R/examples/label_topics.R (13) • A R/examples/posterior.lda_topic_model.R (9) • A R/examples/predict.ctm_topic_model.R (12) • A R/examples/predict.lda_topic_model.R (26) • A R/examples/predict.lsa_topic_model.R (15) • A R/examples/summarize_topics.R (3) • A R/examples/term_doc_freq.R (8) • A R/examples/tm_parallel_apply.R (5) • A R/examples/update.lda_topic_model.R (50) • M R/other_utilities.R (96) • M R/textmineR-deprecated.R (4) • M R/textmineR.R (2) • M R/topic_modeling_core.R (1369) • M R/topic_modeling_utilities.R (312) • R man/calc_gamma.Rd (29) • R man/calc_hellinger_dist.Rd (27) • R man/calc_js_divergence.Rd (27) • R man/calc_likelihood.Rd (41) • R man/calc_prob_coherence.Rd (20) • R man/calc_topic_model_r2.Rd (42) • R man/cluster_2_topic_model.Rd (26) • R man/create_dtm.Rd (87) • R man/create_tcm.Rd (104) • R man/dtm_2_docs.Rd (29) • R man/dtm_2_lexicon.Rd (23) • R man/dtm_2_tcm.Rd (18) • R man/fit_ctm_model.Rd (70) • R man/fit_lda_model.Rd (74) • R man/fit_lsa_model.Rd (28) • R man/get_probable_terms.Rd (36) • R man/get_top_terms.Rd (21) • R man/label_topics.Rd (22) • M man/posterior.lda_topic_model.Rd (19) • M man/predict.ctm_topic_model.Rd (15) • M man/predict.lda_topic_model.Rd (40) • M man/predict.lsa_topic_model.Rd (11) • R man/summarize_topics.Rd (47) • R man/term_doc_freq.Rd (18) • M man/textmineR.Rd (2) • R man/tm_parallel_apply.Rd (26) • M man/update.lda_topic_model.Rd (104) • M tests/spelling.R (9) • M tests/testthat.R (2) • M tests/testthat/test-corpus_functions.R (181) • M tests/testthat/test-distance_functions.R (50) • M tests/testthat/test-topic_modeling_core.R (497) • M tests/testthat/test-topic_modeling_utilities.R (54)

Patch Links:

• https://github.com/TommyJones/textmineR/pull/87.patch • https://github.com/TommyJones/textmineR/pull/87.diff

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

TommyJones avatar Jul 03 '21 22:07 TommyJones

Appreciate the contributions, BTW. You’re motivating me to write down plans to get them out of my head. 😅

Tommy On Jul 3, 2021, 6:22 PM -0400, Mattias @.***>, wrote:

I could have sworn there was a project or similar on the main repo that said about making this more in-line with the tidyverse style. I may have imagined it, but in case that is still relevant then these changes are primarily formatting changes to the code, using a combination of styler and manual work (there was probably a cleverer way of doing this). Major changes (i.e. not just formatting) include the change of user-facing function names to be closer to the tidyverse style, argument names have not been changed to maintain backwards compatibility. Aliases with the old function names have also been added to make sure that previous code won't suddenly break. There may be something to add as.character(sys.call(sys.parent()))[1L])as.character(sys.call(sys.parent()))[1L]) in the call to check if the old name is being used and displaying a warning or message if that's the case. I'll keep this as a draft for now, in case the whole tidyverse style isn't relevant anymore though. You can view, comment on, or merge this pull request online at: https://github.com/TommyJones/textmineR/pull/87 Commit Summary

• Updated tests and DESCRIPTION to use testthat 3e • Adding markdown to suggests • Changes made from running styler::style_pkg() • Updated corpus_functions to tidyverse style line limit • Updated corpus_functions to be snake case instead of camel case • Rebuild documentation • Updated evaluation_metrics to tidyverse styleguide • Changed calls to CalcProbCoherence and CalcTopicModelR2 calls • Updated topic_modeling_utilities to conform to tidyverse style • Built documentation • Updated examples to use new snake case variables • Updated internal call to Dtm2Tcm from camel to snake version • Fixed spelling misstake • Added @param ... to old camel case alias versions. • Removed commented out code • Rebuilt documentation • Removed some unneeded whitespace • Made tropic_modeling_core not exceed 80 characters • Renamed topic_modeling_core functions to be snake_case • Some more 80 character finicking with the documentation • Removed some commented code and removed some superflous whitespaces • Reformatted Matrix call to fit in the 80 characters. • Silencing some lintr errors. • Renamed R calls to C functions to be snake_case • Updated other_utilites to closer follow the tidyverse style • Updated calls to tm_parallel_apply to the new name • Updated distance_functions to be tidyverse compliant • Revert "Renamed R calls to C functions to be snake_case" • Removed aliases as .Deprecated doesn't allow the function to be... • Rebuilt documentation • Fixing the C calls to be back to orignal, tests now pass again. • Rebuilt documentation • Added examples to seperate files and linked them in the roxygen docs • Included textmineR:: specification to be more specific about... • Exceeded the 80 character limit, so reformatted this call • Moved examples folder into the R directory • Fixed call to Hellinger_cpp in distance_functions • Use the snake_case in the example for calc_hellinger_dist

File Changes

• M DESCRIPTION (6) • M NAMESPACE (22) • M R/corpus_functions.R (595) • M R/distance_functions.R (148) • M R/evaluation_metrics.R (528) • A R/examples/calc_gamma.R (8) • A R/examples/calc_hellinger_dist.R (6) • A R/examples/calc_js_divergence.R (6) • A R/examples/calc_likelihood.R (12) • A R/examples/calc_prob_coherence.R (6) • A R/examples/calc_topic_model_r2.R (13) • A R/examples/cluster_2_topic_model.R (10) • A R/examples/create_dtm.R (17) • A R/examples/create_tcm.R (17) • A R/examples/dtm_2_docs.R (11) • A R/examples/dtm_2_lexicon.R (9) • A R/examples/dtm_2_tcm.R (3) • A R/examples/fit_ctm_model.R (31) • A R/examples/fit_lda_model.R (23) • A R/examples/fit_lsa_model.R (14) • A R/examples/get_probable_terms.R (15) • A R/examples/get_top_terms.R (6) • A R/examples/label_topics.R (13) • A R/examples/posterior.lda_topic_model.R (9) • A R/examples/predict.ctm_topic_model.R (12) • A R/examples/predict.lda_topic_model.R (26) • A R/examples/predict.lsa_topic_model.R (15) • A R/examples/summarize_topics.R (3) • A R/examples/term_doc_freq.R (8) • A R/examples/tm_parallel_apply.R (5) • A R/examples/update.lda_topic_model.R (50) • M R/other_utilities.R (96) • M R/textmineR-deprecated.R (4) • M R/textmineR.R (2) • M R/topic_modeling_core.R (1369) • M R/topic_modeling_utilities.R (312) • R man/calc_gamma.Rd (29) • R man/calc_hellinger_dist.Rd (27) • R man/calc_js_divergence.Rd (27) • R man/calc_likelihood.Rd (41) • R man/calc_prob_coherence.Rd (20) • R man/calc_topic_model_r2.Rd (42) • R man/cluster_2_topic_model.Rd (26) • R man/create_dtm.Rd (87) • R man/create_tcm.Rd (104) • R man/dtm_2_docs.Rd (29) • R man/dtm_2_lexicon.Rd (23) • R man/dtm_2_tcm.Rd (18) • R man/fit_ctm_model.Rd (70) • R man/fit_lda_model.Rd (74) • R man/fit_lsa_model.Rd (28) • R man/get_probable_terms.Rd (36) • R man/get_top_terms.Rd (21) • R man/label_topics.Rd (22) • M man/posterior.lda_topic_model.Rd (19) • M man/predict.ctm_topic_model.Rd (15) • M man/predict.lda_topic_model.Rd (40) • M man/predict.lsa_topic_model.Rd (11) • R man/summarize_topics.Rd (47) • R man/term_doc_freq.Rd (18) • M man/textmineR.Rd (2) • R man/tm_parallel_apply.Rd (26) • M man/update.lda_topic_model.Rd (104) • M tests/spelling.R (9) • M tests/testthat.R (2) • M tests/testthat/test-corpus_functions.R (181) • M tests/testthat/test-distance_functions.R (50) • M tests/testthat/test-topic_modeling_core.R (497) • M tests/testthat/test-topic_modeling_utilities.R (54)

Patch Links:

• https://github.com/TommyJones/textmineR/pull/87.patch • https://github.com/TommyJones/textmineR/pull/87.diff

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

TommyJones avatar Jul 03 '21 22:07 TommyJones

Ah. So it’s that moved into a different (forthcoming) package. https://tommyjones/tidylda Not opposed to adding more tidy in textmineR, but I do want to preserve the API’s backwards compatibility. Once tidylda hits CRAN, I’m going to have textmineR wrap it for LDA.

Ah I see!

Looks like I got a bit carried away on this part 😄.

Appreciate the contributions, BTW. You’re motivating me to write down plans to get them out of my head. 😅

Happy to! It's a really useful package and probably the main reason I started using R for text analysis / mining and topic modelling, so glad I can give back a bit.

Looks like the checks passed so I'm gonna mark as ready for review; again, all previous API calls have been preserved, I did initially go down the route of marking them as .Deprecated but went with a simple alias instead, so no previous code should be adversely affected.

Matt-Int avatar Jul 03 '21 22:07 Matt-Int

I've been meaning to pull together a textmineR 4.0 branch. I think it's a good idea to put your two PRs in there. IMO, that's the best way forward. A major version update like that will require some telegraphing through incremental CRAN updates to give users time to prepare for major changes.

I'll try to do that this weekend. Realistically, it might take a little while. (Sorry I haven't gotten to your PRs yet.) This is a holiday weekend in the States and I'm getting ready for a couple weeks holiday. So I've been in a sprint at work to prepare. But I'll get there. I do appreciate the motivation to improve textmineR. :)

TommyJones avatar Jul 04 '21 03:07 TommyJones

I'm happy with them going into the v4 branch; the mvrsquared one is pretty non-invasive though, especially when compared to this one! 😉

Happy fourth of July btw and hope you enjoy your holiday!

Matt-Int avatar Jul 04 '21 10:07 Matt-Int

Ok. Back to work! Some news: tidylda is now on CRAN. So there is a clear path (I believe) for most of what I hope to do with a textmineR v4.0 release. I owe you a written down plan and will figure out how to move this PR and the mvrsquared PR to a v4.0 branch. Thanks again for the leg work you've put in!

TommyJones avatar Jul 19 '21 15:07 TommyJones

Woo! Welcome back, hope you had a good holiday :)

That sounds fantastic, guess I'll have to dive through tidylda now as well; been meaning to get more familiar with tidymodels as well so this is a great excuse.

Best, Mattias

Matt-Int avatar Jul 22 '21 09:07 Matt-Int