Submitting LBDiscover package
Submitting Author Name: Chao Liu
Submitting Author Github Handle: @chaoliu-cl
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/chaoliu-cl/LBDiscover
Version submitted:
Submission type: Standard
Editor: TBD
Reviewers: TBD
Archive: TBD
Version accepted: TBD
Language: en
- Paste the full DESCRIPTION file inside a code block below:
Package: LBDiscover
Title: Literature-Based Discovery Tools for Biomedical Research
Version: 0.1.0
Date: 2025-05-14
Authors@R:
person("Chao Liu", email = "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-9979-8272"))
Description: A suite of tools for literature-based discovery in biomedical research.
Provides functions for retrieving scientific articles from PubMed and
other NCBI databases, extracting biomedical entities (diseases, drugs, genes, etc.),
building co-occurrence networks, and applying various discovery models
including ABC, AnC, LSI, and BITOLA. The package also includes
visualization tools for exploring discovered connections.
License: GPL-3
URL: https://github.com/chaoliu-cl/LBDiscover, http://liu-chao.site/LBDiscover/, https://liu-chao.site/LBDiscover/
BugReports: https://github.com/chaoliu-cl/LBDiscover/issues
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Depends:
R (>= 4.0.0)
Imports:
httr (>= 1.4.0),
xml2 (>= 1.3.0),
igraph (>= 1.2.0),
Matrix (>= 1.3.0),
utils,
stats,
grDevices,
graphics,
tools,
rentrez (>= 1.2.0),
jsonlite (>= 1.7.0)
Suggests:
openxlsx (>= 4.2.0),
SnowballC (>= 0.7.0),
visNetwork (>= 2.1.0),
spacyr (>= 1.2.0),
parallel,
digest (>= 0.6.0),
irlba (>= 2.3.0),
knitr,
rmarkdown,
base64enc,
reticulate,
testthat (>= 3.0.0),
mockery,
covr,
htmltools
VignetteBuilder: knitr
Config/testthat/edition: 3
Scope
- Please indicate which category or categories from our package fit policies this package falls under (please check an appropriate box below; if you are unsure, we suggest you make a pre-submission inquiry):
- [X] data retrieval
- [X] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [X] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] translation
- Explain how and why the package falls under these categories (briefly, 1-2 sentences):
  - Data retrieval: The package provides functions for retrieving scientific articles from PubMed and other NCBI databases, giving systematic access to biomedical literature in major research repositories.
  - Data extraction: It extracts biomedical entities (diseases, drugs, genes, etc.) from the retrieved literature, performing information extraction from scientific texts.
  - Citation management and bibliometrics: The package builds co-occurrence networks from the literature and applies discovery models (ABC, AnC, LSI, BITOLA) to find hidden connections between concepts, which constitutes bibliometric analysis for literature-based discovery research. (A short code sketch of the retrieval and extraction steps follows this list.)
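A minimal sketch of the retrieval and extraction steps: the function names `pubmed_search()` and `extract_entities()` are exported by the package, but the argument names shown here are illustrative assumptions rather than the documented signatures.

```r
# Hedged sketch: retrieve a small set of PubMed records and extract
# biomedical entities from their abstracts. Argument names are assumptions.
library(LBDiscover)

articles <- pubmed_search(query = "migraine", max_results = 100)
entities <- extract_entities(articles, text_column = "abstract")
head(entities)
```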
- Who is the target audience and what are scientific applications of this package?

Target Audience: LBDiscover is designed for biomedical researchers, bioinformaticians, and data scientists working in literature-based discovery (LBD). The primary users include:

- Biomedical researchers seeking hidden connections between diseases, drugs, and genes
- Pharmaceutical researchers exploring drug repurposing opportunities
- Bioinformaticians building knowledge networks from literature
- Graduate students and academics studying computational approaches to hypothesis generation
Scientific Applications: The package supports several key research applications:
- Drug Discovery and Repurposing: LBD has been used extensively in drug development and repurposing, as well as in predicting adverse drug reactions
- Disease-Gene Association Discovery: Using literature-based discovery to identify disease candidate genes
- Biomarker Identification: LBD has been explored as a tool to identify diagnostic and prognostic biomarkers for diseases
- Hypothesis Generation: Creating testable scientific hypotheses by connecting disparate pieces of literature
- Knowledge Network Construction: Building co-occurrence networks to visualize research landscapes
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Several R packages overlap with LBDiscover's functionality, but none provide the same comprehensive approach to literature-based discovery.

Similar Packages and Key Differences:

- pubmed.mineR
  - Overlap: PubMed text mining with functions for data visualization and biomedical entity extraction.
  - Difference: Focuses on general text mining and clustering rather than implementing specific LBD models such as ABC, AnC, LSI, and BITOLA.
- bibliometrix
  - Overlap: Comprehensive science mapping analysis with network analysis capabilities and bibliometric workflows.
  - Difference: Designed for general scientometric analysis across all disciplines, not specifically for biomedical literature-based discovery, and does not implement LBD-specific algorithms.
- Data retrieval packages (rentrez, easyPubMed, RISmed)
  - Overlap: All provide interfaces to NCBI/PubMed for retrieving biomedical literature.
  - Difference: These focus solely on data retrieval and do not perform LBD analysis, entity extraction, or hypothesis generation.
How LBDiscover Meets Best-in-Category Criteria:
- Unique Functionality: LBDiscover is the first R package to specifically implement established LBD models:
  - ABC Model: The most basic and widespread type of LBD, centered on finding connections between concepts A, B, and C
  - BITOLA: An interactive literature-based biomedical discovery support system using semantic prediction
  - LSI (Latent Semantic Indexing): A statistical technique for improving information retrieval effectiveness, used to assist literature-based discovery
  - AnC Model: Advanced connection models for more sophisticated discovery patterns
- Integrated Workflow: Unlike other packages that handle only one aspect (retrieval OR analysis OR visualization), LBDiscover provides a complete workflow from data retrieval through entity extraction to discovery model application and network visualization (a hedged end-to-end sketch follows this list).
- Biomedical Specialization: While bibliometrix serves general scientometrics and pubmed.mineR does general text mining, LBDiscover is specifically designed for biomedical literature-based discovery with domain-specific entity recognition (diseases, drugs, genes).
- Modern Implementation: Recent work has focused on integrating large language models to enhance literature-based discovery, and LBDiscover is positioned to incorporate such advances while maintaining established methodological foundations.
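To make the integrated workflow concrete, here is an end-to-end sketch. The function names (`pubmed_search()`, `preprocess_text()`, `extract_entities_workflow()`, `create_comat()`, `abc_model()`, `vis_heatmap()`) are exported by the package, but the argument names and the shape of the returned objects shown below are assumptions, not the documented API.

```r
# Hedged end-to-end LBD sketch:
# retrieval -> preprocessing -> entity extraction -> co-occurrence -> ABC model -> visualisation
library(LBDiscover)

# 1. Retrieve and preprocess a literature set (argument names are assumptions)
articles <- pubmed_search(query = "raynaud disease", max_results = 500)
articles <- preprocess_text(articles, text_column = "abstract")

# 2. Extract biomedical entities (diseases, drugs, genes, ...)
entities <- extract_entities_workflow(articles)

# 3. Build a term co-occurrence matrix from the extracted entities
comat <- create_comat(entities)

# 4. Apply the ABC discovery model: A (source term) -> B (linking terms) -> C (candidate targets)
abc <- abc_model(comat, a_term = "raynaud disease")

# 5. Inspect and visualise the top-ranked A-B-C connections
head(abc)
vis_heatmap(abc)
```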
- (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? NA
- If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
- Explain reasons for any pkgcheck items which your package is unable to pass. None
Technical checks
Confirm each of the following by checking the box.
- [X] I have read the rOpenSci packaging guide.
- [X] I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.
This package:
- [X] does not violate the Terms of Service of any service it interacts with.
- [X] has a CRAN and OSI accepted license.
- [X] contains a README with instructions for installing the development version.
- [X] includes documentation with examples for all functions, created with roxygen2.
- [X] contains a vignette with examples of its essential functions and uses.
- [X] has a test suite.
- [X] has continuous integration, including reporting of test coverage.
Publication options
- [X] Do you intend for this package to go on CRAN?
- [ ] Do you intend for this package to go on Bioconductor?
- [ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Code of conduct
- [X] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.
:rocket:
Editor check started
:wave:
Checks for LBDiscover (v0.1.0)
git hash: 02f4c075
- :heavy_check_mark: Package is already on CRAN.
- :heavy_multiplication_x: does not have a 'codemeta.json' file.
- :heavy_multiplication_x: does not have a 'contributing' file.
- :heavy_check_mark: uses 'roxygen2'.
- :heavy_check_mark: 'DESCRIPTION' has a URL field.
- :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
- :heavy_check_mark: Package has at least one HTML vignette
- :heavy_multiplication_x: These functions do not have examples: [abc_model, anc_model, clear_pubmed_cache, create_report, eval_evidence, extract_entities_workflow, extract_entities, find_term, gen_report, get_dict_cache, get_term_vars, is_valid_biomedical_entity, load_dictionary, lsi_model, merge_entities, min_results, plot_heatmap, plot_network, prep_articles, query_external_api, query_mesh, query_umls, safe_diversify, sanitize_dictionary, valid_entities, validate_biomedical_entity, validate_entity_comprehensive, validate_entity_with_nlp].
- :heavy_check_mark: Package has continuous integration checks.
- :heavy_multiplication_x: Package coverage is 22.4% (should be at least 75%).
- :heavy_multiplication_x: All examples use \dontrun{}.
- :heavy_check_mark: R CMD check found no errors.
- :heavy_check_mark: R CMD check found no warnings.
- :eyes: Some goodpractice linters failed.
- :eyes: Function names are duplicated in other packages
Important: All failing checks above must be addressed prior to proceeding
(Checks marked with :eyes: may be optionally addressed.)
Package License: GPL-3
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
| type | package | ncalls |
|---|---|---|
| internal | base | 2214 |
| internal | LBDiscover | 147 |
| internal | methods | 9 |
| imports | stats | 61 |
| imports | graphics | 58 |
| imports | xml2 | 54 |
| imports | utils | 51 |
| imports | httr | 33 |
| imports | igraph | 19 |
| imports | rentrez | 9 |
| imports | Matrix | 8 |
| imports | tools | 3 |
| imports | grDevices | 2 |
| imports | jsonlite | 2 |
| suggests | visNetwork | 8 |
| suggests | parallel | 7 |
| suggests | irlba | 2 |
| suggests | reticulate | 2 |
| suggests | SnowballC | 1 |
| suggests | spacyr | 1 |
| suggests | digest | 1 |
| suggests | openxlsx | NA |
| suggests | knitr | NA |
| suggests | rmarkdown | NA |
| suggests | base64enc | NA |
| suggests | testthat | NA |
| suggests | mockery | NA |
| suggests | covr | NA |
| suggests | htmltools | NA |
| linking_to | NA | NA |
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base
c (195), character (171), for (126), length (121), data.frame (115), nrow (112), sapply (111), min (83), list (73), max (72), grepl (55), any (54), unique (53), numeric (50), names (39), which (34), paste0 (31), sum (30), if (29), integer (25), rep (25), return (23), seq_along (22), unlist (22), attr (21), ncol (20), paste (19), tryCatch (19), rbind (18), tolower (18), is.na (17), is.null (16), matrix (16), ceiling (15), lapply (15), rownames (15), strsplit (15), table (15), colnames (14), nchar (13), as.numeric (12), seq_len (11), vector (11), match (9), order (9), round (9), regexpr (8), regmatches (8), setdiff (8), as.character (7), drop (7), gregexpr (7), ifelse (7), rowSums (7), sqrt (7), url (7), body (6), plot (6), range (6), sort (6), dim (5), gsub (5), substr (5), t (5), diff (4), grep (4), seq (4), sprintf (4), switch (4), col (3), diag (3), emptyenv (3), environment (3), file (3), log (3), logical (3), new.env (3), row (3), tapply (3), tempfile (3), all (2), apply (2), by (2), colSums (2), dimnames (2), do.call (2), mean (2), outer (2), row.names (2), sub (2), Sys.time (2), try (2), abs (1), as.data.frame (1), cat (1), colMeans (1), difftime (1), duplicated (1), expression (1), file.path (1), floor (1), format (1), interactive (1), match.arg (1), merge (1), mode (1), packageEvent (1), setHook (1), suppressMessages (1), system.file (1), units (1), unname (1), version (1), which.max (1)
LBDiscover
retry_api_call (16), create_comat (4), load_dictionary (4), pubmed_search (4), string_similarity (4), throttle_api (4), abc_model (3), authenticate_umls (3), cluster_docs (3), count_corpus_terms (3), extract_entities (3), get_pubmed_cache (3), tokenize_text (3), vec_preprocess (3), calc_doc_sim (2), calculate_score (2), create_cache_key (2), create_dummy_dictionary (2), create_term_document_matrix (2), diversify_abc (2), extract_text_ngrams (2), get_color_palette (2), get_dict_cache (2), get_service_ticket (2), is_valid_biomedical_entity (2), load_dict_single (2), load_from_mesh (2), load_from_umls (2), load_mesh_terms_from_pubmed (2), process_mesh_xml (2), abc_model_opt (1), abc_model_sig (1), abc_timeslice (1), add_statistical_significance (1), alternative_validation (1), anc_model (1), apply_bitola_flexible (1), apply_correction (1), b_term_type_filter (1), bitola_model (1), calc_bibliometrics (1), clear_pubmed_cache (1), compare_terms (1), create_citation_net (1), create_report (1), create_single_heatmap (1), create_sparse_comat (1), create_tdm (1), create_vis_heatmap (1), detect_lang (1), diversify_b_terms (1), diversify_c_paths (1), enhance_abc_kb (1), eval_evidence (1), export_chord (1), export_chord_diagram (1), export_network (1), extract_entities_workflow (1), extract_mesh_from_text (1), extract_ner (1), extract_ngrams (1), extract_terms (1), extract_topics (1), fetch_and_parse_gene (1), fetch_and_parse_pmc (1), fetch_and_parse_protein (1), fetch_and_parse_pubmed (1), filter_by_type (1), filter_terms_for_abc_model (1), find_abc_all (1), find_similar_docs (1), find_term (1), gen_report (1), get_pmc_fulltext (1), get_term_vars (1), get_type_dist (1), get_umls_semantic_types (1), is_valid_type (1), list_to_df (1), load_results (1), parse_pubmed_xml (1), preprocess_text (1), process_batch (1), split_into_sentences (1), split_text (1)
stats
df (19), terms (16), p.adjust (5), phyper (4), kmeans (3), profile (3), aggregate (2), runif (2), setNames (2), smooth (2), complete.cases (1), dist (1), pt (1)
graphics
text (29), par (13), title (8), layout (6), arrows (2)
xml2
xml_find_first (19), xml_text (19), xml_find_all (10), read_xml (4), xml_attr (1), xml_name (1)
utils
txtProgressBar (40), read.csv (4), adist (2), write.csv (2), de (1), head (1), URLencode (1)
httr
content (18), GET (8), POST (5), headers (2)
igraph
graph_from_data_frame (12), layout_with_fr (6), degree (1)
methods
new (9)
rentrez
entrez_link (3), entrez_search (3), entrez_fetch (2), entrez_summary (1)
Matrix
t (4), diag (2), sparseMatrix (2)
visNetwork
visEdges (2), visGroups (2), visNetwork (2), visLayout (1), visSave (1)
parallel
clusterExport (3), parLapply (2), detectCores (1), makeCluster (1)
tools
file_ext (3)
grDevices
colorRampPalette (1), rainbow (1)
irlba
irlba (2)
jsonlite
fromJSON (2)
reticulate
import (2)
digest
digest (1)
SnowballC
wordStem (1)
spacyr
spacy_parse (1)
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in R (100% in 13 files) and
- 1 authors
- 3 vignettes
- no internal data file
- 11 imported packages
- 105 exported functions (median 47 lines of code)
- 146 non-exported functions in R (median 48 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:
loc = "Lines of Code", fn = "function", exp/not_exp = exported / not exported
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 13 | 65.8 | |
| files_inst | 4 | 97.1 | |
| files_vignettes | 3 | 89.3 | |
| files_tests | 9 | 84.9 | |
| loc_R | 8759 | 97.6 | TRUE |
| loc_inst | 991 | 77.5 | |
| loc_vignettes | 925 | 88.8 | |
| loc_tests | 1294 | 86.6 | |
| num_vignettes | 3 | 91.0 | |
| n_fns_r | 251 | 91.3 | |
| n_fns_r_exported | 105 | 95.3 | TRUE |
| n_fns_r_not_exported | 146 | 88.0 | |
| n_fns_per_file_r | 9 | 87.6 | |
| num_params_per_fn | 4 | 51.1 | |
| loc_per_fn_r | 48 | 88.7 | |
| loc_per_fn_r_exp | 47 | 77.7 | |
| loc_per_fn_r_not_exp | 48 | 89.6 | |
| rel_whitespace_R | 24 | 98.5 | TRUE |
| rel_whitespace_inst | 23 | 81.5 | |
| rel_whitespace_vignettes | 25 | 85.1 | |
| rel_whitespace_tests | 31 | 91.6 | |
| doclines_per_fn_exp | 20 | 15.3 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 157 | 84.3 |
2a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
3. goodpractice and other checks
Details of goodpractice checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 17965843714 | pages build and deployment | success | 04c683 | 7 | 2025-09-24 |
| 17965695551 | pkgdown.yaml | success | a008bc | 4 | 2025-09-24 |
| 17965695558 | R-CMD-check.yaml | success | a008bc | 4 | 2025-09-24 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following check_fails:
- cyclocomp
- no_description_date
Test coverage with covr
Package coverage: 22.42
The following files are not completely covered by tests:
| file | coverage |
|---|---|
| R/abc_model.R | 22.9% |
| R/comprehensive_summary.R | 0% |
| R/heatmap_visualization.R | 6.86% |
| R/performance_optimalization.R | 19.75% |
| R/pubmed_search.R | 0% |
| R/queries.R | 15.5% |
| R/text_preprocessing.R | 0% |
| R/utils.R | 1.93% |
| R/visualization.R | 49.63% |
| R/zzz.R | 10% |
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
| function | cyclocomplexity |
|---|---|
| is_valid_biomedical_entity | 161 |
| extract_entities_workflow | 145 |
| abc_model | 129 |
| sanitize_dictionary | 100 |
| vis_heatmap | 99 |
| vis_network | 77 |
| load_from_umls | 71 |
| validate_entity_with_nlp | 57 |
| extract_entities | 54 |
| pubmed_search | 46 |
| load_from_mesh | 45 |
| parse_pubmed_xml | 45 |
| create_comat | 43 |
| create_report | 43 |
| run_lbd | 41 |
| anc_model | 38 |
| load_dictionary | 37 |
| extract_ner | 35 |
| vis_abc_heatmap | 35 |
| export_chord_diagram | 33 |
| process_mesh_xml | 32 |
| validate_abc | 29 |
| abc_model_sig | 27 |
| lsi_model | 27 |
| abc_timeslice | 26 |
| map_ontology | 26 |
| shadowtext | 26 |
| abc_model_opt | 24 |
| eval_evidence | 24 |
| process_mesh_chunks | 24 |
| export_network | 23 |
| query_umls | 23 |
| apply_bitola_flexible | 22 |
| merge_entities | 21 |
| vis_abc_network | 21 |
| get_pmc_fulltext | 20 |
| validate_entity_comprehensive | 20 |
| vec_preprocess | 20 |
| bitola_model | 19 |
| create_sparse_comat | 19 |
| fetch_and_parse_pmc | 19 |
| find_abc_all | 19 |
| ncbi_search | 19 |
| cluster_docs | 18 |
| create_citation_net | 17 |
| load_mesh_terms_from_pubmed | 17 |
| create_tdm | 16 |
| create_term_document_matrix | 16 |
| extract_topics | 16 |
| preprocess_text | 16 |
| compare_terms | 15 |
| min_results | 15 |
Static code analyses with lintr
lintr found no issues with this package!
4. Other Checks
Details of other checks (click to open)
:heavy_multiplication_x: The following 10 function names are duplicated in other packages:
- create_report from DataExplorer, prodigenr, reporter
- extract_entities from medExtractR
- load_dictionary from ricu
- merge_results from climwin
- ncbi_search from taxize
- parallel_analysis from kim
- plot_heatmap from dendroTools, dynplot, greatR, MitoHEAR, omu, Plasmidprofiler, RolWinMulCor, romic
- plot_network from cape, dbnR, HeteroGGM, immcp, imsig, LSVAR, SeqNet, SubgrPlots
- save_results from data.validator
- vis_heatmap from immunarch
Package Versions
| package | version |
|---|---|
| pkgstats | 0.2.0.66 |
| pkgcheck | 0.1.2.230 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
Thanks for the submission @chaoliu-cl! The package sounds really neat. Let me know when you've been able to address the ✖️ found in the check.
I'd be a little concerned about those high complexity values found in the goodpractice checks. Those files and functions are huge, and it looks like there are logical places where you could split up the code. One example might be to put all of the static lists in a sysdata.rda file.
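For reference, that usually looks something like the following one-off script (object names here are placeholders, not your actual data):

```r
# data-raw/internal_data.R (conventionally kept out of the built package)
# Move static lookup lists out of function bodies and into R/sysdata.rda,
# where they are available internally without bloating the R source files.
acronym_corrections <- c("NSAID" = "non-steroidal anti-inflammatory drug")
common_words        <- c("the", "and", "with", "for")

# internal = TRUE writes R/sysdata.rda instead of data/*.rda
usethis::use_data(acronym_corrections, common_words,
                  internal = TRUE, overwrite = TRUE)
```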
Hi @ldecicco-USGS ,
Thank you for the feedback. I have addressed the highlighted issues including the following:
- Code Complexity & sysdata.rda: Following your suggestion, I've extracted all static lists into a sysdata.rda file (acronym corrections, term mappings, common words, entity patterns, etc.). This significantly reduced function complexity by removing hundreds of lines of static definitions.
- codemeta.json & CONTRIBUTING.md: Both files have been added.
- Function Examples: Added runnable examples (without \dontrun{}) for all previously undocumented functions.
- Test Coverage: Improved from 22.4% to 75%.
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
Checks for LBDiscover (v0.1.0)
git hash: 60e965ad
- :heavy_check_mark: Package is already on CRAN.
- :heavy_check_mark: has a 'codemeta.json' file.
- :heavy_check_mark: has a 'contributing' file.
- :heavy_check_mark: uses 'roxygen2'.
- :heavy_check_mark: 'DESCRIPTION' has a URL field.
- :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
- :heavy_check_mark: Package has at least one HTML vignette
- :heavy_multiplication_x: These functions do not have examples: [anc_model, create_report, lsi_model, query_external_api, query_mesh, query_umls, validate_biomedical_entity, validate_entity_comprehensive, validate_entity_with_nlp].
- :heavy_check_mark: Package has continuous integration checks.
- :heavy_check_mark: Package coverage is 75%.
- :heavy_check_mark: R CMD check found no errors.
- :heavy_check_mark: R CMD check found no warnings.
- :eyes: Some goodpractice linters failed.
- :eyes: Function names are duplicated in other packages
- :eyes: Examples should not use \dontrun{} unless really necessary.
Important: All failing checks above must be addressed prior to proceeding
(Checks marked with :eyes: may be optionally addressed.)
Package License: GPL-3
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
| type | package | ncalls |
|---|---|---|
| internal | base | 2234 |
| internal | LBDiscover | 149 |
| internal | methods | 9 |
| internal | usethis | 2 |
| imports | stats | 61 |
| imports | graphics | 58 |
| imports | xml2 | 54 |
| imports | utils | 51 |
| imports | httr | 33 |
| imports | igraph | 19 |
| imports | rentrez | 9 |
| imports | Matrix | 8 |
| imports | tools | 3 |
| imports | grDevices | 2 |
| imports | jsonlite | 2 |
| suggests | visNetwork | 8 |
| suggests | parallel | 7 |
| suggests | irlba | 2 |
| suggests | reticulate | 2 |
| suggests | SnowballC | 1 |
| suggests | spacyr | 1 |
| suggests | digest | 1 |
| suggests | openxlsx | NA |
| suggests | knitr | NA |
| suggests | rmarkdown | NA |
| suggests | base64enc | NA |
| suggests | testthat | NA |
| suggests | mockery | NA |
| suggests | covr | NA |
| suggests | withr | NA |
| suggests | htmltools | NA |
| linking_to | NA | NA |
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base
character (195), c (192), data.frame (146), for (125), length (124), nrow (112), list (82), sapply (77), min (76), max (68), rep (64), numeric (54), unique (52), paste0 (48), names (39), which (34), return (33), if (30), sum (30), integer (25), grepl (24), seq_along (22), unlist (22), attr (21), any (20), paste (19), tolower (19), tryCatch (19), ncol (18), rbind (18), is.na (17), is.null (16), matrix (16), ceiling (15), lapply (15), rownames (15), strsplit (15), table (15), colnames (14), nchar (13), as.numeric (12), vector (11), match (9), order (9), round (9), seq_len (9), regexpr (8), regmatches (8), setdiff (8), as.character (7), drop (7), gregexpr (7), ifelse (7), rowSums (7), sqrt (7), url (7), body (6), plot (6), range (6), sort (6), t (6), substr (5), diff (4), dim (4), grep (4), gsub (4), seq (4), sprintf (4), switch (4), col (3), diag (3), emptyenv (3), environment (3), file (3), log (3), logical (3), new.env (3), row (3), tapply (3), tempfile (3), all (2), apply (2), by (2), cat (2), colSums (2), dimnames (2), do.call (2), expression (2), mean (2), outer (2), row.names (2), sub (2), Sys.time (2), try (2), abs (1), as.data.frame (1), colMeans (1), difftime (1), duplicated (1), file.path (1), floor (1), format (1), interactive (1), match.arg (1), merge (1), mode (1), rank (1), suppressMessages (1), system.file (1), units (1), unname (1), version (1), which.max (1)
LBDiscover
retry_api_call (16), create_comat (4), load_dictionary (4), pubmed_search (4), string_similarity (4), throttle_api (4), abc_model (3), authenticate_umls (3), cluster_docs (3), count_corpus_terms (3), extract_entities (3), get_pubmed_cache (3), tokenize_text (3), vec_preprocess (3), calc_doc_sim (2), calculate_score (2), create_cache_key (2), create_dummy_dictionary (2), create_term_document_matrix (2), diversify_abc (2), extract_text_ngrams (2), get_color_palette (2), get_dict_cache (2), get_service_ticket (2), is_valid_biomedical_entity (2), load_dict_single (2), load_from_mesh (2), load_from_umls (2), load_mesh_terms_from_pubmed (2), process_mesh_xml (2), abc_model_opt (1), abc_model_sig (1), abc_timeslice (1), add_statistical_significance (1), alternative_validation (1), anc_model (1), apply_bitola_flexible (1), apply_correction (1), b_term_type_filter (1), bitola_model (1), calc_bibliometrics (1), clear_pubmed_cache (1), compare_terms (1), create_citation_net (1), create_report (1), create_single_heatmap (1), create_sparse_comat (1), create_tdm (1), create_vis_heatmap (1), detect_lang (1), diversify_b_terms (1), diversify_c_paths (1), enhance_abc_kb (1), eval_evidence (1), export_chord (1), export_chord_diagram (1), export_network (1), extract_entities_workflow (1), extract_mesh_from_text (1), extract_ner (1), extract_ngrams (1), extract_terms (1), extract_topics (1), fetch_and_parse_gene (1), fetch_and_parse_pmc (1), fetch_and_parse_protein (1), fetch_and_parse_pubmed (1), filter_by_type (1), filter_terms_for_abc_model (1), find_abc_all (1), find_similar_docs (1), find_term (1), gen_report (1), get_pmc_fulltext (1), get_term_vars (1), get_type_dist (1), get_umls_semantic_types (1), has_general_biomedical_characteristics (1), is_valid_type (1), list_to_df (1), load_results (1), lsi_model (1), parse_pubmed_xml (1), preprocess_text (1), process_batch (1), split_into_sentences (1), split_text (1)
stats
df (19), terms (16), p.adjust (5), phyper (4), kmeans (3), profile (3), aggregate (2), runif (2), setNames (2), smooth (2), complete.cases (1), dist (1), pt (1)
graphics
text (29), par (13), title (8), layout (6), arrows (2)
xml2
xml_find_first (19), xml_text (19), xml_find_all (10), read_xml (4), xml_attr (1), xml_name (1)
utils
txtProgressBar (40), read.csv (4), adist (2), write.csv (2), de (1), head (1), URLencode (1)
httr
content (18), GET (8), POST (5), headers (2)
igraph
graph_from_data_frame (12), layout_with_fr (6), degree (1)
methods
new (9)
rentrez
entrez_link (3), entrez_search (3), entrez_fetch (2), entrez_summary (1)
Matrix
t (4), diag (2), sparseMatrix (2)
visNetwork
visEdges (2), visGroups (2), visNetwork (2), visLayout (1), visSave (1)
parallel
clusterExport (3), parLapply (2), detectCores (1), makeCluster (1)
tools
file_ext (3)
grDevices
colorRampPalette (1), rainbow (1)
irlba
irlba (2)
jsonlite
fromJSON (2)
reticulate
import (2)
usethis
use_data (2)
digest
digest (1)
SnowballC
wordStem (1)
spacyr
spacy_parse (1)
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in R (100% in 14 files) and
- 1 authors
- 3 vignettes
- no internal data file
- 11 imported packages
- 107 exported functions (median 46 lines of code)
- 150 non-exported functions in R (median 49 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:
loc = "Lines of Code", fn = "function", exp/not_exp = exported / not exported
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 14 | 68.5 | |
| files_inst | 4 | 97.0 | |
| files_vignettes | 3 | 89.3 | |
| files_tests | 28 | 96.9 | |
| loc_R | 8267 | 97.3 | TRUE |
| loc_inst | 991 | 77.7 | |
| loc_vignettes | 946 | 89.1 | |
| loc_tests | 9585 | 99.1 | TRUE |
| num_vignettes | 3 | 91.0 | |
| n_fns_r | 257 | 91.5 | |
| n_fns_r_exported | 107 | 95.4 | TRUE |
| n_fns_r_not_exported | 150 | 88.4 | |
| n_fns_per_file_r | 9 | 86.5 | |
| num_params_per_fn | 4 | 51.2 | |
| loc_per_fn_r | 47 | 88.3 | |
| loc_per_fn_r_exp | 46 | 77.1 | |
| loc_per_fn_r_not_exp | 50 | 90.0 | |
| rel_whitespace_R | 24 | 98.4 | TRUE |
| rel_whitespace_inst | 23 | 81.8 | |
| rel_whitespace_vignettes | 24 | 85.4 | |
| rel_whitespace_tests | 27 | 99.6 | TRUE |
| doclines_per_fn_exp | 21 | 17.0 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 160 | 84.6 |
2a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
3. goodpractice and other checks
Details of goodpractice checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 18251834739 | pages build and deployment | success | d37d15 | 22 | 2025-10-05 |
| 18251738033 | pkgdown.yaml | success | 60e965 | 19 | 2025-10-05 |
| 18251738031 | R-CMD-check.yaml | success | 60e965 | 19 | 2025-10-05 |
| 18251738027 | test-coverage | success | 60e965 | 14 | 2025-10-05 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following check_fails:
- cyclocomp
- no_description_date
Test coverage with covr
Package coverage: 74.96
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
| function | cyclocomplexity |
|---|---|
| extract_entities_workflow | 145 |
| abc_model | 132 |
| sanitize_dictionary | 100 |
| vis_heatmap | 99 |
| vis_network | 80 |
| load_from_umls | 71 |
| validate_entity_with_nlp | 57 |
| extract_entities | 54 |
| load_from_mesh | 47 |
| pubmed_search | 46 |
| parse_pubmed_xml | 45 |
| create_comat | 43 |
| create_report | 43 |
| run_lbd | 41 |
| is_valid_biomedical_entity | 40 |
| anc_model | 38 |
| load_dictionary | 37 |
| extract_ner | 35 |
| vis_abc_heatmap | 35 |
| export_chord_diagram | 33 |
| process_mesh_xml | 32 |
| validate_abc | 29 |
| abc_model_sig | 27 |
| abc_timeslice | 26 |
| map_ontology | 26 |
| shadowtext | 26 |
| abc_model_opt | 24 |
| eval_evidence | 24 |
| process_mesh_chunks | 24 |
| export_network | 23 |
| lsi_model | 23 |
| query_umls | 23 |
| apply_bitola_flexible | 22 |
| merge_entities | 21 |
| vis_abc_network | 21 |
| get_pmc_fulltext | 20 |
| validate_entity_comprehensive | 20 |
| vec_preprocess | 20 |
| bitola_model | 19 |
| create_sparse_comat | 19 |
| extract_ngrams | 19 |
| fetch_and_parse_pmc | 19 |
| find_abc_all | 19 |
| ncbi_search | 19 |
| preprocess_text | 19 |
| cluster_docs | 18 |
| create_citation_net | 17 |
| create_term_document_matrix | 17 |
| load_mesh_terms_from_pubmed | 17 |
| create_tdm | 16 |
| extract_topics | 16 |
| validate_term_by_type | 16 |
| compare_terms | 15 |
| min_results | 15 |
Static code analyses with lintr
lintr found no issues with this package!
4. Other Checks
Details of other checks (click to open)
:heavy_multiplication_x: The following 10 function names are duplicated in other packages:
- create_report from DataExplorer, prodigenr, reporter
- extract_entities from medExtractR
- load_dictionary from ricu
- merge_results from climwin
- ncbi_search from taxize
- parallel_analysis from kim
- plot_heatmap from dendroTools, dynplot, greatR, MitoHEAR, omu, Plasmidprofiler, RolWinMulCor, romic
- plot_network from cape, dbnR, HeteroGGM, immcp, imsig, LSVAR, SeqNet, SubgrPlots
- save_results from data.validator
- vis_heatmap from immunarch
Package Versions
| package | version |
|---|---|
| pkgstats | 0.2.0.68 |
| pkgcheck | 0.1.2.233 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
Thanks for the update. I think if you change the \dontrun{} wrappers to \donttest{}, the rOpenSci checks will pass. In the meantime, I'll clone the package and take it for a test drive 🏎️
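For reference, the change would look roughly like this in a roxygen block (illustrative function call; the argument names are placeholders):

```r
#' @examples
#' \donttest{
#' # \donttest{} examples can still be executed (e.g. via R CMD check --run-donttest),
#' # while \dontrun{} examples are never executed, so reviewers can't verify them.
#' articles <- pubmed_search(query = "migraine", max_results = 10)
#' }
```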
Hi @ldecicco-USGS , is there any update on the review?
@chaoliu-cl Sorry for the delay on this. My turn as EIC started on Nov 1 but I forgot about it and the reminders got lost in the shutdown! Let me dig back into this and will let you know shortly.
@chaoliu-cl Have had some time to take a look at this and have had a chance to chat with some of the other rOpenSci editors.
LBDiscover is definitely a good fit for rOpenSci; however, we do have some concerns with the size of the package (8000+ lines of code and 100+ exported functions). Given the scope of your goals for LBDiscover it makes sense that it is big, but it may introduce some challenges in finding reviewers willing to commit to a review of that scale. Prior to passing this to a handling editor, I wanted to ask if you would consider breaking the package into two separate packages.
In your readme you list the 7 key features of the package (https://github.com/chaoliu-cl/LBDiscover#key-features). Based on these would it be possible to split it with the first three (Data Retrieval, Text Preprocessing, and Entity Extraction) into a data access/processing focused package and the final four (Co-occurrence Analysis, Discovery Models, Validation, and Visualization) into a data analysis/visualization package?
This is not necessarily a requirement for review as I know this would add additional work on the front end for you, but in the long run we believe it would make for easier review and easier long-term maintenance of the package.
Thoughts?