
Submitting LBDiscover package

Open chaoliu-cl opened this issue 3 months ago • 13 comments

Submitting Author Name: Chao Liu
Submitting Author Github Handle: @chaoliu-cl
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/chaoliu-cl/LBDiscover
Version submitted:
Submission type: Standard
Editor: TBD
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: LBDiscover
Title: Literature-Based Discovery Tools for Biomedical Research
Version: 0.1.0
Date: 2025-05-14
Authors@R: 
    person("Chao Liu", email = "[email protected]", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-9979-8272"))
Description: A suite of tools for literature-based discovery in biomedical research. 
    Provides functions for retrieving scientific articles from PubMed and 
    other NCBI databases, extracting biomedical entities (diseases, drugs, genes, etc.), 
    building co-occurrence networks, and applying various discovery models 
    including ABC, AnC, LSI, and BITOLA. The package also includes 
    visualization tools for exploring discovered connections.
License: GPL-3
URL: https://github.com/chaoliu-cl/LBDiscover, http://liu-chao.site/LBDiscover/, https://liu-chao.site/LBDiscover/
BugReports: https://github.com/chaoliu-cl/LBDiscover/issues
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Depends: 
    R (>= 4.0.0)
Imports: 
    httr (>= 1.4.0),
    xml2 (>= 1.3.0),
    igraph (>= 1.2.0),
    Matrix (>= 1.3.0),
    utils,
    stats,
    grDevices,
    graphics,
    tools,
    rentrez (>= 1.2.0),
    jsonlite (>= 1.7.0)
Suggests:
    openxlsx (>= 4.2.0),
    SnowballC (>= 0.7.0),
    visNetwork (>= 2.1.0),
    spacyr (>= 1.2.0),
    parallel,
    digest (>= 0.6.0),
    irlba (>= 2.3.0),
    knitr,
    rmarkdown,
    base64enc,
    reticulate,
    testthat (>= 3.0.0),
    mockery,
    covr,
    htmltools
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • [X] data retrieval
    • [X] data extraction
    • [ ] data munging
    • [ ] data deposition
    • [ ] data validation and testing
    • [ ] workflow automation
    • [ ] version control
    • [X] citation management and bibliometrics
    • [ ] scientific software wrappers
    • [ ] field and lab reproducibility tools
    • [ ] database software bindings
    • [ ] geospatial data
    • [ ] translation
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    • Data retrieval: The package provides functions for retrieving scientific articles from PubMed and other NCBI databases, giving systematic access to biomedical literature from major research repositories.
    • Data extraction: It extracts biomedical entities (diseases, drugs, genes, etc.) from the retrieved literature, performing information extraction from scientific texts.
    • Citation management and bibliometrics: It builds co-occurrence networks from the literature and applies discovery models (ABC, AnC, LSI, BITOLA) to find hidden connections between concepts, a form of bibliometric analysis for literature-based discovery research.

  • Who is the target audience and what are scientific applications of this package? Target Audience: LBDiscover is designed for biomedical researchers, bioinformaticians, and data scientists working in literature-based discovery (LBD). The primary users include:

  • Biomedical researchers seeking hidden connections between diseases, drugs, and genes

  • Pharmaceutical researchers exploring drug repurposing opportunities

  • Bioinformaticians building knowledge networks from literature

  • Graduate students and academics studying computational approaches to hypothesis generation

Scientific Applications: The package supports several key research applications:

  1. Drug Discovery and Repurposing: LBD has been used extensively in drug development and repurposing, as well as in predicting adverse drug reactions
  2. Disease-Gene Association Discovery: Using literature-based discovery to identify disease candidate genes
  3. Biomarker Identification: LBD has been explored as a tool to identify diagnostic and prognostic biomarkers for diseases
  4. Hypothesis Generation: Creating testable scientific hypotheses by connecting disparate pieces of literature
  5. Knowledge Network Construction: Building co-occurrence networks to visualize research landscapes
  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category? There are several R packages that overlap with LBDiscover's functionality, but none provide the same comprehensive approach to literature-based discovery. Similar packages and key differences:
  1. pubmed.mineR. Overlap: PubMed text mining with functions for data visualization and biomedical entity extraction. Difference: Focuses on general text mining and clustering rather than implementing specific LBD models such as ABC, AnC, LSI, and BITOLA.

  2. bibliometrix. Overlap: Comprehensive science mapping analysis with network analysis capabilities and bibliometric workflows. Difference: Designed for general scientometric analysis across all disciplines, not specifically for biomedical literature-based discovery or LBD-specific algorithms.

  3. Data retrieval packages (rentrez, easyPubMed, RISmed). Overlap: All provide interfaces to NCBI/PubMed for retrieving biomedical literature. Difference: These focus solely on data retrieval and do not perform LBD analysis, entity extraction, or hypothesis generation.

How LBDiscover Meets Best-in-Category Criteria:

  1. Unique Functionality: LBDiscover is the first R package to specifically implement established LBD models:
  • ABC Model: The most basic and widespread LBD approach, centered on finding connections between concepts A, B, and C
  • BITOLA: An interactive literature-based biomedical discovery support system using semantic prediction
  • LSI (Latent Semantic Indexing): A statistical technique for improving information retrieval effectiveness, used to assist literature-based discovery
  • AnC Model: Advanced connection models for more sophisticated discovery patterns
  2. Integrated Workflow: Unlike other packages that handle only one aspect (retrieval OR analysis OR visualization), LBDiscover provides a complete workflow from data retrieval through entity extraction to discovery model application and network visualization (a sketch is shown after this list).
  3. Biomedical Specialization: While bibliometrix serves general scientometrics and pubmed.mineR performs general text mining, LBDiscover is specifically designed for biomedical literature-based discovery with domain-specific entity recognition (diseases, drugs, genes).
  4. Modern Implementation: Recent work has focused on integrating Large Language Models into Literature-Based Discovery processes, and LBDiscover is positioned to incorporate such advances while maintaining established methodological foundations.
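A minimal sketch of this integrated workflow, assuming the function names listed in the package's exported API (pubmed_search(), extract_entities(), create_comat(), abc_model(), vis_heatmap()); the argument names and values shown here are illustrative assumptions, not the package's documented signatures:

```r
# Hedged sketch of the retrieval -> extraction -> discovery -> visualization
# workflow. Function names come from LBDiscover's exported API; the argument
# names ("max_results", "a_term") are assumptions for illustration only.
library(LBDiscover)

# 1. Retrieve articles from PubMed (the query string is illustrative)
articles <- pubmed_search("migraine AND magnesium", max_results = 100)

# 2. Extract biomedical entities (diseases, drugs, genes) from the abstracts
entities <- extract_entities(articles)

# 3. Build a term co-occurrence matrix from the extracted entities
comat <- create_comat(entities)

# 4. Apply the ABC discovery model to rank candidate A-B-C connections
abc_results <- abc_model(comat, a_term = "migraine")

# 5. Visualize the top-ranked connections
vis_heatmap(abc_results)
```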

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • [X] Do you intend for this package to go on CRAN?

  • [ ] Do you intend for this package to go on Bioconductor?

  • [ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • [ ] The package is novel and will be of interest to the broad readership of the journal.
  • [ ] The manuscript describing the package is no longer than 3000 words.
  • [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

  • [X] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

chaoliu-cl avatar Sep 24 '25 03:09 chaoliu-cl

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot avatar Sep 24 '25 03:09 ropensci-review-bot

:rocket:

Editor check started

:wave:

ropensci-review-bot avatar Sep 24 '25 03:09 ropensci-review-bot

Checks for LBDiscover (v0.1.0)

git hash: 02f4c075

  • :heavy_check_mark: Package is already on CRAN.
  • :heavy_multiplication_x: does not have a 'codemeta.json' file.
  • :heavy_multiplication_x: does not have a 'contributing' file.
  • :heavy_check_mark: uses 'roxygen2'.
  • :heavy_check_mark: 'DESCRIPTION' has a URL field.
  • :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
  • :heavy_check_mark: Package has at least one HTML vignette
  • :heavy_multiplication_x: These functions do not have examples: [abc_model, anc_model, clear_pubmed_cache, create_report, eval_evidence, extract_entities_workflow, extract_entities, find_term, gen_report, get_dict_cache, get_term_vars, is_valid_biomedical_entity, load_dictionary, lsi_model, merge_entities, min_results, plot_heatmap, plot_network, prep_articles, query_external_api, query_mesh, query_umls, safe_diversify, sanitize_dictionary, valid_entities, validate_biomedical_entity, validate_entity_comprehensive, validate_entity_with_nlp].
  • :heavy_check_mark: Package has continuous integration checks.
  • :heavy_multiplication_x: Package coverage is 22.4% (should be at least 75%).
  • :heavy_multiplication_x: All examples use \dontrun{}.
  • :heavy_check_mark: R CMD check found no errors.
  • :heavy_check_mark: R CMD check found no warnings.
  • :eyes: Some goodpractice linters failed.
  • :eyes: Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with :eyes: may be optionally addressed.)

Package License: GPL-3


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 2214
internal LBDiscover 147
internal methods 9
imports stats 61
imports graphics 58
imports xml2 54
imports utils 51
imports httr 33
imports igraph 19
imports rentrez 9
imports Matrix 8
imports tools 3
imports grDevices 2
imports jsonlite 2
suggests visNetwork 8
suggests parallel 7
suggests irlba 2
suggests reticulate 2
suggests SnowballC 1
suggests spacyr 1
suggests digest 1
suggests openxlsx NA
suggests knitr NA
suggests rmarkdown NA
suggests base64enc NA
suggests testthat NA
suggests mockery NA
suggests covr NA
suggests htmltools NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

c (195), character (171), for (126), length (121), data.frame (115), nrow (112), sapply (111), min (83), list (73), max (72), grepl (55), any (54), unique (53), numeric (50), names (39), which (34), paste0 (31), sum (30), if (29), integer (25), rep (25), return (23), seq_along (22), unlist (22), attr (21), ncol (20), paste (19), tryCatch (19), rbind (18), tolower (18), is.na (17), is.null (16), matrix (16), ceiling (15), lapply (15), rownames (15), strsplit (15), table (15), colnames (14), nchar (13), as.numeric (12), seq_len (11), vector (11), match (9), order (9), round (9), regexpr (8), regmatches (8), setdiff (8), as.character (7), drop (7), gregexpr (7), ifelse (7), rowSums (7), sqrt (7), url (7), body (6), plot (6), range (6), sort (6), dim (5), gsub (5), substr (5), t (5), diff (4), grep (4), seq (4), sprintf (4), switch (4), col (3), diag (3), emptyenv (3), environment (3), file (3), log (3), logical (3), new.env (3), row (3), tapply (3), tempfile (3), all (2), apply (2), by (2), colSums (2), dimnames (2), do.call (2), mean (2), outer (2), row.names (2), sub (2), Sys.time (2), try (2), abs (1), as.data.frame (1), cat (1), colMeans (1), difftime (1), duplicated (1), expression (1), file.path (1), floor (1), format (1), interactive (1), match.arg (1), merge (1), mode (1), packageEvent (1), setHook (1), suppressMessages (1), system.file (1), units (1), unname (1), version (1), which.max (1)

LBDiscover

retry_api_call (16), create_comat (4), load_dictionary (4), pubmed_search (4), string_similarity (4), throttle_api (4), abc_model (3), authenticate_umls (3), cluster_docs (3), count_corpus_terms (3), extract_entities (3), get_pubmed_cache (3), tokenize_text (3), vec_preprocess (3), calc_doc_sim (2), calculate_score (2), create_cache_key (2), create_dummy_dictionary (2), create_term_document_matrix (2), diversify_abc (2), extract_text_ngrams (2), get_color_palette (2), get_dict_cache (2), get_service_ticket (2), is_valid_biomedical_entity (2), load_dict_single (2), load_from_mesh (2), load_from_umls (2), load_mesh_terms_from_pubmed (2), process_mesh_xml (2), abc_model_opt (1), abc_model_sig (1), abc_timeslice (1), add_statistical_significance (1), alternative_validation (1), anc_model (1), apply_bitola_flexible (1), apply_correction (1), b_term_type_filter (1), bitola_model (1), calc_bibliometrics (1), clear_pubmed_cache (1), compare_terms (1), create_citation_net (1), create_report (1), create_single_heatmap (1), create_sparse_comat (1), create_tdm (1), create_vis_heatmap (1), detect_lang (1), diversify_b_terms (1), diversify_c_paths (1), enhance_abc_kb (1), eval_evidence (1), export_chord (1), export_chord_diagram (1), export_network (1), extract_entities_workflow (1), extract_mesh_from_text (1), extract_ner (1), extract_ngrams (1), extract_terms (1), extract_topics (1), fetch_and_parse_gene (1), fetch_and_parse_pmc (1), fetch_and_parse_protein (1), fetch_and_parse_pubmed (1), filter_by_type (1), filter_terms_for_abc_model (1), find_abc_all (1), find_similar_docs (1), find_term (1), gen_report (1), get_pmc_fulltext (1), get_term_vars (1), get_type_dist (1), get_umls_semantic_types (1), is_valid_type (1), list_to_df (1), load_results (1), parse_pubmed_xml (1), preprocess_text (1), process_batch (1), split_into_sentences (1), split_text (1)

stats

df (19), terms (16), p.adjust (5), phyper (4), kmeans (3), profile (3), aggregate (2), runif (2), setNames (2), smooth (2), complete.cases (1), dist (1), pt (1)

graphics

text (29), par (13), title (8), layout (6), arrows (2)

xml2

xml_find_first (19), xml_text (19), xml_find_all (10), read_xml (4), xml_attr (1), xml_name (1)

utils

txtProgressBar (40), read.csv (4), adist (2), write.csv (2), de (1), head (1), URLencode (1)

httr

content (18), GET (8), POST (5), headers (2)

igraph

graph_from_data_frame (12), layout_with_fr (6), degree (1)

methods

new (9)

rentrez

entrez_link (3), entrez_search (3), entrez_fetch (2), entrez_summary (1)

Matrix

t (4), diag (2), sparseMatrix (2)

visNetwork

visEdges (2), visGroups (2), visNetwork (2), visLayout (1), visSave (1)

parallel

clusterExport (3), parLapply (2), detectCores (1), makeCluster (1)

tools

file_ext (3)

grDevices

colorRampPalette (1), rainbow (1)

irlba

irlba (2)

jsonlite

fromJSON (2)

reticulate

import (2)

digest

digest (1)

SnowballC

wordStem (1)

spacyr

spacy_parse (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 13 files) and
  • 1 authors
  • 3 vignettes
  • no internal data file
  • 11 imported packages
  • 105 exported functions (median 47 lines of code)
  • 146 non-exported functions in R (median 48 lines of code)

Statistical properties of package structure, as distributional percentiles relative to all current CRAN packages. The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 13 65.8
files_inst 4 97.1
files_vignettes 3 89.3
files_tests 9 84.9
loc_R 8759 97.6 TRUE
loc_inst 991 77.5
loc_vignettes 925 88.8
loc_tests 1294 86.6
num_vignettes 3 91.0
n_fns_r 251 91.3
n_fns_r_exported 105 95.3 TRUE
n_fns_r_not_exported 146 88.0
n_fns_per_file_r 9 87.6
num_params_per_fn 4 51.1
loc_per_fn_r 48 88.7
loc_per_fn_r_exp 47 77.7
loc_per_fn_r_not_exp 48 89.6
rel_whitespace_R 24 98.5 TRUE
rel_whitespace_inst 23 81.5
rel_whitespace_vignettes 25 85.1
rel_whitespace_tests 31 91.6
doclines_per_fn_exp 20 15.3
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 157 84.3

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
17965843714 pages build and deployment success 04c683 7 2025-09-24
17965695551 pkgdown.yaml success a008bc 4 2025-09-24
17965695558 R-CMD-check.yaml success a008bc 4 2025-09-24

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fails:

  1. cyclocomp
  2. no_description_date

Test coverage with covr

Package coverage: 22.42

The following files are not completely covered by tests:

file coverage
R/abc_model.R 22.9%
R/comprehensive_summary.R 0%
R/heatmap_visualization.R 6.86%
R/performance_optimalization.R 19.75%
R/pubmed_search.R 0%
R/queries.R 15.5%
R/text_preprocessing.R 0%
R/utils.R 1.93%
R/visualization.R 49.63%
R/zzz.R 10%

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
is_valid_biomedical_entity 161
extract_entities_workflow 145
abc_model 129
sanitize_dictionary 100
vis_heatmap 99
vis_network 77
load_from_umls 71
validate_entity_with_nlp 57
extract_entities 54
pubmed_search 46
load_from_mesh 45
parse_pubmed_xml 45
create_comat 43
create_report 43
run_lbd 41
anc_model 38
load_dictionary 37
extract_ner 35
vis_abc_heatmap 35
export_chord_diagram 33
process_mesh_xml 32
validate_abc 29
abc_model_sig 27
lsi_model 27
abc_timeslice 26
map_ontology 26
shadowtext 26
abc_model_opt 24
eval_evidence 24
process_mesh_chunks 24
export_network 23
query_umls 23
apply_bitola_flexible 22
merge_entities 21
vis_abc_network 21
get_pmc_fulltext 20
validate_entity_comprehensive 20
vec_preprocess 20
bitola_model 19
create_sparse_comat 19
fetch_and_parse_pmc 19
find_abc_all 19
ncbi_search 19
cluster_docs 18
create_citation_net 17
load_mesh_terms_from_pubmed 17
create_tdm 16
create_term_document_matrix 16
extract_topics 16
preprocess_text 16
compare_terms 15
min_results 15

Static code analyses with lintr

lintr found no issues with this package!


4. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following 10 function names are duplicated in other packages:

    • create_report from DataExplorer, prodigenr, reporter
    • extract_entities from medExtractR
    • load_dictionary from ricu
    • merge_results from climwin
    • ncbi_search from taxize
    • parallel_analysis from kim
    • plot_heatmap from dendroTools, dynplot, greatR, MitoHEAR, omu, Plasmidprofiler, RolWinMulCor, romic
    • plot_network from cape, dbnR, HeteroGGM, immcp, imsig, LSVAR, SeqNet, SubgrPlots
    • save_results from data.validator
    • vis_heatmap from immunarch

Package Versions

package version
pkgstats 0.2.0.66
pkgcheck 0.1.2.230

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

ropensci-review-bot avatar Sep 24 '25 04:09 ropensci-review-bot

Thanks for the submission @chaoliu-cl! The package sounds really neat. Let me know when you've been able to address the ✖️ items found in the check.

I'd be a little concerned about those high complexity values found in the goodpractice checks. Those files and functions are huge, and it looks like there are logical places where you could split up the code. One example might be to put all of the static lists in a sysdata.rda file.
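A minimal sketch of the sysdata.rda approach; the object names below are illustrative placeholders rather than LBDiscover's actual internal objects, and usethis::use_data() with internal = TRUE is the standard mechanism for writing R/sysdata.rda:

```r
# Run from the package root. The object names are illustrative placeholders,
# not LBDiscover's actual internal data.
acronym_corrections <- c("5-ht" = "serotonin", "ca" = "calcium")
common_words <- c("the", "and", "with", "for")

# internal = TRUE writes the objects to R/sysdata.rda, where package
# functions can use them without the objects being exported or documented.
usethis::use_data(acronym_corrections, common_words,
                  internal = TRUE, overwrite = TRUE)
```

Functions can then reference these objects directly, which keeps long static definitions out of the function bodies and should lower the cyclocomplexity counts.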

ldecicco-USGS avatar Sep 25 '25 20:09 ldecicco-USGS

Hi @ldecicco-USGS ,

Thank you for the feedback. I have addressed the highlighted issues including the following:

  • Code Complexity & sysdata.rda: Following your suggestion, I've extracted all static lists into a sysdata.rda file (acronym corrections, term mappings, common words, entity patterns, etc.). This significantly reduced function complexity by removing hundreds of lines of static definitions.
  • codemeta.json & CONTRIBUTING.md: Both files have been added.
  • Function Examples: Added runnable examples (without \dontrun{}) for all previously undocumented functions.
  • Test Coverage: Improved from 22.4% to 75%.
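A quick way to verify the coverage number locally with covr (a sketch, assuming the working directory is the package root):

```r
# Compute test coverage for the package in the current directory
cov <- covr::package_coverage()

covr::percent_coverage(cov)  # overall percentage, e.g. ~75%
covr::report(cov)            # interactive per-file coverage report
```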

chaoliu-cl avatar Oct 05 '25 01:10 chaoliu-cl

@ropensci-review-bot check package

ldecicco-USGS avatar Oct 10 '25 01:10 ldecicco-USGS

Thanks, about to send the query.

ropensci-review-bot avatar Oct 10 '25 01:10 ropensci-review-bot

:rocket:

Editor check started

:wave:

ropensci-review-bot avatar Oct 10 '25 01:10 ropensci-review-bot

Checks for LBDiscover (v0.1.0)

git hash: 60e965ad

  • :heavy_check_mark: Package is already on CRAN.
  • :heavy_check_mark: has a 'codemeta.json' file.
  • :heavy_check_mark: has a 'contributing' file.
  • :heavy_check_mark: uses 'roxygen2'.
  • :heavy_check_mark: 'DESCRIPTION' has a URL field.
  • :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
  • :heavy_check_mark: Package has at least one HTML vignette
  • :heavy_multiplication_x: These functions do not have examples: [anc_model, create_report, lsi_model, query_external_api, query_mesh, query_umls, validate_biomedical_entity, validate_entity_comprehensive, validate_entity_with_nlp].
  • :heavy_check_mark: Package has continuous integration checks.
  • :heavy_check_mark: Package coverage is 75%.
  • :heavy_check_mark: R CMD check found no errors.
  • :heavy_check_mark: R CMD check found no warnings.
  • :eyes: Some goodpractice linters failed.
  • :eyes: Function names are duplicated in other packages
  • :eyes: Examples should not use \dontrun{} unless really necessary.

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with :eyes: may be optionally addressed.)

Package License: GPL-3


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 2234
internal LBDiscover 149
internal methods 9
internal usethis 2
imports stats 61
imports graphics 58
imports xml2 54
imports utils 51
imports httr 33
imports igraph 19
imports rentrez 9
imports Matrix 8
imports tools 3
imports grDevices 2
imports jsonlite 2
suggests visNetwork 8
suggests parallel 7
suggests irlba 2
suggests reticulate 2
suggests SnowballC 1
suggests spacyr 1
suggests digest 1
suggests openxlsx NA
suggests knitr NA
suggests rmarkdown NA
suggests base64enc NA
suggests testthat NA
suggests mockery NA
suggests covr NA
suggests withr NA
suggests htmltools NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

character (195), c (192), data.frame (146), for (125), length (124), nrow (112), list (82), sapply (77), min (76), max (68), rep (64), numeric (54), unique (52), paste0 (48), names (39), which (34), return (33), if (30), sum (30), integer (25), grepl (24), seq_along (22), unlist (22), attr (21), any (20), paste (19), tolower (19), tryCatch (19), ncol (18), rbind (18), is.na (17), is.null (16), matrix (16), ceiling (15), lapply (15), rownames (15), strsplit (15), table (15), colnames (14), nchar (13), as.numeric (12), vector (11), match (9), order (9), round (9), seq_len (9), regexpr (8), regmatches (8), setdiff (8), as.character (7), drop (7), gregexpr (7), ifelse (7), rowSums (7), sqrt (7), url (7), body (6), plot (6), range (6), sort (6), t (6), substr (5), diff (4), dim (4), grep (4), gsub (4), seq (4), sprintf (4), switch (4), col (3), diag (3), emptyenv (3), environment (3), file (3), log (3), logical (3), new.env (3), row (3), tapply (3), tempfile (3), all (2), apply (2), by (2), cat (2), colSums (2), dimnames (2), do.call (2), expression (2), mean (2), outer (2), row.names (2), sub (2), Sys.time (2), try (2), abs (1), as.data.frame (1), colMeans (1), difftime (1), duplicated (1), file.path (1), floor (1), format (1), interactive (1), match.arg (1), merge (1), mode (1), rank (1), suppressMessages (1), system.file (1), units (1), unname (1), version (1), which.max (1)

LBDiscover

retry_api_call (16), create_comat (4), load_dictionary (4), pubmed_search (4), string_similarity (4), throttle_api (4), abc_model (3), authenticate_umls (3), cluster_docs (3), count_corpus_terms (3), extract_entities (3), get_pubmed_cache (3), tokenize_text (3), vec_preprocess (3), calc_doc_sim (2), calculate_score (2), create_cache_key (2), create_dummy_dictionary (2), create_term_document_matrix (2), diversify_abc (2), extract_text_ngrams (2), get_color_palette (2), get_dict_cache (2), get_service_ticket (2), is_valid_biomedical_entity (2), load_dict_single (2), load_from_mesh (2), load_from_umls (2), load_mesh_terms_from_pubmed (2), process_mesh_xml (2), abc_model_opt (1), abc_model_sig (1), abc_timeslice (1), add_statistical_significance (1), alternative_validation (1), anc_model (1), apply_bitola_flexible (1), apply_correction (1), b_term_type_filter (1), bitola_model (1), calc_bibliometrics (1), clear_pubmed_cache (1), compare_terms (1), create_citation_net (1), create_report (1), create_single_heatmap (1), create_sparse_comat (1), create_tdm (1), create_vis_heatmap (1), detect_lang (1), diversify_b_terms (1), diversify_c_paths (1), enhance_abc_kb (1), eval_evidence (1), export_chord (1), export_chord_diagram (1), export_network (1), extract_entities_workflow (1), extract_mesh_from_text (1), extract_ner (1), extract_ngrams (1), extract_terms (1), extract_topics (1), fetch_and_parse_gene (1), fetch_and_parse_pmc (1), fetch_and_parse_protein (1), fetch_and_parse_pubmed (1), filter_by_type (1), filter_terms_for_abc_model (1), find_abc_all (1), find_similar_docs (1), find_term (1), gen_report (1), get_pmc_fulltext (1), get_term_vars (1), get_type_dist (1), get_umls_semantic_types (1), has_general_biomedical_characteristics (1), is_valid_type (1), list_to_df (1), load_results (1), lsi_model (1), parse_pubmed_xml (1), preprocess_text (1), process_batch (1), split_into_sentences (1), split_text (1)

stats

df (19), terms (16), p.adjust (5), phyper (4), kmeans (3), profile (3), aggregate (2), runif (2), setNames (2), smooth (2), complete.cases (1), dist (1), pt (1)

graphics

text (29), par (13), title (8), layout (6), arrows (2)

xml2

xml_find_first (19), xml_text (19), xml_find_all (10), read_xml (4), xml_attr (1), xml_name (1)

utils

txtProgressBar (40), read.csv (4), adist (2), write.csv (2), de (1), head (1), URLencode (1)

httr

content (18), GET (8), POST (5), headers (2)

igraph

graph_from_data_frame (12), layout_with_fr (6), degree (1)

methods

new (9)

rentrez

entrez_link (3), entrez_search (3), entrez_fetch (2), entrez_summary (1)

Matrix

t (4), diag (2), sparseMatrix (2)

visNetwork

visEdges (2), visGroups (2), visNetwork (2), visLayout (1), visSave (1)

parallel

clusterExport (3), parLapply (2), detectCores (1), makeCluster (1)

tools

file_ext (3)

grDevices

colorRampPalette (1), rainbow (1)

irlba

irlba (2)

jsonlite

fromJSON (2)

reticulate

import (2)

usethis

use_data (2)

digest

digest (1)

SnowballC

wordStem (1)

spacyr

spacy_parse (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 14 files) and
  • 1 authors
  • 3 vignettes
  • no internal data file
  • 11 imported packages
  • 107 exported functions (median 46 lines of code)
  • 150 non-exported functions in R (median 49 lines of code)

Statistical properties of package structure, as distributional percentiles relative to all current CRAN packages. The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 14 68.5
files_inst 4 97.0
files_vignettes 3 89.3
files_tests 28 96.9
loc_R 8267 97.3 TRUE
loc_inst 991 77.7
loc_vignettes 946 89.1
loc_tests 9585 99.1 TRUE
num_vignettes 3 91.0
n_fns_r 257 91.5
n_fns_r_exported 107 95.4 TRUE
n_fns_r_not_exported 150 88.4
n_fns_per_file_r 9 86.5
num_params_per_fn 4 51.2
loc_per_fn_r 47 88.3
loc_per_fn_r_exp 46 77.1
loc_per_fn_r_not_exp 50 90.0
rel_whitespace_R 24 98.4 TRUE
rel_whitespace_inst 23 81.8
rel_whitespace_vignettes 24 85.4
rel_whitespace_tests 27 99.6 TRUE
doclines_per_fn_exp 21 17.0
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 160 84.6

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
18251834739 pages build and deployment success d37d15 22 2025-10-05
18251738033 pkgdown.yaml success 60e965 19 2025-10-05
18251738031 R-CMD-check.yaml success 60e965 19 2025-10-05
18251738027 test-coverage success 60e965 14 2025-10-05

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fails:

  1. cyclocomp
  2. no_description_date

Test coverage with covr

Package coverage: 74.96

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
extract_entities_workflow 145
abc_model 132
sanitize_dictionary 100
vis_heatmap 99
vis_network 80
load_from_umls 71
validate_entity_with_nlp 57
extract_entities 54
load_from_mesh 47
pubmed_search 46
parse_pubmed_xml 45
create_comat 43
create_report 43
run_lbd 41
is_valid_biomedical_entity 40
anc_model 38
load_dictionary 37
extract_ner 35
vis_abc_heatmap 35
export_chord_diagram 33
process_mesh_xml 32
validate_abc 29
abc_model_sig 27
abc_timeslice 26
map_ontology 26
shadowtext 26
abc_model_opt 24
eval_evidence 24
process_mesh_chunks 24
export_network 23
lsi_model 23
query_umls 23
apply_bitola_flexible 22
merge_entities 21
vis_abc_network 21
get_pmc_fulltext 20
validate_entity_comprehensive 20
vec_preprocess 20
bitola_model 19
create_sparse_comat 19
extract_ngrams 19
fetch_and_parse_pmc 19
find_abc_all 19
ncbi_search 19
preprocess_text 19
cluster_docs 18
create_citation_net 17
create_term_document_matrix 17
load_mesh_terms_from_pubmed 17
create_tdm 16
extract_topics 16
validate_term_by_type 16
compare_terms 15
min_results 15

Static code analyses with lintr

lintr found no issues with this package!


4. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following 10 function names are duplicated in other packages:

    • create_report from DataExplorer, prodigenr, reporter
    • extract_entities from medExtractR
    • load_dictionary from ricu
    • merge_results from climwin
    • ncbi_search from taxize
    • parallel_analysis from kim
    • plot_heatmap from dendroTools, dynplot, greatR, MitoHEAR, omu, Plasmidprofiler, RolWinMulCor, romic
    • plot_network from cape, dbnR, HeteroGGM, immcp, imsig, LSVAR, SeqNet, SubgrPlots
    • save_results from data.validator
    • vis_heatmap from immunarch

Package Versions

package version
pkgstats 0.2.0.68
pkgcheck 0.1.2.233

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

ropensci-review-bot avatar Oct 10 '25 02:10 ropensci-review-bot

Thanks for the update. I think if you change \dontrun{} to \donttest{}, the rOpenSci checks will pass. In the meantime, I'll clone the package and take it for a test drive 🏎️
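For reference, a hedged roxygen2 sketch of that change; the example body is illustrative and not taken from the package's actual documentation:

```r
#' @examples
#' \donttest{
#'   # \donttest{} code is skipped during routine example checks but can be
#'   # executed with R CMD check --run-donttest; \dontrun{} code is never run.
#'   articles <- pubmed_search("migraine AND magnesium", max_results = 10)
#' }
```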

ldecicco-USGS avatar Oct 17 '25 18:10 ldecicco-USGS

Hi @ldecicco-USGS, is there any update on the review?

chaoliu-cl avatar Nov 10 '25 22:11 chaoliu-cl

@chaoliu-cl Sorry for the delay on this. My turn as EIC started on Nov 1 but I forgot about it and the reminders got lost in the shutdown! Let me dig back into this and will let you know shortly.

jhollist avatar Nov 14 '25 19:11 jhollist

@chaoliu-cl Have had some time to take a look at this and have had a chance to chat with some of the other rOpenSci editors.

LBDiscover is definitely a good fit for rOpenSci; however, we do have some concerns about the size of the package (8000+ lines of code and 100+ exported functions). Given the scope of your goals for LBDiscover, it makes sense that it is big, but its size may make it challenging to find reviewers willing to commit to a review of that scale. Prior to passing this to a handling editor, I wanted to ask whether you would consider breaking the package into two separate packages.

In your README you list the 7 key features of the package (https://github.com/chaoliu-cl/LBDiscover#key-features). Based on these, would it be possible to split it, with the first three (Data Retrieval, Text Preprocessing, and Entity Extraction) going into a data access/processing focused package and the final four (Co-occurrence Analysis, Discovery Models, Validation, and Visualization) into a data analysis/visualization package?

This is not necessarily a requirement for review as I know this would add additional work on the front end for you, but in the long run we believe it would make for easier review and easier long-term maintenance of the package.

Thoughts?

jhollist avatar Dec 02 '25 21:12 jhollist