dupree
Function for obtaining / printing the text for a pair of duplicated blocks
For example,
print_dup(dup_df[1, ])
Or, if we change dupree to return a list of class Dups, wherein each entry is of class Dup, then print(dups[[1]]) might be better syntax.
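To make the second option concrete, here is a minimal S3 sketch. The constructors new_dup() and new_dups() and the fields stored in each entry are hypothetical, chosen only for illustration; this is not how dupree currently structures its results.

```r
# Hypothetical constructor: one pair of duplicated blocks.
new_dup <- function(file_a, line_a, file_b, line_b, score) {
  structure(
    list(file_a = file_a, line_a = line_a,
         file_b = file_b, line_b = line_b, score = score),
    class = "Dup"
  )
}

# Hypothetical container: a plain list of Dup objects with its own class.
new_dups <- function(...) {
  structure(list(...), class = "Dups")
}

# S3 print method; this is what print(dups[[1]]) would dispatch to.
print.Dup <- function(x, ...) {
  cat(sprintf("<Dup> score = %.2f\n", x$score))
  cat(sprintf("  a: %s (line %d)\n", x$file_a, x$line_a))
  cat(sprintf("  b: %s (line %d)\n", x$file_b, x$line_b))
  invisible(x)
}

# Made-up entries, just to exercise the method
dups <- new_dups(
  new_dup("R/foo.R", 10L, "R/bar.R", 42L, 0.8),
  new_dup("R/foo.R", 30L, "R/baz.R", 5L, 0.6)
)
print(dups[[1]])
```

Since no `[[` method is defined for Dups, dups[[1]] falls through to the default list extraction and returns the Dup entry unchanged, so print() picks up print.Dup without any extra work.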
Note that the LCS algorithm in {stringdist} only computes the length of the LCS; it doesn't return the longest common subsequence itself. I can't find a good LCS implementation on CRAN (and I don't want to depend on Bioconductor packages, since dupree is on CRAN now).
Option: include an LCS implementation with dupree (can still use {stringdist} for computing the distances, but use a local LCS for computing the duplicated strings).
Alternatively, could call {textreuse} with the original code strings rather than integer vectors, but that would require r-textreuse to be pushed to conda-forge for me to use it locally.
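For reference, a local LCS that returns the subsequence itself can be fairly small. The following is a rough sketch using the textbook dynamic-programming table plus a backtracking pass; the function name lcs() is my own, and it is not tuned for long inputs (it fills an (n + 1) x (m + 1) matrix in plain R loops).

```r
# Longest common subsequence of two vectors (e.g. integer token codes).
# Returns the subsequence itself, not just its length.
lcs <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # len[i + 1, j + 1] holds the LCS length of a[1:i] and b[1:j]
  len <- matrix(0L, nrow = n + 1L, ncol = m + 1L)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      len[i + 1L, j + 1L] <- if (a[i] == b[j]) {
        len[i, j] + 1L
      } else {
        max(len[i, j + 1L], len[i + 1L, j])
      }
    }
  }
  # Walk back from the bottom-right corner to recover the elements
  out <- vector(mode = typeof(a), length = len[n + 1L, m + 1L])
  i <- n
  j <- m
  k <- length(out)
  while (i > 0L && j > 0L) {
    if (a[i] == b[j]) {
      out[k] <- a[i]
      k <- k - 1L
      i <- i - 1L
      j <- j - 1L
    } else if (len[i, j + 1L] >= len[i + 1L, j]) {
      i <- i - 1L
    } else {
      j <- j - 1L
    }
  }
  out
}

lcs(c(1L, 2L, 3L, 4L, 5L), c(2L, 9L, 4L, 5L))
# returns 2 4 5
```

It works on any atomic vector that supports ==, so it could be applied either to the integer vectors or to character tokens.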
Just print the contents of the two (+) blocks for now. Can implement finding the actual LCS at a later stage
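A rough sketch of what the simple version might look like. It assumes a one-row data frame with columns file_a, line_a, file_b and line_b; the n_lines argument is hypothetical, since such a row does not say how long each block is. The files and the dup_df below are toy stand-ins, not real dupree output.

```r
# Hypothetical helper: print the source text for one row of a dups data frame.
# `n_lines` is an assumed parameter; the row itself carries no block length.
print_dup <- function(dup_row, n_lines = 5) {
  show_block <- function(label, file, start) {
    lines <- readLines(file)
    # Clip the range so we never index past the end of the file
    idx <- seq(start, min(start + n_lines - 1, length(lines)))
    writeLines(sprintf("## %s: %s (from line %d)", label, file, start))
    writeLines(sprintf("%4d | %s", idx, lines[idx]))
  }
  show_block("a", dup_row$file_a, dup_row$line_a)
  show_block("b", dup_row$file_b, dup_row$line_b)
  invisible(dup_row)
}

# Toy input standing in for one row of dupree's results
f1 <- tempfile(fileext = ".R")
f2 <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "y <- x + 1", "print(y)"), f1)
writeLines(c("# header", "x <- 1", "y <- x + 1", "print(y)"), f2)
dup_df <- data.frame(
  file_a = f1, line_a = 1L,
  file_b = f2, line_b = 2L,
  stringsAsFactors = FALSE
)
print_dup(dup_df[1, ])
```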
Hi all,
I made a little function to view the diff between each pair of code strings. I hope this can help somebody. The diffr package is needed.
dup_diff <- function(dupree_res, min_score = 0.45, nlines = 10) {
  # Keep only the block pairs scoring above the threshold (needs dplyr)
  dup_misc_filter <- dupree_res$dups_df |>
    dplyr::filter(score > min_score)
  # Copy nlines + 1 lines of `file`, starting at `start`, into `out`;
  # the range is clipped so short files don't produce NA lines
  write_block <- function(file, start, out) {
    lines <- readLines(file)
    writeLines(lines[seq(start, min(start + nlines, length(lines)))], out)
    out
  }
  res <- list()
  for (i in seq_len(nrow(dup_misc_filter))) {
    out_dir <- file.path(tempdir(), i)
    dir.create(out_dir, showWarnings = FALSE)
    # One diffr widget per pair of duplicated blocks
    res[[i]] <- diffr::diffr(
      write_block(dup_misc_filter$file_a[i], dup_misc_filter$line_a[i], file.path(out_dir, "file_a")),
      write_block(dup_misc_filter$file_b[i], dup_misc_filter$line_b[i], file.path(out_dir, "file_b"))
    )
  }
  res
}
For example,
example_file <- system.file("extdata", "duplicated.R", package = "dupree")
dup <- dupree(example_file, min_block_size = 10)
dup
dif <- dup_diff(dup)
Neat. Thanks