list of matches for pairs returned by pairwise_candidates()

Open mayeulk opened this issue 10 months ago • 0 comments

I'm doing this:

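# Directional pairwise comparison of every document in the corpus
comparisons <- pairwise_compare(corpus, ratio_of_matches, directional = TRUE)
# Append each document's word count to its ID, e.g. "doc@123"
colnames(comparisons) <- rownames(comparisons) <-
  paste0(rownames(comparisons), '@', wordcount(corpus))

pw <- pairwise_candidates(comparisons)
# Recover the word counts from the "id@wordcount" labels
pw$wordcount_a <- as.integer(sub('.*@', '', pw$a))
pw$wordcount_b <- as.integer(sub('.*@', '', pw$b))
# Scale the ratio score by each document's word count
pw$score_abs_a <- pw$score * pw$wordcount_a
pw$score_abs_b <- pw$score * pw$wordcount_b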

This adds the word counts to the pw data.frame. What I still need is a way, for a given pair (one row of pw), to list the n-grams that matched.
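To make the request concrete, here is a minimal base-R sketch of the output I have in mind: the word n-grams shared by two texts. The helpers `word_ngrams()` and `shared_ngrams()` are hypothetical names, not part of textreuse, and the two strings are toy stand-ins for student submissions. If the corpus was built with `keep_tokens = TRUE`, I assume that intersecting the vectors returned by `tokens()` for the two documents would be the analogous step inside textreuse, but I have not verified that.

```r
# Hypothetical helper (not part of textreuse): word n-grams of a text,
# lowercased and with punctuation dropped.
word_ngrams <- function(text, n = 3) {
  words <- tolower(unlist(strsplit(text, "[^[:alnum:]]+")))
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: the n-grams that two texts have in common,
# in the order they first appear in text_a.
shared_ngrams <- function(text_a, text_b, n = 3) {
  intersect(word_ngrams(text_a, n), word_ngrams(text_b, n))
}

# Toy stand-ins for two student submissions.
text_a <- "The quick brown fox jumps over the lazy dog."
text_b <- "Yesterday the quick brown fox jumps again."

shared <- shared_ngrams(text_a, text_b, n = 3)
print(shared)
```

With the labels built above, the original document ID for a row of pw could be recovered with something like `sub('@[^@]*$', '', pw$a)` before looking up the text.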

See also https://github.com/ropensci/textreuse/issues/99

My use case as a teacher: among the students who submitted their work (assignments), I want to find those who borrowed part of it from fellow students. Say students A and B have 2 paragraphs in common. Assignment A has 5 paragraphs; assignment B has 20 paragraphs.

I need to double-check the matches visually (by reading the texts) to decide whether each match is legitimate or a case of fraud.

mayeulk, Feb 21 '25 14:02