textreuse
list of matches for pairs returned by pairwise_candidates()
I'm doing this:
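# Directional pairwise comparison of every document pair in the corpus
comparisons <- pairwise_compare(corpus, ratio_of_matches, directional = TRUE)
# Append each document's word count to its ID, separated by '@'
colnames(comparisons) <- rownames(comparisons) <-
  paste0(rownames(comparisons), '@', wordcount(corpus))
pw <- pairwise_candidates(comparisons)
# Recover the word counts from the IDs
pw$wordcount_a <- as.integer(sub('.*@', '', pw$a))
pw$wordcount_b <- as.integer(sub('.*@', '', pw$b))
# Scale the ratio by each word count to get an absolute score
pw$score_abs_a <- pw$score * pw$wordcount_a
pw$score_abs_b <- pw$score * pw$wordcount_b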
I'm adding the word count to the pw data.frame. What I still need is a way, for a given pair (a row of pw), to find the matched n-grams.
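To make the request concrete, here is a rough sketch in base R of what "the matched n-grams for a pair" could look like. It does not use textreuse internals: `word_ngrams()` and `shared_ngrams()` are hypothetical helper names, and the tokenization (lowercase, split on non-alphanumeric characters, n = 3) is an assumption that may not match the tokenizer the corpus was built with.

```r
# Hypothetical helper: split a text into lowercase word n-grams.
word_ngrams <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]              # drop empty pieces
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: n-grams that occur in both texts.
shared_ngrams <- function(text_a, text_b, n = 3) {
  intersect(word_ngrams(text_a, n), word_ngrams(text_b, n))
}

# Made-up example texts, only for illustration.
text_a <- "The quick brown fox jumps over the lazy dog."
text_b <- "A quick brown fox sat near the lazy dog."
matches <- shared_ngrams(text_a, text_b)
print(matches)
```

For a row of pw, the two texts would have to be looked up by document ID first, after stripping the '@wordcount' suffix added above (for example with sub('@.*', '', pw$a)).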
See also https://github.com/ropensci/textreuse/issues/99
My use case as a teacher: among the students who submitted their work (assignments), I want to find those who borrowed part of it from fellow students. Say students A and B have 2 paragraphs in common. Assignment A has 5 paragraphs; assignment B has 20 paragraphs.
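The arithmetic behind that example, as a small sketch. It counts paragraphs rather than n-grams, which is a simplification of what the score actually measures, and the variable names are only illustrative.

```r
# Illustrative numbers from the example: 2 shared paragraphs,
# assignment A has 5 paragraphs, assignment B has 20.
shared <- 2
n_a <- 5
n_b <- 20

# The same overlap is a very different share of each assignment.
ratio_a <- shared / n_a   # share of A that also appears in B
ratio_b <- shared / n_b   # share of B that also appears in A

# Multiplying a ratio by the matching length recovers the absolute
# overlap, which is the idea behind the score_abs_* columns.
abs_a <- ratio_a * n_a
abs_b <- ratio_b * n_b

print(c(ratio_a = ratio_a, ratio_b = ratio_b))
```

The same 2 paragraphs account for 40% of A but only 10% of B, so a single ratio cannot describe the pair without knowing which document it is relative to.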
I need to double-check visually (by reading the texts) what the matches are, to see whether each match is legitimate or a case of fraud.