
A tool to find all duplicates in large sets of text documents.

⊧ dupi

Dupi is an engine for identifying and exploring duplicative text in sets of documents.

Status

Dupi is at an alpha/early beta stage of development. Please feel free to give it a try (and file issues). We have run it successfully on several document sets, but it definitely needs more testing.

Input

Throw hundreds of thousands of text documents at it, or extract text from other kinds of documents and send that to dupi.

Output

Find and query for repeated chunks of text.
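
To give a feel for what "repeated chunks of text" can mean, here is a minimal sketch in Go that reports word sequences shared by more than one document. It does not use dupi's API and is not a description of dupi's algorithm; it is a generic word-shingle approach, and the names `Shingles` and `Repeated`, the sample documents, and the window size of 3 words are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Shingles is a hypothetical helper (not part of dupi): it returns every run
// of n consecutive whitespace-separated words in text, lower-cased.
func Shingles(text string, n int) []string {
	words := strings.Fields(strings.ToLower(text))
	if n <= 0 || len(words) < n {
		return nil
	}
	out := make([]string, 0, len(words)-n+1)
	for i := 0; i+n <= len(words); i++ {
		out = append(out, strings.Join(words[i:i+n], " "))
	}
	return out
}

// Repeated is a hypothetical helper (not part of dupi): it maps each shingle
// that occurs in two or more documents to the sorted names of those documents.
func Repeated(docs map[string]string, n int) map[string][]string {
	// seen[shingle] is the set of document names containing that shingle.
	seen := make(map[string]map[string]bool)
	for name, text := range docs {
		for _, s := range Shingles(text, n) {
			if seen[s] == nil {
				seen[s] = make(map[string]bool)
			}
			seen[s][name] = true
		}
	}

	dups := make(map[string][]string)
	for s, names := range seen {
		if len(names) < 2 {
			continue // shingle is unique to one document
		}
		list := make([]string, 0, len(names))
		for name := range names {
			list = append(list, name)
		}
		sort.Strings(list)
		dups[s] = list
	}
	return dups
}

func main() {
	// Made-up sample documents, keyed by an illustrative file name.
	docs := map[string]string{
		"a.txt": "the quick brown fox jumps over the lazy dog",
		"b.txt": "a quick brown fox appeared at dawn",
		"c.txt": "nothing shared here",
	}
	for s, names := range Repeated(docs, 3) {
		fmt.Printf("%q appears in %v\n", s, names)
	}
}
```

An in-memory map like this only suits small inputs; handling hundreds of thousands of documents is the kind of problem a dedicated engine such as dupi is meant to address.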

Tutorial

Tutorial

Design

Design Document

Library Reference

Go Reference