Snowball icon indicating copy to clipboard operation
Snowball copied to clipboard

Implementation with some extensions of the paper "Snowball: Extracting Relations from Large Plain-Text Collections" (Agichtein and Gravano, 2000)

Snowball: Extracting Relations from Large Plain-Text Collections

This is my own implementation of the the Snowball system to bootstrap relationship instances. You can find more details about the original system here:

For more details about this particular implementation please refer to:

A sample file containing sentences where the named-entities are already tagged can be downloaded, which has 1 million sentences taken from the New York Times articles part of the English Gigaword Collection.

NOTE: look at the desription of BREDS to understand how to give a tagged document collection and seeds to setup the bootstrapping of relationship instances with Snowball, both systems have a similar setup.