esalib
My implementation of an Explicit Semantic Analysis (ESA) library, which we used at KMi, Open University, to produce our submission to the NTCIR-9 CrossLink task.
== WARNING ==
The tool has been verified to yield good results (i.e., correlation with human judgement as reported in the original ESA paper) with the provided prebuilt English Wikipedia ESA background from 2005. I have not succeeded in building an ESA background from recent Wikipedia dumps; please let me know if you manage to.
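If you build your own background, one common way to check whether it behaves like the 2005 one is to compute a rank correlation (Spearman's rho) between the analyzer's scores and human relatedness ratings for the same word pairs. Below is a minimal Java sketch (Java 9+), assuming you have already collected both score lists yourself; the class name and the numbers in `main` are hypothetical placeholders, not results from this library.

```java
import java.util.Arrays;

public class SpearmanCheck {
    // Assigns 1-based ranks; tied values receive the average of their ranks.
    static double[] ranks(double[] xs) {
        int n = xs.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(xs[a], xs[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over a run of equal values.
            while (j + 1 < n && xs[idx[j + 1]] == xs[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0; // average of ranks i+1 .. j+1
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    // Pearson correlation of two equal-length arrays.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double ma = 0, mb = 0;
        for (int i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
        ma /= n;
        mb /= n;
        double cov = 0, va = 0, vb = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - ma, db = b[i] - mb;
            cov += da * db;
            va += da * da;
            vb += db * db;
        }
        return cov / Math.sqrt(va * vb);
    }

    // Spearman's rho = Pearson correlation of the rank vectors (tie-aware).
    static double spearman(double[] system, double[] human) {
        return pearson(ranks(system), ranks(human));
    }

    public static void main(String[] args) {
        // Hypothetical scores for four word pairs (illustrative, not real results).
        double[] esaScores = {0.12, 0.55, 0.31, 0.80};
        double[] humanRatings = {2.0, 7.5, 5.0, 9.1};
        System.out.println(spearman(esaScores, humanRatings)); // same ordering, so 1.0
    }
}
```

Spearman's rho only looks at the ordering of pairs, so it is insensitive to the fact that ESA scores and human ratings live on different scales.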
== Changelog ==
7.12.2013
- fixed a few mistakes in the tutorial
- merged a pull request fixing a problem on MacOS
15.2.2013
- found a problem with stemming: the example English background is stemmed with PorterStemmer, but my library uses SnowballStemmer; this results in many out-of-vocabulary (OOV) words and therefore low similarity scores
- added an interactive mode to the analyzer: you can now pipe in pairs of texts to compare (1 line = 1 text), and ESAAnalyzer outputs the similarity scores
- added wikixray scripts that were missing from the tutorial
7.10.2012
- fixed a typo in the analyzer bash script that caused only the first words to be analyzed; fixed the handling of OOV words; removed the length filter (previously only words 3-100 characters long were considered)
29.9.2012
- added support for SQLite, making the library more convenient for fast prototyping
25.3.2012
- initial release
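The stemming mismatch noted under 15.2.2013 shows up as OOV words, so a quick diagnostic is to measure what fraction of your stemmed tokens have no entry in the background vocabulary. A minimal Java sketch (Java 9+); the vocabulary and tokens below are hypothetical placeholders, and in practice you would load the vocabulary from the background database and stem tokens with the same stemmer that was used to build it.

```java
import java.util.List;
import java.util.Set;

public class OovCheck {
    // Fraction of tokens that have no entry in the background vocabulary.
    // OOV tokens contribute no concepts to a text's ESA vector, which lowers scores.
    static double oovRate(List<String> stemmedTokens, Set<String> backgroundVocabulary) {
        if (stemmedTokens.isEmpty()) return 0.0;
        long missing = stemmedTokens.stream()
                .filter(t -> !backgroundVocabulary.contains(t))
                .count();
        return (double) missing / stemmedTokens.size();
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary and tokens, for illustration only.
        Set<String> vocab = Set.of("comput", "appl", "fruit");
        List<String> tokens = List.of("comput", "appl", "orchard");
        System.out.println(oovRate(tokens, vocab)); // 1 of 3 tokens is missing
    }
}
```

A high OOV rate on ordinary English input is a strong hint that the query-side stemmer does not match the one used for the background.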
== Files ==
- /example - an ESA background built from a 2005 Wikipedia snapshot, which you can use directly with our tools to assess the semantic similarity of English texts/words
- /tutorial - basic instructions for building your own background
- /lib - Java libraries required to run the tools
== So how to get ESA running in 2 minutes for English? ==
0. Clone the repository:
   # git clone https://github.com/ticcky/esalib.git
   # cd esalib
1. Create a symbolic link to the sample database:
   # ln -s example/esa_en.db esa_db.db
2. Get a relatedness estimate for two texts:
   # ./run_analyzer "computer" "apple"
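What the analyzer estimates can be illustrated with the general ESA idea from the original paper by Gabrilovich and Markovitch: each word maps to a weighted vector over Wikipedia concepts (articles), a text is represented by combining the vectors of its words, and relatedness is the cosine of the two text vectors. The Java sketch below (Java 9+) shows that idea on a hypothetical in-memory index; it is not esalib's code, it uses plain term-frequency accumulation as a simplification of the paper's weighting, and the concept names and weights are made-up placeholders rather than values from esa_en.db.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EsaSketch {
    // Hypothetical inverted index: word -> (concept -> weight). Placeholder values.
    static final Map<String, Map<String, Double>> INDEX = Map.of(
            "computer", Map.of("Computer", 0.9, "Apple_Inc.", 0.4),
            "apple", Map.of("Apple", 0.8, "Apple_Inc.", 0.6));

    // Text vector = sum of the concept vectors of its words (a repeated word is
    // added again, i.e. raw term-frequency weighting). OOV words contribute nothing.
    static Map<String, Double> conceptVector(List<String> words) {
        Map<String, Double> v = new HashMap<>();
        for (String w : words) {
            Map<String, Double> concepts = INDEX.getOrDefault(w, Map.of());
            concepts.forEach((c, weight) -> v.merge(c, weight, Double::sum));
        }
        return v;
    }

    // Euclidean norm of a sparse vector.
    static double norm(Map<String, Double> v) {
        double s = 0;
        for (double x : v.values()) s += x * x;
        return Math.sqrt(s);
    }

    // Cosine similarity of two sparse vectors; 0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double na = norm(a), nb = norm(b);
        return (na == 0 || nb == 0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        // "computer" and "apple" share the concept Apple_Inc., so the score is above 0.
        double r = cosine(conceptVector(List.of("computer")), conceptVector(List.of("apple")));
        System.out.println(r);
    }
}
```

Because OOV words simply add nothing to a text's vector, a text made up entirely of unknown words gets a score of 0, which matches the low scores described in the stemming entry of the changelog.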
Please don't hesitate to get in touch if you want to use my library but run into trouble with it.