sempre
Possible accuracy improvement in sempre 1.0
Hi Liang, I think there is a bug in sempre 1.0. If you fix it, you may see a significant accuracy improvement.
Location: src/edu/stanford/nlp/sempre/paraphrase/VectorSpaceModel.java, in the function computeSimilarity, line 133.
There you compute the similarity of two sentences as the dot product of their sentence vectors. But each sentence vector is simply the mean of its word vectors (see the function computeUtteranceVec()), i.e. your algorithm is: sentence vector = (sum of the word vectors) / (number of words).
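A toy illustration of the problem, assuming computeUtteranceVec() behaves as described above (element-wise mean, no normalization). MeanVectorDemo, meanOf and dot are hypothetical names, not SEMPRE code, and the 2-d "word vectors" are made up; List.of needs Java 9+.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the behaviour described for computeUtteranceVec():
// a sentence vector is the element-wise mean of its word vectors (no normalization).
class MeanVectorDemo {
  // Element-wise mean of equally sized word vectors.
  static double[] meanOf(List<double[]> wordVecs) {
    double[] mean = new double[wordVecs.get(0).length];
    for (double[] w : wordVecs) {
      for (int i = 0; i < mean.length; i++) {
        mean[i] += w[i];
      }
    }
    for (int i = 0; i < mean.length; i++) {
      mean[i] /= wordVecs.size();
    }
    return mean;
  }

  // Plain, unnormalized dot product, as in the fallback dot-product branch.
  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    // Toy 2-d word vectors; real embeddings are much larger.
    double[] query = meanOf(List.of(new double[] {1, 0}, new double[] {1, 0})); // (1, 0)
    double[] same = meanOf(List.of(new double[] {1, 0}, new double[] {1, 0}));  // (1, 0)
    double[] other = meanOf(List.of(new double[] {4, 0}, new double[] {0, 4})); // (2, 2)
    System.out.println(Arrays.toString(other)); // [2.0, 2.0]
    System.out.println(dot(query, same));       // 1.0
    System.out.println(dot(query, other));      // 2.0
  }
}
```

The sentence whose mean vector points in a different direction still outscores the exact match, only because its mean vector has a larger norm.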
I think you forgot to normalize the sentence vectors so that ||sentence vector|| = 1. With unit-length vectors, the dot product becomes the cosine similarity:

$$\mathrm{sim}(s_1, s_2) = \frac{s_1 \cdot s_2}{\lVert s_1 \rVert \, \lVert s_2 \rVert} = s_1 \cdot s_2 \quad \text{when } \lVert s_1 \rVert = \lVert s_2 \rVert = 1$$
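A minimal sketch of the proposed fix: scale each vector to unit length before taking the dot product, which gives cosine similarity. CosineDemo, norm, normalize and cosine are hypothetical names, not SEMPRE code, and returning a zero vector unchanged is an assumption about how to treat a sentence with no known words.

```java
// Hypothetical illustration: cosine similarity as the dot product of L2-normalized vectors.
class CosineDemo {
  // Euclidean (L2) norm of v.
  static double norm(double[] v) {
    double sq = 0.0;
    for (double x : v) {
      sq += x * x;
    }
    return Math.sqrt(sq);
  }

  // Returns a copy of v scaled to unit length.
  // Assumption: a zero vector (e.g. no known words) is returned unchanged to avoid NaN.
  static double[] normalize(double[] v) {
    double n = norm(v);
    double[] out = v.clone();
    if (n == 0.0) {
      return out;
    }
    for (int i = 0; i < out.length; i++) {
      out[i] /= n;
    }
    return out;
  }

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Cosine similarity: dot product of the two unit-length vectors.
  static double cosine(double[] a, double[] b) {
    return dot(normalize(a), normalize(b));
  }

  public static void main(String[] args) {
    double[] query = {1, 0};
    double[] same = {1, 0};
    double[] other = {2, 2};
    System.out.println(cosine(query, same));  // 1.0
    System.out.println(cosine(query, other)); // about 0.7071
  }
}
```

On the toy vectors, the exact match scores 1.0 and (2, 2) scores about 0.7071, whereas their raw dot products with (1, 0) are 1.0 and 2.0. Normalizing once inside computeUtteranceVec(), before the vector is cached in phraseVectorCache, could avoid repeating the work on every call, though it would also change the inputs to the DIAGNONAL and FULL_MATRIX features.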
```java
public void computeSimilarity(ParaphraseExample ex, Params params) {
  ex.ensureAnnotated();
  // get source and target representations
  double[] sourceVec, targetVec;
  synchronized (phraseVectorCache) {
    sourceVec = phraseVectorCache.containsKey(ex.source) ? phraseVectorCache.get(ex.source) : computeUtteranceVec(ex.sourceInfo);
    targetVec = phraseVectorCache.containsKey(ex.target) ? phraseVectorCache.get(ex.target) : computeUtteranceVec(ex.targetInfo);
    MapUtils.putIfAbsent(phraseVectorCache, ex.source, sourceVec);
    MapUtils.putIfAbsent(phraseVectorCache, ex.target, targetVec);
  }
  // combine them
  FeatureVector fv;
  if (vsmSimilarityFunc == SimilarityFunc.DIAGNONAL)
    fv = getDiagonalMatrixFeatures(sourceVec, targetVec);
  else if (vsmSimilarityFunc == SimilarityFunc.FULL_MATRIX)
    fv = getFullMatrixFeatures(sourceVec, targetVec);
  else // dot product
    fv = getDotProductFeature(sourceVec, targetVec); // not a good similarity here: sourceVec and targetVec are unnormalized means
  // set stuff
  ex.setVectorSpaceSimilarity(new FeatureSimilarity(fv, ex.source, ex.target, params));
}
```
Also, do you know where I can download the parasempre file?