fxt



Sequential dependence model

Open lgrz opened this issue 5 years ago • 0 comments

Implement the sequential dependence (SD) variant from "A Markov Random Field Model for Term Dependencies" (Metzler and Croft, 2005).
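For orientation, the SD ranking function is usually written in the rank-equivalent form below. This is a sketch from the paper's general formulation, not a transcription of this repository's code; the symbols $\lambda_T, \lambda_O, \lambda_U$ (feature weights), $\mu$ (Dirichlet smoothing parameter), $tf$, $cf$, $|D|$ and $|C|$ are notation chosen here.

```latex
\begin{align*}
\mathrm{score}(Q, D) &= \lambda_T \sum_{i=1}^{n} f_T(q_i, D)
  + \lambda_O \sum_{i=1}^{n-1} f_O(q_i, q_{i+1}, D)
  + \lambda_U \sum_{i=1}^{n-1} f_U(q_i, q_{i+1}, D) \\
f_X(e, D) &= \log \frac{tf_{e,D} + \mu \, cf_e / |C|}{|D| + \mu},
  \qquad X \in \{T, O, U\}
\end{align*}
```

Here $e$ is a single term for $f_T$, an ordered window over adjacent query terms for $f_O$, and an unordered window over the same pair for $f_U$; $tf_{e,D}$ and $cf_e$ are its match counts in the document and in the collection.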

The implementation follows the one in Indri.

Statistics counting. Relevant parts of Indri for computing ordered and unordered phrase statistics:

src/ContextCountAccumulator.cpp
src/OrderedWindowNode.cpp
src/UnorderedWindowNode.cpp
indri::api::QueryEnvironment::_sumServerQuery [collect statistics]
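The window counting these files deal with can be sketched for the two-term case as follows. This is a minimal illustration, not Indri's code: the names `Positions`, `count_ordered`, and `count_unordered` are hypothetical, and the window semantics (an ordered match when the second term follows the first by at most `width` tokens, an unordered match when both terms fit in a span of `width` tokens, with matches not reusing positions) are assumptions that should be checked against the Indri sources listed above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Token offsets of one term inside one document, sorted ascending.
using Positions = std::vector<int>;

// Ordered window: count matches where a position of `b` follows a position
// of `a` by 1..width tokens. Each position is used in at most one match.
std::size_t count_ordered(const Positions& a, const Positions& b, int width) {
    std::size_t i = 0, j = 0, count = 0;
    while (i < a.size() && j < b.size()) {
        if (b[j] <= a[i]) {
            ++j;                        // b occurs before a: wrong order
        } else if (b[j] - a[i] <= width) {
            ++count; ++i; ++j;          // match: consume both positions
        } else {
            ++i;                        // gap too large: try the next a
        }
    }
    return count;
}

// Unordered window: count matches where both terms lie within a span of
// `width` tokens, in either order. Each position is used at most once.
std::size_t count_unordered(const Positions& a, const Positions& b, int width) {
    std::size_t i = 0, j = 0, count = 0;
    while (i < a.size() && j < b.size()) {
        const int lo = std::min(a[i], b[j]);
        const int hi = std::max(a[i], b[j]);
        if (hi - lo + 1 <= width) {
            ++count; ++i; ++j;          // both terms fit in the window
        } else if (a[i] < b[j]) {
            ++i;                        // advance whichever term lags behind
        } else {
            ++j;
        }
    }
    return count;
}
```

With positions `{1, 5}` for the first term and `{2, 6}` for the second, `count_ordered(a, b, 1)` finds two adjacent matches, which is the "multiple matches of phrase" case tracked in the checklist.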

Scoring. Parts of Indri useful for understanding how scoring is performed:

indri::infnet::WeightedAndNode::score
indri::infnet::ListBeliefNode::score
indri::query::DirichletTermScoreFunction
indri::infnet::InferenceNetwork::_evaluateDocument
indri::api::QueryEnvironment::_scoredQuery [get the final scores]
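The scoring path above can be approximated by a Dirichlet-smoothed log score per feature, combined with per-group weights. The sketch below is an assumption-laden illustration rather than Indri's or fxt's code: `FeatureStats`, `dirichlet_score`, `mean_score`, and `sdm_score` are hypothetical names, `mu = 2500` and the 0.85/0.10/0.05 weights are commonly quoted defaults assumed here, and the floor of 0.5 for unseen phrases is a placeholder to keep the logarithm finite.

```cpp
#include <cmath>
#include <vector>

// Counts for one feature (a term or a window): occurrences in the
// document (tf) and in the whole collection (cf).
struct FeatureStats {
    double tf;
    double cf;
};

// Dirichlet-smoothed log probability of a feature in a document.
// Assumption: a feature never seen in the collection gets cf = 0.5
// so that log() stays finite.
double dirichlet_score(double tf, double cf, double doc_len,
                       double coll_len, double mu = 2500.0) {
    const double p_coll = (cf > 0.0 ? cf : 0.5) / coll_len;
    return std::log((tf + mu * p_coll) / (doc_len + mu));
}

// Average score over one group of features (terms, ordered, or unordered).
double mean_score(const std::vector<FeatureStats>& feats, double doc_len,
                  double coll_len, double mu) {
    if (feats.empty()) return 0.0;
    double sum = 0.0;
    for (const FeatureStats& f : feats) {
        sum += dirichlet_score(f.tf, f.cf, doc_len, coll_len, mu);
    }
    return sum / static_cast<double>(feats.size());
}

// Weighted combination of the three groups. Empty groups (for example,
// no bigrams in a 1-term query) are skipped and the weights renormalized.
double sdm_score(const std::vector<FeatureStats>& terms,
                 const std::vector<FeatureStats>& ordered,
                 const std::vector<FeatureStats>& unordered,
                 double doc_len, double coll_len, double mu = 2500.0,
                 double w_t = 0.85, double w_o = 0.10, double w_u = 0.05) {
    double total = 0.0, weight_sum = 0.0;
    auto add = [&](const std::vector<FeatureStats>& group, double w) {
        if (group.empty()) return;
        total += w * mean_score(group, doc_len, coll_len, mu);
        weight_sum += w;
    };
    add(terms, w_t);
    add(ordered, w_o);
    add(unordered, w_u);
    return weight_sum > 0.0 ? total / weight_sum : 0.0;
}
```

Averaging within each group and renormalizing the weights is one design choice among several; it keeps a 1-term query's score equal to its plain term score, but whether it matches Indri's output exactly has to be verified against the functions listed above.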

Progress. Checklist of required items and their completion status.

[x] score basic 1-term query
[x] score basic 2-term query
[x] score basic 3-term query
[x] score phrases that don't exist
[x] test ordered window boundaries
[x] test unordered window boundaries
[x] ordered: multiple matches of phrase within the same window
[x] unordered: multiple matches of phrase within the same window
[x] use postings to scan forward index for phrase collection count
[ ] cache of sub-query statistics when scoring many documents
[ ] sub-query cache over different queries
[x] documentation
[x] enable for extraction in extract_features
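The checklist item on using postings to scan the forward index for a phrase's collection count could look roughly like the sketch below. It assumes a hypothetical in-memory layout (sorted document-id postings per term, and a forward index holding each document's term ids in order) and counts only exact adjacent bigrams of two distinct terms; `DocId`, `TermId`, and `bigram_collection_count` are illustrative names, not fxt's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

using DocId = std::uint32_t;
using TermId = std::uint32_t;

// Count occurrences of the adjacent bigram "a b" across the collection.
// Only documents containing both terms can match, so the sorted postings
// lists are intersected first and only those documents are scanned.
std::uint64_t bigram_collection_count(
        const std::vector<std::vector<TermId>>& forward_index,
        const std::vector<DocId>& postings_a,
        const std::vector<DocId>& postings_b,
        TermId a, TermId b) {
    std::vector<DocId> candidates;
    std::set_intersection(postings_a.begin(), postings_a.end(),
                          postings_b.begin(), postings_b.end(),
                          std::back_inserter(candidates));

    std::uint64_t count = 0;
    for (DocId doc : candidates) {
        const std::vector<TermId>& terms = forward_index[doc];
        for (std::size_t i = 0; i + 1 < terms.size(); ++i) {
            if (terms[i] == a && terms[i + 1] == b) {
                ++count;
            }
        }
    }
    return count;
}
```

The intersection step is what makes the scan cheap for rare phrases: documents missing either term are never read from the forward index.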

Apr 20 '20 07:04 lgrz