fxt
Sequential dependence model
Implements the sequential dependence (SD) variant of the model described in "A Markov Random Field Model for Term Dependencies" (Metzler and Croft, SIGIR 2005).
The implementation is based on the one within Indri.
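As a rough illustration of what the SD variant scores, the sketch below expands a query into its single terms and its adjacent term pairs. In the SD model each adjacent pair is evaluated twice, once as an ordered phrase and once as an unordered window. The names `SdFeatures` and `expand_query` are hypothetical and are not taken from fxt or Indri.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for the features of one query (illustration only).
struct SdFeatures {
    std::vector<std::string> unigrams;                         // single terms
    std::vector<std::pair<std::string, std::string>> bigrams;  // adjacent pairs
};

// Expand a query into single terms and adjacent term pairs.
// Each pair is meant to be scored as both an ordered and an unordered window.
SdFeatures expand_query(const std::vector<std::string>& terms) {
    SdFeatures out;
    out.unigrams = terms;
    for (std::size_t i = 0; i + 1 < terms.size(); ++i) {
        out.bigrams.emplace_back(terms[i], terms[i + 1]);
    }
    return out;
}
```

A one-term query yields no pairs, so only the unigram feature contributes to its score.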
Statistics counting. Relevant parts of Indri for computing ordered and unordered phrase statistics:
- src/ContextCountAccumulator.cpp
- src/OrderedWindowNode.cpp
- src/UnorderedWindowNode.cpp
- indri::api::QueryEnvironment::_sumServerQuery [collect statistics]
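To make the two kinds of phrase statistics concrete, here is a simplified sketch that counts matches of a two-term phrase from the sorted positions of each term in one document. It assumes two distinct terms, and its matching rules (first following occurrence for ordered, greedy non-overlapping pairs for unordered) are assumptions made for illustration. They may differ from the extent matching performed by the Indri nodes listed above. `count_ordered` and `count_unordered` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ordered window: count positions of `first` whose next occurrence of
// `second` lies at most `width` tokens later. Inputs are sorted ascending.
std::size_t count_ordered(const std::vector<std::uint32_t>& first,
                          const std::vector<std::uint32_t>& second,
                          std::uint32_t width) {
    std::size_t count = 0;
    std::size_t j = 0;
    for (std::uint32_t p : first) {
        // Skip occurrences of `second` that are not strictly after p.
        while (j < second.size() && second[j] <= p) ++j;
        if (j == second.size()) break;
        if (second[j] - p <= width) ++count;
    }
    return count;
}

// Unordered window: count non-overlapping pairs, in either order, that fit
// inside a span of `width` tokens (max - min < width).
std::size_t count_unordered(const std::vector<std::uint32_t>& a,
                            const std::vector<std::uint32_t>& b,
                            std::uint32_t width) {
    std::size_t count = 0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        const std::uint32_t lo = std::min(a[i], b[j]);
        const std::uint32_t hi = std::max(a[i], b[j]);
        if (hi - lo < width) {
            ++count;  // both positions are consumed by this match
            ++i;
            ++j;
        } else if (a[i] < b[j]) {
            ++i;      // advance whichever term is further behind
        } else {
            ++j;
        }
    }
    return count;
}
```

Summing these per-document counts over every document that contains both terms gives a collection-level phrase count under the same simplified rules.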
Scoring. Parts of Indri useful for understanding how scoring is performed:
- indri::infnet::WeightedAndNode::score
- indri::infnet::ListBeliefNode::score
- indri::query::DirichletTermScoreFunction
- indri::infnet::InferenceNetwork::_evaluateDocument
- indri::api::QueryEnvironment::_scoredQuery [get the final scores]
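The scoring path listed above can be summarised as a Dirichlet-smoothed log-probability per feature, combined by a weighted sum. The sketch below shows that arithmetic under stated assumptions: `dirichlet_score`, `Feature`, and `weighted_sum` are hypothetical names, the weights and `mu` are supplied by the caller, and the collection count must be positive (a feature that never occurs in the collection needs a separate floor, which is not shown here).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Dirichlet-smoothed log-probability of one feature in one document.
//   tf             occurrences of the feature in the document
//   doc_len        length of the document in tokens
//   cf             occurrences of the feature in the collection (must be > 0)
//   collection_len total number of tokens in the collection
//   mu             smoothing parameter
double dirichlet_score(double tf, double doc_len, double cf,
                       double collection_len, double mu) {
    const double background = cf / collection_len;
    return std::log((tf + mu * background) / (doc_len + mu));
}

// One scored feature (term, ordered window, or unordered window).
struct Feature {
    double weight;  // assumed to be normalised by the caller
    double score;   // log-probability, e.g. from dirichlet_score
};

// Combine feature scores as a weighted sum of log-probabilities.
double weighted_sum(const std::vector<Feature>& features) {
    double total = 0.0;
    for (const Feature& f : features) {
        total += f.weight * f.score;
    }
    return total;
}
```

For example, a feature with `tf = 2` in an 8-token document, `cf = 10` in a 1000-token collection, and `mu = 2` scores `log((2 + 2 * 0.01) / (8 + 2))`, that is `log(0.202)`.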
Progress. Tracking list of items needed/complete.
- [x] score basic 1 term query
- [x] score basic 2 term query
- [x] score basic 3 term query
- [x] score phrases that don't exist
- [x] test ordered window boundaries
- [x] test unordered window boundaries
- [x] ordered: multiple matches of phrase within the same window
- [x] unordered: multiple matches of phrase within the same window
- [x] use postings to scan forward index for phrase collection count
- [ ] cache of sub-query statistics when scoring many documents
- [ ] sub-query cache over different queries
- [x] documentation
- [x] enable for extraction in extract_features