
Sequential dependence model

Open lgrz opened this issue 5 years ago • 0 comments

Implement the sequential dependence (SD) part of "A Markov Random Field Model for Term Dependencies" (Metzler and Croft, SIGIR 2005).

The implementation is based on the one within Indri.
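For orientation, the SD model combines three feature groups: single query terms, adjacent query term pairs matched in order, and the same pairs matched in any order within a small window. The sketch below builds the equivalent Indri-style query string. The function name `sdm_query` is hypothetical, and the weights 0.85/0.10/0.05 and the unordered window width of 8 are commonly used settings assumed here, not values stated in this issue.

```python
def sdm_query(terms, w_t=0.85, w_o=0.10, w_u=0.05, uw_width=8):
    """Build an Indri-style query string for the sequential dependence model.

    Hypothetical helper: weights and window width are assumed defaults.
    """
    if not terms:
        raise ValueError("need at least one query term")

    unigrams = " ".join(terms)
    # A single-term query has no adjacent pairs, so only the term part remains.
    if len(terms) == 1:
        return f"#combine({unigrams})"

    # SD only considers pairs of neighbouring query terms.
    pairs = list(zip(terms, terms[1:]))
    ordered = " ".join(f"#1({a} {b})" for a, b in pairs)
    unordered = " ".join(f"#uw{uw_width}({a} {b})" for a, b in pairs)

    return (
        f"#weight({w_t} #combine({unigrams}) "
        f"{w_o} #combine({ordered}) "
        f"{w_u} #combine({unordered}))"
    )
```

With a 3-term query this yields one unigram group and two pairs in each of the ordered and unordered groups, which lines up with the 1, 2, and 3 term cases in the progress list.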

Statistics counting. Relevant parts of Indri for collecting ordered and unordered phrase statistics.

  • src/ContextCountAccumulator.cpp
  • src/OrderedWindowNode.cpp
  • src/UnorderedWindowNode.cpp
  • indri::api::QueryEnvironment::_sumServerQuery [collect statistics]
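To make the two kinds of phrase statistics concrete, here is a simplified sketch of counting ordered and unordered window matches for a term pair in a tokenized document. It is not Indri's matching logic: the function names are hypothetical, the two terms are assumed to be distinct, and each match is counted once per starting position, which may differ from how the Indri nodes above treat overlapping matches.

```python
def count_ordered(tokens, first, second, max_gap=1):
    """Count positions where `first` is followed by `second` within max_gap tokens.

    max_gap=1 means the two terms must be adjacent (like an exact phrase).
    """
    count = 0
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Inspect only the max_gap positions directly after i.
        if second in tokens[i + 1 : i + 1 + max_gap]:
            count += 1
    return count


def count_unordered(tokens, a, b, width=8):
    """Count windows of `width` tokens that start on one term and contain the other."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok not in (a, b):
            continue
        other = b if tok == a else a
        # The window covers positions i .. i + width - 1.
        if other in tokens[i + 1 : i + width]:
            count += 1
    return count
```

For the document `a b x b a`, `count_ordered` finds one adjacent `a b`, while `count_unordered` with `width=2` finds two matches because `b a` also counts.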

Scoring. Segments useful for understanding how scoring is performed.

  • indri::infnet::WeightedAndNode::score
  • indri::infnet::ListBeliefNode::score
  • indri::query::DirichletTermScoreFunction
  • indri::infnet::InferenceNetwork::_evaluateDocument
  • indri::api::QueryEnvironment::_scoredQuery [get the final scores]
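The scoring path can be summarized as a Dirichlet-smoothed log probability per term or phrase, combined by a weighted sum. The sketch below works under assumptions: the function names are hypothetical, `mu=2500` is assumed as the smoothing parameter, weights are normalized to sum to one, and the `unseen_cf` floor for phrases that never occur in the collection is a placeholder of this sketch, not necessarily what Indri does.

```python
import math


def dirichlet_score(tf, doc_len, cf, coll_len, mu=2500.0, unseen_cf=1.0):
    """Log of the Dirichlet-smoothed probability of a term or phrase in a document.

    tf: occurrences in the document, cf: occurrences in the collection.
    """
    # Placeholder assumption: floor the collection count so log() stays finite
    # for phrases that do not exist anywhere in the collection.
    cf = max(cf, unseen_cf)
    background = cf / coll_len
    return math.log((tf + mu * background) / (doc_len + mu))


def weighted_and(scores, weights):
    """Combine log scores with weights normalized to sum to one."""
    total = sum(weights)
    return sum((w / total) * s for s, w in zip(scores, weights))
```

A document score under SD would then be `weighted_and` applied to the term, ordered, and unordered group scores, each of which is itself a combination of `dirichlet_score` values.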

Progress. Checklist of items completed and items still needed.

  • [x] score basic 1 term query
  • [x] score basic 2 term query
  • [x] score basic 3 term query
  • [x] score phrases that don't exist
  • [x] test ordered window boundaries
  • [x] test unordered window boundaries
  • [x] ordered: multiple matches of phrase within the same window
  • [x] unordered: multiple matches of phrase within the same window
  • [x] use postings to scan forward index for phrase collection count
  • [ ] cache of sub-query statistics when scoring many documents
  • [ ] sub-query cache over different queries
  • [x] documentation
  • [x] enable for extraction in extract_features
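For the two open caching items, one possible shape is a small memo table keyed by operator and terms, so that collection-level statistics for a sub-query are computed once and reused across documents (and, if the cache outlives a query, across queries). This is only a sketch: `StatsCache` and the `compute` callback are hypothetical names, and nothing here reflects how fxt or Indri actually store statistics.

```python
class StatsCache:
    """Memoize collection statistics for sub-queries such as #1(a b) or #uw8(a b)."""

    def __init__(self, compute):
        # compute(operator, terms) is the expensive lookup, e.g. a scan of the
        # forward index; it is supplied by the caller.
        self._compute = compute
        self._cache = {}
        self.misses = 0

    def get(self, operator, terms):
        # Term order is irrelevant for unordered windows, so normalize it
        # to let "#uw8(a b)" and "#uw8(b a)" share one entry.
        if operator.startswith("#uw"):
            key = (operator, tuple(sorted(terms)))
        else:
            key = (operator, tuple(terms))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compute(operator, terms)
        return self._cache[key]
```

Keeping one `StatsCache` per query covers the first open item; sharing a single instance across queries covers the second, at the cost of unbounded growth unless an eviction policy is added.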

lgrz, Apr 20 '20 07:04