Luke Gallagher issues

Results: 23 issues by Luke Gallagher

CIFF Support

Add support for the [Common Index File Format](https://github.com/osirrc/ciff) (CIFF). This will likely depend on the more flexible field indexing options described in #11.

Create feature map file

When features are extracted, a file should be created that maps feature IDs to feature names. Currently, not all features are extracted together, so a first step would be...

Simple first stage retrieval

A basic program to perform first-stage retrieval. Efficiency is a non-goal; efficient first-stage retrieval is already available in software such as [PISA](https://github.com/pisa-engine/pisa). The win here would be to use any...

Sequential dependence model

Implement the sequential dependence (SD) part of [A Markov Random Field Model for Term Dependencies](https://ciir.cs.umass.edu/pubfiles/ir-387.pdf). The implementation is based on the one within Indri. **Statistics counting.** Relevant parts for finding...
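For orientation, the SD ranking function from the linked paper can be written as follows for a query $Q = q_1, \dots, q_n$ and document $D$. The symbols are the usual ones from the paper, not from this issue: $f_T$, $f_O$, and $f_U$ are the single-term, ordered-window, and unordered-window feature functions, and the $\lambda$ values are their weights.

```latex
P(D \mid Q) \stackrel{\text{rank}}{=}
    \lambda_T \sum_{i=1}^{n}   f_T(q_i, D)
  + \lambda_O \sum_{i=1}^{n-1} f_O(q_i, q_{i+1}, D)
  + \lambda_U \sum_{i=1}^{n-1} f_U(q_i, q_{i+1}, D)
```

Only adjacent query term pairs contribute to the ordered and unordered window terms, which is what distinguishes the sequential dependence variant from the full dependence model.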

Initial document-independent feature extraction support

For tracking a number of issues related to pre-retrieval, document-independent features.

* [ ] `preret_csv` assumes unigram and bigram files are text
* [ ] `preret_csv` assumes user must...

cikm

Repeated calls to Document::decompress

It is not clear how much extra CPU time repeated calls to `Document::decompress` consume. See the following inner loop in `extract_features`:
https://github.com/rmit-ir/tesserae/blob/c565cda55765e8491cb184439d8fbb296aba5d4a/src/extract_features.cpp#L580
The `Document` class should take a...
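One common way to avoid paying for decompression more than once is to cache the result on first use. The sketch below is a generic illustration of that idea only: `CachedDocument`, its `text()` accessor, and the injected decompression function are hypothetical names and do not reflect the project's actual `Document` class or the change this issue proposes.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a document wrapper. It decompresses
// lazily on the first call to text() and reuses the result afterwards.
class CachedDocument {
  public:
    using Decompressor = std::function<std::string(const std::string &)>;

    CachedDocument(std::string compressed, Decompressor decompress)
        : compressed_(std::move(compressed)),
          decompress_(std::move(decompress)) {}

    // Returns the decompressed text; the decompressor runs at most once.
    const std::string &text() {
        if (!loaded_) {
            cache_ = decompress_(compressed_);
            loaded_ = true;
        }
        return cache_;
    }

  private:
    std::string compressed_;
    Decompressor decompress_;
    std::string cache_;
    bool loaded_ = false;
};
```

The trade-off is memory: the decompressed text stays alive as long as the object does, so this only helps when the same document is read several times within one loop iteration.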

Add BM25TP to the feature set

Originally from [Büttcher et al](https://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf). Modifications exist to make it usable for dynamic pruning scenarios:

* [Schenkel et al, SPIRE 07](http://infolab.stanford.edu/~theobald/pub/proximity-spire07.pdf)
* [Broschart et al, TOIS 12](https://dl.acm.org/doi/abs/10.1145/2094072.2094077)

Other variations:

*...

Simple first stage retrieval

Internal version number for index

Improve indexing experience

Allow fields to be selected at index time