Luke Gallagher
Move the following features into the static document feature set since they can all be precomputed and don't depend on the query (note that url slash count and url length...
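As an illustration of why such features are query independent, the sketch below computes URL slash count and URL length from the URL string alone. The struct and function names are hypothetical and are not taken from the Tesserae code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical container for two static (query-independent) URL features.
struct UrlFeatures {
    std::size_t slash_count;  // number of '/' characters in the URL
    std::size_t length;       // number of characters in the URL
};

// Both values depend only on the URL, so they can be computed once at
// index time and stored alongside the document.
UrlFeatures url_features(const std::string &url) {
    const auto slashes = std::count(url.begin(), url.end(), '/');
    return UrlFeatures{static_cast<std::size_t>(slashes), url.size()};
}
```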
Add BM25F as a feature from [Microsoft Cambridge at TREC–13: Web and HARD tracks](https://www.microsoft.com/en-us/research/uploads/prod/2004/01/Microsoft-Cambridge-at-TREC-2004-Web-and-HARD-track.pdf) Main points to consider (this issue should be expanded upon closer to implementation): * Requires field...
It may be useful in future to detect the stemmer used (if any) by Indri when building the index. This could then facilitate automatic stemming of queries when using `extract_features`...
Replace `calculate_lm` with the QL version of `DirichletTermScore`.
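For reference, Dirichlet-smoothed query likelihood scores a term as log((tf + mu * cf / |C|) / (|d| + mu)). The sketch below shows that calculation only; the function name and parameter list are hypothetical and do not reflect the actual signature of `DirichletTermScore`.

```cpp
#include <cmath>

// tf:       frequency of the term in the document
// doc_len:  document length |d|
// cf:       frequency of the term in the collection
// coll_len: total number of terms in the collection |C|
// mu:       Dirichlet smoothing parameter
double dirichlet_ql(double tf, double doc_len, double cf, double coll_len, double mu) {
    const double background = cf / coll_len;  // collection language model P(t|C)
    return std::log((tf + mu * background) / (doc_len + mu));
}
```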
The default constructor does not initialize the OOV term, whereas the constructor `Lexicon::Lexicon(Counts c)` does. From a user perspective this can cause subtle errors in the...
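One possible fix is to have the default constructor delegate to the `Counts` constructor so that both paths create the OOV entry. The class below is a simplified stand-in, not the real `Lexicon`; the member layout and the `<oov>` token are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Counts {
    std::uint64_t document_count;
    std::uint64_t term_count;
};

struct Term {
    std::string token;
    Counts counts;
};

// Simplified stand-in for the lexicon, showing only the constructors.
class Lexicon {
public:
    // Delegate so the OOV term is always present after construction.
    Lexicon() : Lexicon(Counts{}) {}

    explicit Lexicon(Counts c) : counts_(c) {
        terms_.push_back(Term{"<oov>", Counts{}});  // hypothetical OOV entry
    }

    std::size_t size() const { return terms_.size(); }
    const Term &oov() const { return terms_.front(); }

private:
    Counts counts_;
    std::vector<Term> terms_;
};
```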
From [Efficient and Effective Higher Order Proximity Modeling](https://people.eng.unimelb.edu.au/ammoffat/abstracts/lmc16ictir.pdf). The intervals for L2p, Lkp and Lkfp methods are computed using lazy/eager plane sweep [Lu et al. (CIKM 15)](https://people.eng.unimelb.edu.au/ammoffat/abstracts/lmc15cikm.pdf). **Computing intervals.** There...
The feature extraction program `extract_features` currently relies on calls to Indri's metadata. This could instead be handled by a (docid, docno) map file that Tesserae creates at index time....
The following features appear in the documentation, but are missing from the default config file and are disabled by default. ``` 107 Frequency of query terms within the document title...
Consider implementing FLOE versions of the features from Craswell et al.
Some features take parameters; for example, BM25 has `k1`, `b`, and `k3`. These parameters should be configurable by the user. The user should be able to configure the same feature multiple...