Excessive logging about termpipelines global defaults

Open · joelrorseth opened this issue 2 years ago · 1 comment

Hi there. I hope this is a simple oversight on my part, but I can't seem to disable the following warning:

[main] WARN org.terrier.querying.ApplyTermPipeline - The index has no termpipelines configuration, and no control configuration is found. Defaulting to global termpipelines configuration of 'Stopwords,PorterStemmer'. Set a termpipelines control to remove this warning.
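
Reading the message literally, the lookup seems to fall back from an index-level setting, to a control, to a global default, and only the last step warns. Here is a minimal plain-Python sketch of that reading. It is not Terrier's actual code: `resolve_termpipelines` and its arguments are hypothetical names, and the relative order of the first two checks is my guess.

```python
import warnings
from typing import Mapping

# Global default named in the warning message.
GLOBAL_DEFAULT = "Stopwords,PorterStemmer"


def resolve_termpipelines(
    index_properties: Mapping[str, str],
    controls: Mapping[str, str],
    global_default: str = GLOBAL_DEFAULT,
) -> str:
    """Hypothetical model of the fallback order implied by the warning."""
    # 1. Assumed first: a termpipelines setting stored with the index.
    if "termpipelines" in index_properties:
        return index_properties["termpipelines"]
    # 2. Assumed second: a termpipelines control supplied at query time.
    if "termpipelines" in controls:
        return controls["termpipelines"]
    # 3. Neither is set: warn and fall back to the global default.
    warnings.warn(
        "No termpipelines configuration or control found; "
        f"defaulting to '{global_default}'."
    )
    return global_default


# With a control set, the fallback (and the warning) is never reached.
print(resolve_termpipelines({}, {"termpipelines": "Stopwords,PorterStemmer"}))
```

If this reading is right, setting a property alone would not help unless it ends up as an index-level setting or a control, which may be why my attempts below have no effect.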

I have tried the following:

# Upon init
pt.set_property('termpipelines', 'Stopwords,PorterStemmer')

# When loading my index
indexer.setProperties(**{'termpipelines' : 'Stopwords,PorterStemmer'})

# When using BatchRetrieve (the warning doesn't appear to be triggered by this call, however)
pipeline = pt.BatchRetrieve(
    self.index_ref,
    wmodel='BM25',
    properties={'termpipelines' : 'Stopwords,PorterStemmer'}
)

I am pretty sure that my use of TextScorer is triggering this warning, but I do not see any properties / args to set termpipelines. Here is my current usage:

textscorer = pt.text.scorer(
    body_attr='text',
    wmodel='BM25',
    background_index=self.index
)
textscorer.transform(test_df)

Please let me know if there is something I am missing, or if I have stumbled across an oversight in TextScorer. Thanks!

joelrorseth · May 24 '22 23:05

Hi @joelrorseth

Thanks for the report.

The recent PyTerrier and Terrier releases change the way we handle termpipelines for on-disk indices. I'm going to leave this open, as I don't think we have addressed it well for pt.text.scorer() yet.

cmacdonald · Nov 10 '22 22:11