
Example not working?

Open · bkj opened this issue 1 year ago • 2 comments

This looks like a really great package. I'm trying to get the example running, but I'm having some trouble. I adapted the code from https://www.eugene.zone/tarexp/example/, which gave me:

```python
import ir_measures
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

import tarexp as tar
from tarexp import component as tarc

rcv1     = datasets.fetch_rcv1()
X        = rcv1['data']
y        = rcv1['target'].todense().astype(bool)
y_names  = rcv1['target_names']
rel_info = pd.DataFrame(y, columns=y_names)


# --
# tarexp

ds      = tar.SparseVectorDataset.from_sparse(X)
setting = tarc.combine(
    tarc.SklearnRanker(LogisticRegression, solver='liblinear'),
    tarc.PerfectLabeler(),
    tarc.RelevanceSampler(),
    tarc.FixedRoundStoppingRule(max_round=20)
)()

workflow = tar.OnePhaseTARWorkflow(
    ds.setLabels(rel_info['GPRO'].values),
    setting,
    seed_doc=[1023],
    batch_size=200,
    random_seed=123
)


recording_metrics = [
    ir_measures.RPrec
]
for ledger in workflow:
    print("Round {}: found {} positives in total".format(ledger.n_rounds, ledger.n_pos_annotated))
    print("metric:", workflow.getMetrics(recording_metrics))
```

(Note: I think I had to change `set_label` to `setLabels`.)

When I run that, I get the following error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bjohnson/opt/anaconda3/lib/python3.9/site-packages/tarexp/workflow.py", line 102, in __next__
    self.step()
  File "/Users/bjohnson/opt/anaconda3/lib/python3.9/site-packages/tarexp/workflow.py", line 250, in step
    self.component.trainRanker(*self.dataset.getTrainingData(self.ledger))
  File "/Users/bjohnson/opt/anaconda3/lib/python3.9/site-packages/tarexp/component/ranker.py", line 49, in trainRanker
    assert np.unique(y).size == 2
AssertionError
```

...which I think comes from trying to train a classifier on a (sample of the) dataset that contains only a single class. Any thoughts on how to proceed? Do you have another example that works out of the box?
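One way to check the single-class hypothesis is to look at the labels directly. The sketch below does not use tarexp at all: `describe_training_labels` and `seed_is_relevant` are hypothetical helpers written for illustration, and the toy arrays stand in for whatever `getTrainingData` actually returns. It assumes (unverified) that the first round may train on little more than the seed document(s), so if the seed is the only labeled document, the labels hold a single class, which is exactly what the failing `assert np.unique(y).size == 2` rejects. It also shows that scikit-learn's `LogisticRegression` refuses to fit on one class, so the assertion is guarding against a real failure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def describe_training_labels(y):
    """Summarize a label vector (hypothetical helper, not part of tarexp).

    Reports how many distinct classes are present, which is the condition
    checked by `assert np.unique(y).size == 2` in the traceback.
    """
    y = np.asarray(y).ravel()
    classes = np.unique(y)
    return {
        "n_docs": int(y.size),
        "classes": classes.tolist(),
        "n_classes": int(classes.size),
        "n_positive": int(np.count_nonzero(y)),
    }


def seed_is_relevant(labels, seed_doc):
    """Map each seed document index to its relevance label (hypothetical helper)."""
    labels = np.asarray(labels).ravel()
    return {int(i): bool(labels[i]) for i in seed_doc}


# Toy stand-in for a training set that contains only one positive seed doc.
X_single = np.array([[1.0, 0.0]])
y_single = np.array([True])
print(describe_training_labels(y_single))

# scikit-learn also raises on a single-class training set.
try:
    LogisticRegression(solver="liblinear").fit(X_single, y_single)
except ValueError as err:
    print("sklearn:", err)
```

On the real data, calling `seed_is_relevant(rel_info['GPRO'].values, [1023])` and `describe_training_labels(rel_info['GPRO'].values)` would show whether the seed document is actually relevant to GPRO and how many positives that topic has.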

bkj · Dec 12 '23 20:12