tarexp
Example not working?
This looks like a really great package. I'm trying to get the example running, but I'm having some trouble. I adapted the code from https://www.eugene.zone/tarexp/example/, yielding:
import ir_measures
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
import tarexp as tar
from tarexp import component as tarc
rcv1 = datasets.fetch_rcv1()
X = rcv1['data']
y = rcv1['target'].todense().astype(bool)
y_names = rcv1['target_names']
rel_info = pd.DataFrame(y, columns=y_names)
# --
# tarexp
ds = tar.SparseVectorDataset.from_sparse(X)
setting = tarc.combine(
    tarc.SklearnRanker(LogisticRegression, solver='liblinear'),
    tarc.PerfectLabeler(),
    tarc.RelevanceSampler(),
    tarc.FixedRoundStoppingRule(max_round=20)
)()
workflow = tar.OnePhaseTARWorkflow(
    ds.setLabels(rel_info['GPRO'].values),
    setting,
    seed_doc=[1023],
    batch_size=200,
    random_seed=123
)
recording_metrics = [
    ir_measures.RPrec
]
for ledger in workflow:
    print("Round {}: found {} positives in total".format(ledger.n_rounds, ledger.n_pos_annotated))
    print("metric:", workflow.getMetrics(recording_metrics))
(Note: I had to change set_label to setLabels, I think.)
When I run that, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bjohnson/opt/anaconda3/lib/python3.9/site-packages/tarexp/workflow.py", line 102, in __next__
    self.step()
  File "/Users/bjohnson/opt/anaconda3/lib/python3.9/site-packages/tarexp/workflow.py", line 250, in step
    self.component.trainRanker(*self.dataset.getTrainingData(self.ledger))
  File "/Users/bjohnson/opt/anaconda3/lib/python3.9/site-packages/tarexp/component/ranker.py", line 49, in trainRanker
    assert np.unique(y).size == 2
AssertionError
... which I think is an error from trying to train a classifier on a (sample of the) dataset that only has a single class. Thoughts on how to proceed? Do you have another example that works out-of-the-box?
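To test the single-class hypothesis independently of tarexp, here is a minimal sketch of the check I have in mind. It uses plain Python and a toy label list in place of rel_info['GPRO'].values; the helper names has_both_classes and pick_seed_docs are hypothetical, not part of tarexp. Whether seed_doc accepts several indices like this, and whether a mixed-class seed set is what the workflow expects, are assumptions on my part.

```python
import random


def has_both_classes(labels):
    """Return True if the labels contain at least one positive and one negative."""
    return len({bool(v) for v in labels}) == 2


def pick_seed_docs(labels, n_pos=1, n_neg=1, seed=123):
    """Pick document indices so that the seed set covers both classes.

    Hypothetical helper: returns n_pos positive indices followed by
    n_neg negative indices, sampled reproducibly.
    """
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(labels) if v]
    neg = [i for i, v in enumerate(labels) if not v]
    if len(pos) < n_pos or len(neg) < n_neg:
        raise ValueError("not enough documents of each class to build a seed set")
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)


# Toy labels standing in for one topic column of the relevance matrix.
labels = [False] * 8 + [True] * 2

# A single seed document can never cover both classes.
single_seed = [labels[0]]
print("single seed has both classes:", has_both_classes(single_seed))

# A seed set with one positive and one negative document does.
seeds = pick_seed_docs(labels)
print("seed indices:", seeds)
print("seed set has both classes:", has_both_classes([labels[i] for i in seeds]))
```

If the check fails for the labels that reach trainRanker in the first round, that would be consistent with the assertion above; it does not by itself say how tarexp intends the seed set to be built.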