libact
ValueError: setting array element with sequence
Hello,
This is my first time using active learning, so this is probably a beginner's mistake, but I get the ValueError from the title when calling model.predict_proba(trn_ds). It is raised in sklearn's validation.py:
  File "/home/user/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 448, in check_array
    array = array.astype(np.float64)
ValueError: setting an array element with a sequence.
Maybe I'm calling it when I'm not supposed to, or passing the wrong split of the dataset.
I get the same error with other datasets, and I've also tried Python 2.7. I used this example:
https://libact.readthedocs.io/en/latest/examples/plot.html and only added the line model.predict_real(trn_ds) inside the for loop, after training the model.
PS: I've also noticed that training happens both when I define the query strategy and when I call make_query(). Is that supposed to happen?
Thanks in advance!
Can you provide more detail on the code and the error message? It seems to me that when you declared the Dataset object, you passed in something that cannot be converted to a NumPy array.
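The error message itself can be reproduced with NumPy alone. The following is a minimal sketch, assuming (hypothetically) that the validation step receives an object array whose elements are whole feature vectors rather than single numbers; the values are taken from the sample rows in this thread and the variable names are illustrative.

```python
import numpy as np

# Two feature vectors, using values from the sample rows in this thread.
features = [np.array([48.0, 30.46, 59.0]), np.array([48.0, 30.46, 58.0])]

# Hypothetical bad input: a 1-D object array where each element is a sequence.
nested = np.empty(len(features), dtype=object)
for i, row in enumerate(features):
    nested[i] = row

try:
    # Same conversion as the failing line in check_array.
    nested.astype(np.float64)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Each element of `nested` is an array, so NumPy cannot store it in a single float64 slot, which is what the message is complaining about.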
For some query strategies, it is normal to train a model when making a query.
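To see why a strategy may need a trained model, here is a sketch of the least-confidence idea behind uncertainty sampling: the strategy needs current class probabilities for the unlabeled pool, so a model has to be fitted first. This is an illustration of the idea only, not libact's implementation; `least_confident_index` and the probabilities are made up for the example.

```python
import numpy as np

def least_confident_index(proba):
    """Return the row whose most likely class has the lowest probability."""
    proba = np.asarray(proba, dtype=float)
    confidence = proba.max(axis=1)  # probability of the predicted class per row
    return int(np.argmin(confidence))

# Hypothetical class probabilities for three unlabeled examples.
proba = np.array([
    [0.90, 0.10],
    [0.55, 0.45],
    [0.20, 0.80],
])
ask_id = least_confident_index(proba)
print(ask_id)
```

The second row is picked because the model is least sure about it.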
(Sent by email in reply to Jessica Cunha's message of Tue, Nov 12, 2019, 10:55 AM.)
The dataset is like this:
1 1:48 2:30.46 3:59 4:177.39 5:5.62
1 1:48 2:30.46 3:58 4:176.78 5:3.37
1 1:48 2:30.46 3:57 4:158.75 5:3.37
1 1:48 2:30.46 3:60 4:137.71 5:3.37
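For readers unfamiliar with this layout, each row is a label followed by index:value pairs (the LIBSVM text format). A small sketch of how one row breaks down, using a hypothetical helper `parse_libsvm_line` that is not part of libact:

```python
def parse_libsvm_line(line):
    """Split one 'label index:value ...' row into (label, {index: value})."""
    label_text, *pairs = line.split()
    features = {}
    for pair in pairs:
        index_text, value_text = pair.split(":", 1)
        features[int(index_text)] = float(value_text)
    return float(label_text), features

label, features = parse_libsvm_line("1 1:48 2:30.46 3:59 4:177.39 5:5.62")
print(label, features)
```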
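The code:
import numpy
from sklearn.model_selection import train_test_split
from libact.base.dataset import Dataset, import_libsvm_sparse
from libact.labelers import IdealLabeler
from libact.models import LogisticRegression
from libact.query_strategies import UncertaintySampling

# DATASET_FILEPATH, TEST_SIZE and N_LABELED are constants not shown here.

def split_train_test():
    X, y = import_libsvm_sparse(DATASET_FILEPATH).format_sklearn()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE)
    # Keep only the first N_LABELED labels; the rest start out unlabeled (None).
    trn_ds = Dataset(X_train, numpy.concatenate([y_train[:N_LABELED], [None] * (len(y_train) - N_LABELED)]))
    tst_ds = Dataset(X_test, y_test)
    fully_labeled_trn_ds = Dataset(X_train, y_train)
    return trn_ds, tst_ds, y_train, fully_labeled_trn_ds

if __name__ == "__main__":
    trn_ds, tst_ds, y_train, fully_labeled_trn_ds = split_train_test()
    lbr = IdealLabeler(fully_labeled_trn_ds)
    quota = len(y_train) - N_LABELED
    qs = UncertaintySampling(trn_ds, method='lc', model=LogisticRegression())
    model = LogisticRegression()
    for _ in range(quota):
        ask_id = qs.make_query()
        X, _ = zip(*trn_ds.data)
        lb = lbr.label(X[ask_id])
        trn_ds.update(ask_id, lb)
        model.train(trn_ds)
        model.predict_real(trn_ds)  # the line I added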
The output:
/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "main.py", line 98, in <module>
    run(trn_ds, lbr, model, qs, quota)
  File "main.py", line 61, in run
    model.predict_proba(trn_ds)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/libact/models/logistic_regression.py", line 40, in predict_proba
    return self.model.predict_proba(feature, *args, **kwargs)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/linear_model/logistic.py", line 1340, in predict_proba
    return super(LogisticRegression, self)._predict_proba_lr(X)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/linear_model/base.py", line 338, in _predict_proba_lr
    prob = self.decision_function(X)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/linear_model/base.py", line 300, in decision_function
    X = check_array(X, accept_sparse='csr')
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 448, in check_array
    array = array.astype(np.float64)
ValueError: setting an array element with a sequence.
Thank you!
I think you should not pass the Dataset object to model.predict_real(trn_ds).
Pass in an array instead, for example model.predict_real(trn_ds.get_entries()[0])
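As an illustration of the difference between the pool and the feature array, here is a sketch that pulls the features out of (feature, label) pairs, in the same way the loop in the question does with `zip(*trn_ds.data)`. The `entries` list is a hypothetical stand-in for a partially labeled pool, not libact's Dataset class.

```python
import numpy as np

# Hypothetical partially labeled pool: (feature vector, label) pairs,
# with None marking an example that has not been labeled yet.
entries = [
    (np.array([48.0, 30.46, 59.0]), 1),
    (np.array([48.0, 30.46, 58.0]), None),
    (np.array([48.0, 30.46, 57.0]), None),
]

features, labels = zip(*entries)

# Only the features go to the model: one row per example, all floats.
X = np.vstack(features)

# The labels stay separate; None entries are the ones still to be queried.
unlabeled_ids = [i for i, lb in enumerate(labels) if lb is None]

print(X.shape, unlabeled_ids)
```

A 2-D float array like `X` is the kind of input a predict call expects, whereas the pairs in `entries` mix features with labels.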
Yes, it worked. I didn't know that it would predict for all queries; I thought it would give us just the prediction for the query that we got. I have another question: how do I know which query corresponds to which row in the array of probabilities? (I think it's ordered by best probability values.)
Do you mean that you want to get the predicted probability of the queried example? I think you can try this; it returns an array containing only the prediction for that example:
model.predict_real([X[ask_id]])
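On the ordering question: scikit-learn style predict methods return one row per input row, in the same order as the input, so the output is not sorted by probability and row i belongs to example i. A minimal sketch of that correspondence, using a hypothetical stand-in `predict_real_stub` (fixed made-up weights, two scores per row) in place of a trained libact model:

```python
import numpy as np

def predict_real_stub(X):
    """Hypothetical stand-in for a trained binary model: two scores per input row."""
    weights = np.array([0.5, -1.0])  # made-up weights, for illustration only
    decision = np.asarray(X, dtype=float) @ weights
    return np.vstack((-decision, decision)).T

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
ask_id = 2

all_scores = predict_real_stub(X)           # one row per example, input order
one_score = predict_real_stub([X[ask_id]])  # a single row, for the queried example

# Row ask_id of the full result matches the single-example result.
same = bool(np.allclose(all_scores[ask_id], one_score[0]))
print(all_scores.shape, one_score.shape, same)
```

So either index the full result with ask_id or pass only the queried example; both give the same row.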