How can I initialize an ActiveLearner with 3-D shaped data?

Open rz-zhang opened this issue 4 years ago • 5 comments

from modAL.models import ActiveLearner

learner = ActiveLearner(
    estimator=classifier,
    X_training=X_initial, y_training=y_initial,
    verbose=1
)

Here I am trying to do a named-entity recognition task, so the shape of y_initial is (1000, 75, 17), where 1000 is the number of sentences, 75 is the number of words per sentence, and 17 is the number of possible tags for a word. I then get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-63-72e48629dd01> in <module>()
      5     estimator=classifier,
      6     X_training=X_initial, y_training=y_initial,
----> 7     verbose=1
      8 )

3 frames
/usr/local/lib/python3.6/dist-packages/modAL/models/learners.py in __init__(self, estimator, query_strategy, X_training, y_training, bootstrap_init, **fit_kwargs)
     77                  ) -> None:
     78         super().__init__(estimator, query_strategy,
---> 79                          X_training, y_training, bootstrap_init, **fit_kwargs)
     80 
     81     def teach(self, X: modALinput, y: modALinput, bootstrap: bool = False, only_new: bool = False, **fit_kwargs) -> None:

/usr/local/lib/python3.6/dist-packages/modAL/models/base.py in __init__(self, estimator, query_strategy, X_training, y_training, bootstrap_init, force_all_finite, **fit_kwargs)
     61         self.y_training = y_training
     62         if X_training is not None:
---> 63             self._fit_to_known(bootstrap=bootstrap_init, **fit_kwargs)
     64 
     65         assert isinstance(force_all_finite, bool), 'force_all_finite must be a bool'

/usr/local/lib/python3.6/dist-packages/modAL/models/base.py in _fit_to_known(self, bootstrap, **fit_kwargs)
    104         """
    105         if not bootstrap:
--> 106             self.estimator.fit(self.X_training, self.y_training, **fit_kwargs)
    107         else:
    108             n_instances = self.X_training.shape[0]

/usr/local/lib/python3.6/dist-packages/keras/wrappers/scikit_learn.py in fit(self, x, y, sample_weight, **kwargs)
    204             y = np.searchsorted(self.classes_, y)
    205         else:
--> 206             raise ValueError('Invalid shape for y: ' + str(y.shape))
    207         self.n_classes_ = len(self.classes_)
    208         if sample_weight is not None:

ValueError: Invalid shape for y: (1000, 75, 17)

rz-zhang avatar Feb 28 '20 04:02 rz-zhang

Do you get the error during initialization? From the error details, it looks like it occurs when you first call learner.fit() or learner.teach(), because it refers to y, which has an invalid shape: it should be (1000, 50, 15) like y_initial, rather than the (1000, 75, 17) it actually has.

cosmic-cortex avatar Feb 28 '20 06:02 cosmic-cortex

Sorry for the confusion. My code is actually consistent: both shapes are (1000, 75, 17). I have edited the description in my first comment. To clarify, my question is whether the ActiveLearner class only supports 2-D data as input.

rz-zhang avatar Feb 28 '20 06:02 rz-zhang

Thanks for the clarification! Which Keras version are you running? (You can find this with, for example, pip freeze | grep -i keras.) It seems to me that the problem is actually with Keras: you may be using a version in which only 2-D data is supported.

modAL supports arbitrary dimensions, so it is either a bug or an issue with your Keras version.
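
As an alternative to pip freeze, the installed versions can also be read from inside Python. This is only a minimal sketch: it assumes Python 3.8 or newer (where importlib.metadata is part of the standard library), and installed_version is a hypothetical helper name, not part of modAL or Keras.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions relevant to this issue (None means "not installed").
for name in ("keras", "tensorflow"):
    print(name, installed_version(name))
```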

cosmic-cortex avatar Feb 28 '20 08:02 cosmic-cortex

The versions I'm using are: tf.__version__ = 1.15.0, keras.__version__ = 2.2.4

Following the traceback, I checked the source code of python3.6/dist-packages/keras/wrappers/scikit_learn.py and found that there is a strict limit on the shape of y, as you can see below. Which Keras version do you recommend?

class KerasClassifier(BaseWrapper):
    """Implementation of the scikit-learn classifier API for Keras.
    """
    def fit(self, x, y, sample_weight=None, **kwargs):
        """Constructs a new model with `build_fn` & fit the model to `(x, y)`.

        # Arguments
            x : array-like, shape `(n_samples, n_features)`
                Training samples where `n_samples` is the number of samples
                and `n_features` is the number of features.
            y : array-like, shape `(n_samples,)` or `(n_samples, n_outputs)`
                True labels for `x`.
            **kwargs: dictionary arguments
                Legal arguments are the arguments of `Sequential.fit`

        # Returns
            history : object
                details about the training history at each epoch.

        # Raises
            ValueError: In case of invalid shape for `y` argument.
        """
        y = np.array(y)
        if len(y.shape) == 2 and y.shape[1] > 1:
            self.classes_ = np.arange(y.shape[1])
        elif (len(y.shape) == 2 and y.shape[1] == 1) or len(y.shape) == 1:
            self.classes_ = np.unique(y)
            y = np.searchsorted(self.classes_, y)
        else:
            raise ValueError('Invalid shape for y: ' + str(y.shape))
        self.n_classes_ = len(self.classes_)
        if sample_weight is not None:
            kwargs['sample_weight'] = sample_weight
        return super(KerasClassifier, self).fit(x, y, **kwargs)
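
To make the restriction concrete, the shape branches of the fit method quoted above can be mirrored in a few lines of NumPy. This is a sketch rather than Keras code: wrapper_accepts is a hypothetical helper that only reproduces the shape conditions and does no training.

```python
import numpy as np


def wrapper_accepts(y):
    """Mirror the shape branches of the quoted KerasClassifier.fit."""
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] > 1:
        return True   # treated as one column per class
    if (y.ndim == 2 and y.shape[1] == 1) or y.ndim == 1:
        return True   # treated as a vector of class labels
    return False      # every other shape ends in the ValueError branch


print(wrapper_accepts(np.zeros((1000, 17))))      # True
print(wrapper_accepts(np.zeros(1000)))            # True
print(wrapper_accepts(np.zeros((1000, 75, 17))))  # False
```

Only 1-D and 2-D label arrays pass the check, so a per-token label array of shape (1000, 75, 17) is rejected before any model is built.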

rz-zhang avatar Feb 28 '20 09:02 rz-zhang

Sorry for the late answer, I was extremely busy. I checked an earlier version of Keras that does not contain this restriction, so I would suggest the most recent release from before these changes were introduced. It seems to me that 2.0.1 would work: https://github.com/keras-team/keras/blob/2.0.1/keras/wrappers/scikit_learn.py

However, this is quite an old version, so using it might not be the best option. In that case, I would recommend the following. First, try to reproduce the issue without modAL to make sure it is a Keras issue. If you can confirm this, the next step is to open an issue in the Keras repository describing the problem.
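
A reproduction without modAL can be as small as calling the estimator's fit method directly on the same arrays. The sketch below assumes an X of shape (n, 75), which the thread does not state, and uses StubClassifier, a hypothetical stand-in that only imitates the shape check, so that it runs without Keras. In a real reproduction the KerasClassifier instance would be passed in its place.

```python
import numpy as np


def fit_error_without_modal(estimator, X, y):
    """Call estimator.fit directly; return the error message, or None on success."""
    try:
        estimator.fit(X, y)
    except ValueError as err:
        return str(err)
    return None


class StubClassifier:
    """Hypothetical stand-in for the Keras wrapper: rejects y with more than 2 dimensions."""

    def fit(self, X, y):
        y = np.asarray(y)
        if y.ndim > 2:
            raise ValueError('Invalid shape for y: ' + str(y.shape))
        return self


X = np.zeros((10, 75), dtype=np.int64)  # assumed input shape, not given in the thread
y = np.zeros((10, 75, 17))              # same layout as y_initial, with fewer sentences
print(fit_error_without_modal(StubClassifier(), X, y))
```

If the same ValueError appears when the real classifier is used here, modAL is not involved and the report belongs in the Keras repository.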

cosmic-cortex avatar Mar 09 '20 06:03 cosmic-cortex