modAL
Handle error when using Committee query strategies in ActiveLearner (and vice versa)?
I'll start by saying that this is a great package!
I'm fairly new to AL, so apologies if I've got anything wrong here, but my understanding from reading the code is that:
from modAL.disagreement import vote_entropy_sampling
should be used for the Committee class.
from modAL.uncertainty import entropy_sampling
should be used for the ActiveLearner class.
However, if you do something like:
ActiveLearner(svm, X_training=X[init_samp], y_training=y[init_samp], query_strategy=vote_entropy_sampling)
You don't get an error until you try to run .query() on your learner, at which point the TypeError reads:
TypeError: object of type 'ActiveLearner' has no len()
Would it be better here to either have ActiveLearner and Committee refer to an Enum class of the query strategies each can validly use, or to emit a warning when the wrong one is used and fall back to the correct one (i.e. if you use vote_entropy_sampling in ActiveLearner as above, it would warn that only a single model is in use and switch to entropy_sampling)?
Again, I'd like to say that this package has helped me visualise many different AL techniques, so thanks for making it!
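To make the second option concrete, here is a minimal sketch of a wrapper that warns and falls back. It is not part of modAL: `with_single_model_fallback`, `FakeLearner`, `FakeCommittee` and the two toy strategies are hypothetical stand-ins so the snippet runs on its own. The only assumption taken from the traceback above is that a committee strategy calls len() on its first argument, so the presence of `__len__` is used to tell a committee from a single learner.

```python
import warnings


def with_single_model_fallback(committee_strategy, single_strategy):
    """Wrap a committee strategy so a single learner falls back with a warning.

    Hypothetical helper, not part of modAL.
    """
    def strategy(learner, X, *args, **kwargs):
        # Assumption: committees support len(), single learners do not
        # (this mirrors the TypeError quoted above).
        if hasattr(learner, "__len__"):
            return committee_strategy(learner, X, *args, **kwargs)
        warnings.warn(
            f"{committee_strategy.__name__} needs a committee, but got a single "
            f"{type(learner).__name__}; using {single_strategy.__name__} instead.",
            UserWarning,
        )
        return single_strategy(learner, X, *args, **kwargs)

    return strategy


# Stand-ins for illustration only (not the real modAL classes or strategies).
class FakeLearner:
    pass


class FakeCommittee:
    def __init__(self, n_members):
        self.n_members = n_members

    def __len__(self):
        return self.n_members


def fake_vote_entropy(committee, X):
    return ("committee", len(committee))


def fake_entropy(learner, X):
    return ("single", None)


safe_strategy = with_single_model_fallback(fake_vote_entropy, fake_entropy)
```

With the real library, the wrapped function would presumably be passed as query_strategy in place of vote_entropy_sampling; note that the warning would still only appear at query time, not when the learner is constructed.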
Hi! I am glad that you have found modAL useful! :)
You raise a valid point. This was a deliberate design decision I made during development. To elaborate, one of the main design principles behind modAL is easy extensibility and modularity (hence the name :) ). I wanted users to be able to quickly write query strategies and plug them directly into the ActiveLearner, so hard-coding which query strategies could be used was not the way to go. I therefore decided to delegate the responsibility to the query strategy function itself rather than to the ActiveLearner or Committee classes, because they shouldn't assume any knowledge about the query strategy function.
Implementing a check for whether a query strategy is suitable would be a huge improvement, but I have not been able to find an elegant solution that doesn't sacrifice modularity.
I'll spend some time giving this problem another try, but I can't promise that it will be ready soon. These days, I am fully engaged with my new project (telesto.ai, a competitive crowdsourcing platform for machine learning, check it out :) ), so I have very little time for anything else. I'll keep you posted here!
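As a rough illustration of what delegating the check to the query strategy could look like, the sketch below uses a decorator that a strategy author applies to their own function. Everything in it is hypothetical and not part of modAL: `requires_committee`, `SingleLearner`, `TinyCommittee` and `toy_vote_strategy` are made-up names, and the test for "is this a committee" simply assumes that committees support len() while single learners do not, as the traceback in the question suggests.

```python
import functools


def requires_committee(strategy):
    """Decorator for committee-only strategies (hypothetical, not in modAL).

    Raises a descriptive TypeError instead of the bare "has no len()" message.
    """
    @functools.wraps(strategy)
    def checked(learner, X, *args, **kwargs):
        # Assumption: only committee-like objects define __len__.
        if not hasattr(learner, "__len__"):
            raise TypeError(
                f"{strategy.__name__} expects a committee of learners, "
                f"but received a single {type(learner).__name__}."
            )
        return strategy(learner, X, *args, **kwargs)

    return checked


# Stand-ins for illustration only.
class SingleLearner:
    pass


class TinyCommittee:
    def __init__(self, members):
        self.members = list(members)

    def __len__(self):
        return len(self.members)


@requires_committee
def toy_vote_strategy(committee, X, n_instances=1):
    # Toy logic: just return the first n_instances row indices.
    return list(range(min(n_instances, len(X))))
```

The learner classes stay unaware of the strategy, so modularity is kept, but the check still runs at query time rather than when the learner is created.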