
FS using wrapper approaches

Open Mohammed-Ryiad-Eiadeh opened this issue 2 years ago • 7 comments

Greetings,

I asked this question before. I have some concerns about selecting features using approximation algorithms like Cuckoo Search. I implemented this from scratch in a project I worked on in the past, but reading the data from a CSV file into a 2D array and saving each new feature subset into a new CSV file for evaluation (training and testing) is time-consuming and not very professional. So my question is: can you remind me which classes and interfaces I need to use in order to integrate this with my work?

Mohammed-Ryiad-Eiadeh, Apr 04 '23 02:04

I'm not sure I understand the question. At the moment Tribuo doesn't have any implementations of feature selection wrappers. To add one, you need to implement org.tribuo.FeatureSelector with the desired algorithm. The SelectedFeatureSet produced by a run of the algorithm can be saved out, and you can produce a dataset containing only the selected features by constructing a SelectedFeatureDataset.

Craigacp, Apr 04 '23 17:04
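Conceptually, a SelectedFeatureDataset removes the need for the CSV round trip described in the question: the data stays in memory and each candidate subset is just a restriction to some of its columns. The plain-Java sketch below illustrates only that idea on a raw matrix; it is not Tribuo's implementation, and `FeatureProjection`, `columnIndices` and `project` are hypothetical names.

```java
import java.util.List;

/** Hypothetical helper (not Tribuo API): restrict an in-memory matrix to selected features. */
final class FeatureProjection {
    private FeatureProjection() {}

    /** Maps selected feature names to their column positions, preserving the selected order. */
    static int[] columnIndices(List<String> allNames, List<String> selectedNames) {
        int[] idx = new int[selectedNames.size()];
        for (int i = 0; i < selectedNames.size(); i++) {
            int j = allNames.indexOf(selectedNames.get(i));
            if (j < 0) {
                throw new IllegalArgumentException("Unknown feature: " + selectedNames.get(i));
            }
            idx[i] = j;
        }
        return idx;
    }

    /** Copies only the chosen columns of every row into a new matrix; no files are written. */
    static double[][] project(double[][] rows, int[] columns) {
        double[][] out = new double[rows.length][columns.length];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < columns.length; c++) {
                out[r][c] = rows[r][columns[c]];
            }
        }
        return out;
    }
}
```

In a wrapper loop the full matrix is loaded once, and each candidate subset only costs a projection (or, cheaper still, a mask consulted during distance or model computations).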

That's all I need to know for now. If I have further questions, I may reopen this issue.

Mohammed-Ryiad-Eiadeh, Apr 04 '23 21:04

Dear Adam,

I implemented a wrapper feature selection (FS) method based on the Cuckoo Search algorithm, and I would like your opinion on the following:

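    import java.nio.file.Paths;
    import org.tribuo.MutableDataset;
    import org.tribuo.Trainer;
    import org.tribuo.classification.Label;
    import org.tribuo.classification.LabelFactory;
    import org.tribuo.data.csv.CSVLoader;
    import org.tribuo.evaluation.TrainTestSplitter;

    // Load the CSV with "Class" as the label column (backslashes must be escaped in Java strings).
    var data = new CSVLoader<Label>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");

    // 50/50 train/test split using Tribuo's default seed.
    var dataSplitter = new TrainTestSplitter<Label>(data, 0.5, Trainer.DEFAULT_SEED);
    var trainingPart = new MutableDataset<Label>(dataSplitter.getTrain());
    var testingPart = new MutableDataset<Label>(dataSplitter.getTest());

    // CuckooSearchOptimizer and TransferFunction are the author's own classes.
    var opt = new CuckooSearchOptimizer(testingPart,
            TransferFunction.TransferFunction_V2,
            50,
            2,
            2,
            0.1,
            1.5,
            10);

    var sfs = opt.select(trainingPart);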

This is how the algorithm is used. My concern is passing the test part to the constructor; I think the design could be better. However, a wrapper FS method has to train and test every solution in the population, so it needs both a training and a testing portion. My suggestion is to pass the DataSource to the FS algorithm instead, for example:

    // Proposed alternative: hand the whole DataSource to the optimizer (path backslashes escaped).
    var data = new CSVLoader<Label>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");

    var opt = new CuckooSearchOptimizer(data,
            TransferFunction.TransferFunction_V2,
            50,
            2,
            2,
            0.1,
            1.5,
            10);

    var sfs = opt.getSelectedFeature();

I would also add some other methods to retrieve the remaining information.

Please let me know if there is a more appropriate solution for this.

Mohammed-Ryiad-Eiadeh, Apr 20 '23 02:04
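The `TransferFunction_V2` and `1.5` arguments suggest a binary cuckoo search, but the optimizer's source is not shown, so the following is only a sketch under assumptions: that 1.5 is the Lévy exponent β, that V2 is the V-shaped transfer function |tanh(x)| from the binary-metaheuristics literature, and that a bit is flipped when a uniform draw falls below the transfer value. `BinaryLevyMove` and its methods are hypothetical. Lévy steps are drawn with Mantegna's algorithm, which needs the gamma function; `java.lang.Math` has none, so a Lanczos approximation is included.

```java
import java.util.Random;

/**
 * Hypothetical binary cuckoo-search move (a sketch, not the issue author's code):
 * Levy-flight steps drawn with Mantegna's algorithm are mapped to bit-flip
 * probabilities by the V-shaped transfer function V2(x) = |tanh(x)|.
 */
final class BinaryLevyMove {
    // Lanczos coefficients (g = 7) following the leading 0.99999999999980993 term.
    private static final double[] LANCZOS = {
        676.5203681218851, -1259.1392167224028, 771.32342877765313,
        -176.61502916214059, 12.507343278686905, -0.13857109526572012,
        9.9843695780195716e-6, 1.5056327351493116e-7
    };

    private BinaryLevyMove() {}

    /** Lanczos approximation of the gamma function; only used here for x >= 0.5. */
    static double gamma(double x) {
        double z = x - 1;
        double a = 0.99999999999980993;
        for (int i = 0; i < LANCZOS.length; i++) {
            a += LANCZOS[i] / (z + i + 1);
        }
        double t = z + 7.5;
        return Math.sqrt(2 * Math.PI) * Math.pow(t, z + 0.5) * Math.exp(-t) * a;
    }

    /** Standard deviation of the numerator u in Mantegna's algorithm for exponent beta. */
    static double mantegnaSigma(double beta) {
        double num = gamma(1 + beta) * Math.sin(Math.PI * beta / 2);
        double den = gamma((1 + beta) / 2) * beta * Math.pow(2, (beta - 1) / 2);
        return Math.pow(num / den, 1 / beta);
    }

    /** One Levy-distributed step: u / |v|^(1/beta) with u ~ N(0, sigma^2), v ~ N(0, 1). */
    static double levyStep(double beta, double sigma, Random rng) {
        double u = rng.nextGaussian() * sigma;
        double v = rng.nextGaussian();
        return u / Math.pow(Math.abs(v), 1 / beta);
    }

    /** V-shaped transfer function V2: maps a step of any sign to a probability in [0, 1]. */
    static double v2(double x) {
        return Math.abs(Math.tanh(x));
    }

    /** Returns a new mask in which each bit is flipped with probability V2(alpha * step). */
    static boolean[] move(boolean[] mask, double alpha, double beta, Random rng) {
        double sigma = mantegnaSigma(beta);
        boolean[] next = mask.clone();
        for (int i = 0; i < next.length; i++) {
            double step = alpha * levyStep(beta, sigma, rng);
            if (rng.nextDouble() < v2(step)) {
                next[i] = !next[i];
            }
        }
        return next;
    }
}
```

With a V-shaped function, large steps in either direction make a flip likely and small steps leave the bit alone, which is why the sign of the Lévy step does not matter here.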

I would pass the feature selection algorithm a dataset and have it split that dataset internally, controlled by a parameter. DataSources should only be converted into Datasets; nothing should really be processing them in DataSource form.

Craigacp, Apr 20 '23 13:04
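A minimal sketch of that design, assuming the wrapper only needs example indices: the split fraction and seed are constructor parameters, and the wrapper splits whatever data it is given. `InternalSplitWrapper` and `SubsetScorer` are hypothetical names, not Tribuo API; a real scorer would train a classifier on the train indices using only the masked features and return its accuracy on the test indices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical callback: scores a feature mask given train and test example indices. */
interface SubsetScorer {
    double score(List<Integer> trainIdx, List<Integer> testIdx, boolean[] mask);
}

/** Hypothetical wrapper that owns its train/test split, controlled by parameters. */
final class InternalSplitWrapper {
    private final double trainFraction;
    private final long seed;

    InternalSplitWrapper(double trainFraction, long seed) {
        if (!(trainFraction > 0 && trainFraction < 1)) {
            throw new IllegalArgumentException("trainFraction must be in (0, 1)");
        }
        this.trainFraction = trainFraction;
        this.seed = seed;
    }

    /** Shuffles 0..n-1 with the fixed seed and cuts it into [train, test] index lists. */
    List<List<Integer>> split(int numExamples) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(numExamples * trainFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, cut)));
        parts.add(new ArrayList<>(indices.subList(cut, numExamples)));
        return parts;
    }

    /** Evaluates one candidate mask on the wrapper's internal split. */
    double evaluate(int numExamples, boolean[] mask, SubsetScorer scorer) {
        List<List<Integer>> parts = split(numExamples);
        return scorer.score(parts.get(0), parts.get(1), mask);
    }
}
```

Because the seed is fixed, every candidate in the population is scored on the same split, so fitness values stay comparable across candidates.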

Okay. Inside the algorithm I need to train a model such as KNN (a lazy learner) to evaluate each solution in the population, so I need both training and testing parts inside the algorithm, and I can't do that by passing only the training part. What do you suggest?

Mohammed-Ryiad-Eiadeh, Apr 20 '23 18:04
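To make the evaluation step concrete, here is a sketch of a k-NN fitness for one candidate solution, assuming numeric features, integer class labels, squared Euclidean distance and majority voting. `KnnFitness` and its methods are hypothetical and not the author's implementation. The boolean mask plays the role of a binary cuckoo solution, so scoring a different subset only changes which columns enter the distance.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical k-NN fitness for a binary feature mask. */
final class KnnFitness {
    private KnnFitness() {}

    /** Squared Euclidean distance over the features whose mask bit is set. */
    static double distance(double[] a, double[] b, boolean[] mask) {
        double sum = 0.0;
        for (int f = 0; f < mask.length; f++) {
            if (mask[f]) {
                double d = a[f] - b[f];
                sum += d * d;
            }
        }
        return sum;
    }

    /** Majority label of the k nearest rows; ties go to the label that reached the top count first. */
    static int predict(double[][] trainX, int[] trainY, double[] x, boolean[] mask, int k) {
        int n = trainX.length;
        double[] dist = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            dist[i] = distance(trainX[i], x, mask);
            order[i] = i;
        }
        // Object sorts are stable, so equal distances keep training order.
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));
        Map<Integer, Integer> votes = new HashMap<>();
        int best = trainY[order[0]];
        int bestCount = 0;
        for (int j = 0; j < Math.min(k, n); j++) {
            int label = trainY[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    /** Fraction of test rows that k-NN classifies correctly using only the masked features. */
    static double accuracy(double[][] trainX, int[] trainY, double[][] testX, int[] testY,
                           boolean[] mask, int k) {
        int correct = 0;
        for (int i = 0; i < testX.length; i++) {
            if (predict(trainX, trainY, testX[i], mask, k) == testY[i]) {
                correct++;
            }
        }
        return (double) correct / testX.length;
    }
}
```

Wrappers commonly reject or penalize an all-false mask before calling a function like this, since every distance would then be zero.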

You should keep the test set used by the wrapper completely separate from the test set used to evaluate the final classifier, so you need to split your data into at least three chunks: a train set for the wrapper, a test set for the wrapper, and a final test set. You can also train the final classifier on the wrapper's train and test sets combined if you want, but that's not necessary. Alternatively, you can do cross-validation inside the wrapper, or randomly split the data each time for each feature set, but essentially all three of those options operate on whatever data you pass into the wrapper, which should be separate from your final test set.

Craigacp, Apr 20 '23 19:04
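The three-chunk split described above can be sketched over example indices as follows. `ThreeWaySplit` and its fractions are hypothetical. The final test indices are carved off once, before the wrapper runs, and only the other two chunks are ever handed to the wrapper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical three-way partition into wrapper-train, wrapper-test and final-test indices. */
final class ThreeWaySplit {
    final List<Integer> wrapperTrain;
    final List<Integer> wrapperTest;
    final List<Integer> finalTest;

    private ThreeWaySplit(List<Integer> wrapperTrain, List<Integer> wrapperTest, List<Integer> finalTest) {
        this.wrapperTrain = wrapperTrain;
        this.wrapperTest = wrapperTest;
        this.finalTest = finalTest;
    }

    /** Shuffles 0..n-1 with a seed and cuts it by two fractions; the remainder is the final test set. */
    static ThreeWaySplit of(int n, double wrapperTrainFrac, double wrapperTestFrac, long seed) {
        if (wrapperTrainFrac <= 0 || wrapperTestFrac <= 0 || wrapperTrainFrac + wrapperTestFrac >= 1) {
            throw new IllegalArgumentException("fractions must be positive and sum to less than 1");
        }
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            idx.add(i);
        }
        Collections.shuffle(idx, new Random(seed));
        int cut1 = (int) Math.round(n * wrapperTrainFrac);
        int cut2 = cut1 + (int) Math.round(n * wrapperTestFrac);
        return new ThreeWaySplit(
                new ArrayList<>(idx.subList(0, cut1)),
                new ArrayList<>(idx.subList(cut1, cut2)),
                new ArrayList<>(idx.subList(cut2, n)));
    }
}
```

If the wrapper does cross-validation instead, `wrapperTrain` and `wrapperTest` can simply be merged and passed in together; `finalTest` is still touched only once, after selection.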

I think 10-fold cross-validation is suitable for this task, and it solved the issue I was asking about. Next I want to add some other constructors and write some comments. Thanks for your help. I will submit a request to add the model to Tribuo, and I may add more wrapper approaches for FS in the near future. The code looks like this:

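    import java.nio.file.Paths;
    import org.tribuo.MutableDataset;
    import org.tribuo.Trainer;
    import org.tribuo.classification.Label;
    import org.tribuo.classification.LabelFactory;
    import org.tribuo.data.csv.CSVLoader;
    import org.tribuo.dataset.SelectedFeatureDataset;
    import org.tribuo.evaluation.TrainTestSplitter;

    var data = new CSVLoader<Label>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");

    // testingPart is held out from feature selection and reserved for evaluating the final classifier.
    var dataSplitter = new TrainTestSplitter<Label>(data, 0.5, Trainer.DEFAULT_SEED);
    var trainingPart = new MutableDataset<Label>(dataSplitter.getTrain());
    var testingPart = new MutableDataset<Label>(dataSplitter.getTest());

    // No test part is passed in any more; the optimizer cross-validates on the data given to select().
    var opt = new CuckooSearchOptimizer(TransferFunction.TransferFunction_V2,
            50,
            2,
            2,
            0.1,
            1.5,
            20);

    var sfs = opt.select(trainingPart);
    System.out.println(sfs.featureNames().size());
    // Restrict the training data to the selected features.
    var sfds = new SelectedFeatureDataset<>(trainingPart, sfs);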

Mohammed-Ryiad-Eiadeh, Apr 20 '23 20:04
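The 10-fold cross-validation mentioned above runs only on the data given to the wrapper. A minimal index-level sketch: indices are shuffled once with a seed and dealt into k folds, each fold serves once as the wrapper's test set while the others train, and a candidate's fitness is the mean score over folds. `KFold` and its methods are hypothetical; for Tribuo datasets, `org.tribuo.evaluation.KFoldSplitter` covers similar ground.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical k-fold index splitter for a wrapper's fitness evaluation. */
final class KFold {
    private KFold() {}

    /** Returns k disjoint test folds covering 0..n-1; fold sizes differ by at most one. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            idx.add(i);
        }
        Collections.shuffle(idx, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin so fold sizes stay balanced.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(idx.get(i));
        }
        return folds;
    }

    /** Training indices for one fold: every index not in that fold. */
    static List<Integer> trainIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    /** Mean of scorer(trainIdx, testIdx) over all folds, e.g. k-NN accuracy for one mask. */
    static double crossValidate(List<List<Integer>> folds,
                                ToDoubleBiFunction<List<Integer>, List<Integer>> scorer) {
        double total = 0.0;
        for (int f = 0; f < folds.size(); f++) {
            total += scorer.applyAsDouble(trainIndices(folds, f), folds.get(f));
        }
        return total / folds.size();
    }
}
```

Reusing the same folds (same seed) for every candidate keeps fitness values comparable, at the cost of k model fits per evaluation.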