FEDOT
Boosting method implementation (LightGBM)
Plan
- [ ] Implement convert_to_dataset()
- [ ] Update default_operation_params.json
- [ ] Update search_space.py
How it works
The fit/predict interface is implemented in the parent class FedotLightGBMImplementation.
Code
class FedotLightGBMImplementation(ModelImplementation):
    # Parameters consumed by this wrapper rather than passed on to the LightGBM model
    __operation_params = ['n_jobs', 'use_eval_set']

    def __init__(self, params: Optional[OperationParameters] = None):
        super().__init__(params)

        self.model_params = {k: v for k, v in self.params.to_dict().items() if k not in self.__operation_params}
        self.model = None

    def fit(self, input_data: InputData):
        input_data = input_data.get_not_encoded_data()

        if self.params.get('use_eval_set'):
            # Hold out part of the training data as an evaluation set
            train_input, eval_input = train_test_data_setup(input_data)

            train_input = self.convert_to_dataframe(train_input)
            eval_input = self.convert_to_dataframe(eval_input)

            train_x, train_y = train_input.drop(columns=['target']), train_input['target']
            eval_x, eval_y = eval_input.drop(columns=['target']), eval_input['target']

            # Pick the metric by task: no classes, fewer than three classes, three or more
            if self.classes_ is None:
                eval_metric = 'rmse'
            elif len(self.classes_) < 3:
                eval_metric = 'auc'
            else:
                eval_metric = 'multi_logloss'

            self.model.fit(X=train_x, y=train_y,
                           eval_set=[(eval_x, eval_y)], eval_metric=eval_metric)
        else:
            train_data = self.convert_to_dataframe(input_data)
            train_x, train_y = train_data.drop(columns=['target']), train_data['target']
            self.model.fit(X=train_x, y=train_y)

        return self.model

    def predict(self, input_data: InputData):
        input_data = self.convert_to_dataframe(input_data.get_not_encoded_data())
        train_x = input_data.drop(columns=['target'])
        prediction = self.model.predict(train_x)

        return prediction
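The metric selection inside fit() can be read in isolation. The sketch below is a standalone illustration, not part of the PR: the function name choose_eval_metric is hypothetical, and it assumes that classes_ is None for regression tasks and holds the list of class labels for classification, as the branch in fit() suggests.

```python
from typing import Optional, Sequence


def choose_eval_metric(classes: Optional[Sequence]) -> str:
    """Mirror the branch in fit(): map the known class labels to a LightGBM metric name."""
    if classes is None:
        # No class labels: treated as regression
        return 'rmse'
    if len(classes) < 3:
        # One or two labels: treated as binary classification
        return 'auc'
    # Three or more labels: multiclass classification
    return 'multi_logloss'


print(choose_eval_metric(None))
print(choose_eval_metric([0, 1]))
print(choose_eval_metric(['a', 'b', 'c']))
```

Keeping the choice in one small function like this would make it easy to unit-test separately from model fitting.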
The fit/predict interface does not support LightGBM's internal data type, lightgbm.Dataset, so a workaround was needed. Here, pandas.DataFrame is used instead.
Inside the interface, InputData is converted to a pandas.DataFrame (columns listed in categorical_idx become category, and columns listed in numerical_idx become float).
Code
    @staticmethod
    def convert_to_dataframe(data: Optional[InputData]):
        dataframe = pd.DataFrame(data=data.features, columns=data.features_names)
        dataframe['target'] = data.target

        if data.categorical_idx is not None:
            for col in dataframe.columns[data.categorical_idx]:
                dataframe[col] = dataframe[col].astype('category')

        if data.numerical_idx is not None:
            for col in dataframe.columns[data.numerical_idx]:
                dataframe[col] = dataframe[col].astype('float')

        return dataframe
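To see what the conversion produces, here is a minimal sketch that runs the same logic on toy data. ToyInputData is a hypothetical stand-in that carries only the fields the method reads; it is not FEDOT's InputData, and the column names and values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np
import pandas as pd


@dataclass
class ToyInputData:
    """Hypothetical stand-in for InputData with only the fields used by the conversion."""
    features: np.ndarray
    target: np.ndarray
    features_names: List[str]
    categorical_idx: Optional[List[int]] = None
    numerical_idx: Optional[List[int]] = None


def convert_to_dataframe(data: ToyInputData) -> pd.DataFrame:
    # Same steps as the static method above, written as a free function
    dataframe = pd.DataFrame(data=data.features, columns=data.features_names)
    dataframe['target'] = data.target

    if data.categorical_idx is not None:
        for col in dataframe.columns[data.categorical_idx]:
            dataframe[col] = dataframe[col].astype('category')

    if data.numerical_idx is not None:
        for col in dataframe.columns[data.numerical_idx]:
            dataframe[col] = dataframe[col].astype('float')

    return dataframe


toy = ToyInputData(
    features=np.array([['red', 1.5], ['blue', 2.0], ['red', 3.5]], dtype=object),
    target=np.array([0, 1, 0]),
    features_names=['color', 'weight'],
    categorical_idx=[0],
    numerical_idx=[1],
)
frame = convert_to_dataframe(toy)
print(frame.dtypes)
```

With dtypes set this way, the category column can be handed to the model without manual encoding, while the numeric column is stored as a plain float.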
All PEP8 errors have been fixed, thanks :heart:
@open-code-helper run
Codecov Report
Attention: Patch coverage is 77.88462% with 23 lines in your changes missing coverage. Please review.
Project coverage is 80.16%. Comparing base (80eba8e) to head (fd3786d). Report is 1 commit behind head on master.
Files | Patch % | Lines |
---|---|---|
...mplementations/models/boostings_implementations.py | 77.88% | 23 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #1264 +/- ##
==========================================
+ Coverage 80.10% 80.16% +0.05%
==========================================
Files 146 146
Lines 10190 10284 +94
==========================================
+ Hits 8163 8244 +81
- Misses 2027 2040 +13