FEDOT
[Bug]: ValueError: [...] the array at index 0 has size 894365 and the array at index 1 has size 1117957
Expected Behavior
Auto preprocessing should work correctly, and the pipeline should be fitted.
Current Behavior
FEDOT fails to fit the catboostreg model when the use_auto_preprocessing=True option is set.
PS C:\Users\nnikitin-user\Desktop\automl_may> & C:/Users/nnikitin-user/AppData/Local/Programs/Python/Python310/python.exe c:/Users/nnikitin-user/Desktop/automl_may/flood_1.py
2024-05-16 13:16:58,812 - ApiDataProcessor - Preprocessing data
2024-05-16 13:16:58,812 - ApiDataProcessor - Train Data (Original) Memory Usage: 452.05 MB Data Shapes: ((1117957, 53), (1117957, 1))
2024-05-16 13:22:54,236 - ApiDataProcessor - Train Data (Processed) Memory Usage: 1.05 GB Data Shape: ((1117957, 126), (1117957, 1))
2024-05-16 13:22:54,236 - ApiDataProcessor - Data preprocessing runtime = 0:05:55.423210
2024-05-16 13:22:55,149 - AssumptionsHandler - Initial pipeline fitting started
2024-05-16 13:23:21,260 - PipelineNode - Trying to fit pipeline node with operation: catboostreg
2024-05-16 13:23:22,181 - AssumptionsHandler - Initial pipeline fit was failed due to: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957.
Traceback (most recent call last):
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 71, in fit_assumption_and_check_correctness
pipeline.fit(data_train, n_jobs=eval_n_jobs)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 197, in fit
train_predicted = self._fit(input_data=copied_input_data)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 112, in _fit
train_predicted = self.root_node.fit(input_data=input_data)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\node.py", line 200, in fit
self.fitted_operation, operation_predict = self.operation.fit(params=self._parameters,
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 87, in fit
self.fitted_operation = self._eval_strategy.fit(train_data=data)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\boostings.py", line 33, in fit
operation_implementation.fit(train_data)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 28, in fit
input_data = input_data.get_not_encoded_data()
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\data\data.py", line 628, in get_not_encoded_data
new_features = np.hstack((num_features, cat_features))
File "<__array_function__ internals>", line 200, in hstack
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\core\shape_base.py", line 370, in hstack
return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\nnikitin-user\Desktop\automl_may\flood_1.py", line 85, in <module>
auto_model.fit(features=train, target="FloodProbability")
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\main.py", line 181, in fit
self.current_pipeline, self.best_models, self.history = self.api_composer.obtain_model(self.train_data)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_composer.py", line 63, in obtain_model
initial_assumption, fitted_assumption = self.propose_and_fit_initial_assumption(train_data)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_composer.py", line 107, in propose_and_fit_initial_assumption
assumption_handler.fit_assumption_and_check_correctness(deepcopy(initial_assumption[0]),
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 86, in fit_assumption_and_check_correctness
self._raise_evaluating_exception(ex)
File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 94, in _raise_evaluating_exception
raise ValueError(advice_info)
ValueError: Initial pipeline fit was failed due to: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957. Check pipeline structure and the correctness of the data
PS C:\Users\nnikitin-user\Desktop\automl_may>
Possible Solution
Judging by the error, some rows are dropped from one of the feature blocks during auto preprocessing: the numeric and categorical parts end up with 894365 and 1117957 rows respectively, so they cannot be concatenated in get_not_encoded_data. This is probably related to the handling of categorical features. A breakpoint at the np.hstack call in data.py (line 628 in the traceback) should help locate where the rows are lost.
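For reference, the failure can be illustrated in isolation with plain NumPy (this is not FEDOT code; the shapes are scaled down from the 894365 and 1117957 rows in the report). np.hstack concatenates along axis 1, so the blocks must have identical row counts:

```python
import numpy as np

# Stand-ins for the two blocks concatenated in get_not_encoded_data.
num_features = np.ones((3, 2))  # numeric block: some rows were dropped
cat_features = np.ones((5, 1))  # categorical block: full row count

try:
    np.hstack((num_features, cat_features))
except ValueError as e:
    # Same message as in the traceback above.
    print(f"hstack failed: {e}")

# With equal row counts the concatenation succeeds:
both = np.hstack((np.ones((5, 2)), np.ones((5, 1))))
print(both.shape)  # (5, 3)
```

So any preprocessing step that filters rows must be applied to both blocks consistently.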
Steps to Reproduce
- Download code and data from https://www.kaggle.com/code/eliyahusanti/fedot-nss-lab-automl-catboost-0-8676
- Set the FEDOT parameter use_auto_preprocessing=True
- Run the code
Context [OPTIONAL]
Participating in a Kaggle competition.