
[Bug]: Exception "Lightgbm mixer not supported for type: tags" when following process plant tutorial

Open philfuchs opened this issue 1 year ago • 21 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

Current Behaviour

I'm following the Manufacturing process quality tutorial, and during training I get the error Exception: Lightgbm mixer not supported for type: tags, raised at: /opt/conda/lib/python3.7/site-packages/mindsdb/interfaces/model/learn_process.py#177.

Expected Behaviour

I'd expect AutoML to train the model successfully.

Steps To Reproduce

Follow the tutorial steps until training.

Anything else?

The tutorials seem to be of mixed quality: some result in errors, others in low model performance or "null" predictions (e.g., the bus ride tutorial).

philfuchs avatar Jul 21 '22 13:07 philfuchs

Hi @philfuchs, we will update the docs accordingly. Just to confirm, are you running MindsDB locally, and can you test with the query below?

CREATE PREDICTOR mindsdb.process_quality_predictor
FROM files (
    SELECT iron_feed, silica_feed, starch_flow, amina_flow, ore_pulp_flow,
           ore_pulp_ph, ore_pulp_density, flotation_column_01_air_flow,
           flotation_column_02_air_flow, flotation_column_03_air_flow,
           flotation_column_04_air_flow, flotation_column_05_air_flow,
           flotation_column_06_air_flow, flotation_column_07_air_flow,
           flotation_column_01_level, flotation_column_02_level,
           flotation_column_03_level, flotation_column_04_level,
           flotation_column_05_level, flotation_column_06_level, 
           flotation_column_07_level, iron_concentrate, silica_concentrate
    FROM process_quality LIMIT 5000
) PREDICT silica_concentrate as quality;
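
Once the predictor is created, training runs in the background; a minimal way to check on its progress, assuming the mindsdb.predictors system table available in this version:

SELECT name, status, accuracy
FROM mindsdb.predictors
WHERE name = 'process_quality_predictor';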

Lastly, can you confirm that the dataset's headers have been cleaned up before creating the predictor? Please feel free to send me a message over Slack so we can have a chat about your experience with the tutorials.

chandrevdw31 avatar Jul 26 '22 10:07 chandrevdw31

Hi @chandrevdw31,

Yes, I'm running it locally in a Docker container according to the docs.

I just cleaned the headers (sed -e 's/ /_/g' -e 's/\(.*\)/\L\1/' -e 's/%_//g' MiningProcess_Flotation_Plant_Database.csv > fixed_headers.csv), uploaded the dataset via the file upload to MindsDB, and executed your predictor statement. It leads to the same error: Exception: Lightgbm mixer not supported for type: tags, raised at: /opt/conda/lib/python3.7/site-packages/mindsdb/interfaces/model/learn_process.py#177
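
For reference, the same command broken out (GNU sed syntax assumed; note that the expressions run on every line, not just the header, so spaces inside the date values get replaced as well):

# 's/ /_/g'        -> replace spaces with underscores
# 's/\(.*\)/\L\1/' -> lowercase the whole line (\L is a GNU extension)
# 's/%_//g'        -> drop the "%_" left over from headers like "% Iron Feed"
sed -e 's/ /_/g' -e 's/\(.*\)/\L\1/' -e 's/%_//g' \
    MiningProcess_Flotation_Plant_Database.csv > fixed_headers.csv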

mindsdb version: 22.7.4.0

These are the original column names: date,% Iron Feed,% Silica Feed,Starch Flow,Amina Flow,Ore Pulp Flow,Ore Pulp pH,Ore Pulp Density,Flotation Column 01 Air Flow,Flotation Column 02 Air Flow,Flotation Column 03 Air Flow,Flotation Column 04 Air Flow,Flotation Column 05 Air Flow,Flotation Column 06 Air Flow,Flotation Column 07 Air Flow,Flotation Column 01 Level,Flotation Column 02 Level,Flotation Column 03 Level,Flotation Column 04 Level,Flotation Column 05 Level,Flotation Column 06 Level,Flotation Column 07 Level,% Iron Concentrate,% Silica Concentrate

These are the adapted names: date,iron_feed,silica_feed,starch_flow,amina_flow,ore_pulp_flow,ore_pulp_ph,ore_pulp_density,flotation_column_01_air_flow,flotation_column_02_air_flow,flotation_column_03_air_flow,flotation_column_04_air_flow,flotation_column_05_air_flow,flotation_column_06_air_flow,flotation_column_07_air_flow,flotation_column_01_level,flotation_column_02_level,flotation_column_03_level,flotation_column_04_level,flotation_column_05_level,flotation_column_06_level,flotation_column_07_level,iron_concentrate,silica_concentrate

They seem correct to me.

philfuchs avatar Jul 26 '22 11:07 philfuchs

Thank you so much for the feedback! The team will have a look into the error. Please do reach out on Slack so that we can have a chat. In the meantime, feel free to check out the Home Rentals tutorial :)

chandrevdw31 avatar Jul 26 '22 12:07 chandrevdw31

You're welcome. I have already tried most of the tutorials to evaluate MindsDB for my company.

philfuchs avatar Jul 26 '22 12:07 philfuchs

That's awesome! MindsDB also allows community members to contribute to the product, so if you have any suggestions on how the documentation can be improved, or general notes on what you would specifically like to see improved, please let me know :+1:

chandrevdw31 avatar Jul 26 '22 12:07 chandrevdw31

I think it would be interesting to compare the accuracy of a custom model with out-of-the-box MindsDB on the same task; that way it's clearer what to expect with MindsDB alone. I don't think I saw that in the tutorials. If we decide to use MindsDB, I expect to be able to contribute anyway, if only with bug reports. 😀

philfuchs avatar Jul 27 '22 10:07 philfuchs

I have followed the Process Quality tutorial, including the steps to upload a file to the MindsDB Cloud editor, and it seems that there is a problem with the dataset. I downloaded the dataset from Kaggle and ran the sed command to fix the headers, as instructed in the tutorial. I found that the problem is that all the column values use a comma as a decimal separator instead of a dot. After changing all decimal commas to dots, I could upload and use the file (ProcessQuality.csv).
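
For anyone hitting the same issue, a minimal sketch of that conversion, assuming the numeric fields are quoted and contain at most one decimal comma (the input/output filenames here are illustrative):

# turn "55,2" into "55.2" inside quoted numeric fields only,
# leaving the unquoted commas that separate the CSV columns alone
sed 's/"\([0-9]*\),\([0-9]*\)"/"\1.\2"/g' fixed_headers.csv > ProcessQuality.csv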

As for the CREATE PREDICTOR command, it is going to be updated in the docs.

martyna-mindsdb avatar Jul 28 '22 16:07 martyna-mindsdb

Hey @martyna-mindsdb, the column values should be fine with the comma, as the double quotes should handle this. The only reason I'm mentioning this is that I have previously used this dataset with an earlier version and the predictor worked fine. Just a note in case further investigation should be done.

chandrevdw31 avatar Jul 28 '22 17:07 chandrevdw31

Thank you, @chandrevdw31. It seems there might be another reason why the file is not uploaded to the MindsDB Cloud editor.

martyna-mindsdb avatar Jul 28 '22 17:07 martyna-mindsdb

Hey @martyna-mindsdb, the file won't upload to Cloud due to the file size limit; however, you can upload it to local MindsDB :+1:

chandrevdw31 avatar Jul 28 '22 17:07 chandrevdw31

@chandrevdw31 When the file size is too big, there is a clear error message (I have sized down this dataset to be able to upload it). In my case, I uploaded a file under the 10MB limit and still got an error message.

martyna-mindsdb avatar Jul 28 '22 17:07 martyna-mindsdb

@martyna-mindsdb yeah, it might be an issue with the file being compressed. Hope this can be resolved, good luck!

chandrevdw31 avatar Jul 28 '22 18:07 chandrevdw31

@chandrevdw31 The file was not compressed. Some of the data rows were removed from the file to make it smaller in size. So the issue is somewhere else. Thanks!

martyna-mindsdb avatar Jul 30 '22 14:07 martyna-mindsdb

OK. I am able to upload the file but still get the same error when training the predictor.

chandrevdw31 avatar Aug 09 '22 14:08 chandrevdw31

Hi @chandrevdw31, the file upload works fine (the issues before were caused by some editors incorrectly re-saving the file).

About the CREATE PREDICTOR statement: I ran it in MindsDB Cloud and could create and train the predictor successfully. Please provide the steps that you follow. Thanks.

martyna-mindsdb avatar Aug 09 '22 14:08 martyna-mindsdb

Were you able to run it in Cloud with the dataset that has fewer rows? These are my steps:

  1. Clean dataset
  2. Start MindsDB in Docker (see the sketch after this list)
  3. Upload file in GUI
  4. Create predictor in CLI
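
For step 2, a minimal sketch of starting a local instance (standard image name and default ports assumed):

# 47334 serves the HTTP GUI, 47335 the MySQL API
docker run -p 47334:47334 -p 47335:47335 mindsdb/mindsdb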

I used the CLI, as this is the only tutorial that still uses the CLI as an example; that's why I'm sticking with it.

@Zoran should we maybe consider using the dataset with fewer rows that @martyna-mindsdb used, point to that, and create a guide for Cloud, seeing as the predictor trained successfully?

We can write another tutorial using the SQL CLI.

chandrevdw31 avatar Aug 09 '22 15:08 chandrevdw31

@chandrevdw31 To answer your question: I was able to run it in Cloud. Removing some of the data rows doesn't affect the CREATE PREDICTOR statement; it may affect the accuracy of the predictions (because of having less data).

martyna-mindsdb avatar Aug 09 '22 16:08 martyna-mindsdb

Hey @martyna-mindsdb, please clarify the points below:

  • Removing the rows makes the dataset smaller; was this not what you did in order to upload the file, as you mentioned before?

If not

  • The file you referred to that works fine (the issues before were caused by incorrect re-saving of the file by some editors): can you share that file with me, please? If this was the file used and the predictor trained successfully, we should use this as the data source, and Zoran can upload it to GitHub so we can point to this file.

  • There is still a size limit on Cloud, and the original dataset is over 180MB. Currently I am unable to upload the dataset on Cloud due to the size limit; however, it works with the local GUI via 127.0.0.1:47334/.

  • The question was whether we should document what you did, seeing as you are the only one who was able to successfully create and train a predictor, and use the exact same dataset you had success with. At the moment users will not even be able to upload the dataset on Cloud due to the size limit, hence the focus on uploading it into a local instance.

chandrevdw31 avatar Aug 09 '22 17:08 chandrevdw31

@chandrevdw31

  • Removing the rows makes the dataset smaller; was this not what you did in order to upload the file, as you mentioned before?

Yes. The dataset was ~175MB, so to upload it to MindsDB, I had to make it <10MB.
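
For example, one way to do that while keeping the header row (the exact row count here is illustrative, not the one I actually used):

# keep the header plus the first 40,000 data rows
head -n 40001 fixed_headers.csv > fixed_headers_smallsize.csv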

If not

  • The file you referred to that works fine (the issues before were caused by incorrect re-saving of the file by some editors): can you share that file with me, please? If this was the file used and the predictor trained successfully, we should use this as the data source, and Zoran can upload it to GitHub so we can point to this file.

Here is the file I used today: fixed_headers_smallsize.csv. Its upload to MindsDB was successful, and I could create and train a predictor using the CREATE PREDICTOR statement that you posted above.

  • There is still a size limit on Cloud, and the original dataset is over 180MB. Currently I am unable to upload the dataset on Cloud due to the size limit; however, it works with the local GUI via 127.0.0.1:47334/.

  • The question was whether we should document what you did, seeing as you are the only one who was able to successfully create and train a predictor, and use the exact same dataset you had success with. At the moment users will not even be able to upload the dataset on Cloud due to the size limit, hence the focus on uploading it into a local instance.

@ZoranPandovski Please have a look at whether there should be two separate tutorials. Thanks.

martyna-mindsdb avatar Aug 09 '22 18:08 martyna-mindsdb

@martyna-mindsdb perfect, so we can use the dataset with fewer rows. @ZoranPandovski we can upload that dataset to GitHub.

No need for a separate tutorial at all. The error with the original dataset should still be investigated, in case the same issue occurs with another dataset a user is using.

chandrevdw31 avatar Aug 09 '22 18:08 chandrevdw31

@chandrevdw31 There is no problem with the original dataset from Kaggle. It couldn't be uploaded to MindsDB because it had been processed in different editors. But the dataset itself is good to go (that is, after removing some of the data rows to make it smaller).

martyna-mindsdb avatar Aug 09 '22 18:08 martyna-mindsdb

Closing for now.

ZoranPandovski avatar Aug 23 '22 10:08 ZoranPandovski