mindsdb
[Bug]: Getting error on full data analysis
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
I am running MindsDB in Docker and imported this dataset into it: https://www.kaggle.com/datasets/eishahassan/large-scale-manufacturing-industries-production
I added the dataset as a file import and queried it from the MindsDB web UI:
SELECT * FROM files.large_scale_industries;
Once the results are retrieved, I click on Data Insights and then on Full Data Analysis. It shows "Analysis in progress" but never completes, and it does not show a distribution for two columns: Product and Unit of Quantity.
I have already renamed the columns to use _ instead of spaces.
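For anyone reproducing this, the renaming step can be sketched roughly as below. This is only an illustration using Python's standard `csv` module; the helper name `normalize_header` and the sample row are made up for the example and are not taken from the dataset or from MindsDB.

```python
import csv
import io


def normalize_header(csv_text: str) -> str:
    """Replace spaces with underscores in the header row only.

    Hypothetical helper for illustration; data rows are left untouched.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        # Nothing to rename in an empty file.
        return csv_text
    rows[0] = [name.strip().replace(" ", "_") for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample: two of the column names mentioned above plus one fake row.
sample = "Product,Unit of Quantity\nSugar,Tonnes\n"
normalized = normalize_header(sample)
```

Only the header is rewritten, so the row count of the uploaded file stays the same.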
I see the following error:
2022-11-27 15:38:47,921 - ERROR - http exception: index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/conda/lib/python3.7/site-packages/flask_restx/api.py", line 403, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/flask/views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/flask_restx/resource.py", line 49, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mindsdb/api/http/namespaces/analysis.py", line 96, in post
    data_frame=DataFrame(data, columns=column_names)
  File "/opt/conda/lib/python3.7/site-packages/mindsdb/integrations/handlers/lightwood_handler/lightwood_handler/lightwood_handler.py", line 187, in analyze_dataset
    analysis = lightwood.analyze_dataset(data_frame)
  File "/opt/conda/lib/python3.7/site-packages/lightwood/api/high_level.py", line 125, in analyze_dataset
    problem_definition = ProblemDefinition.from_dict({'target': str(df.columns[0])})
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4604, in __getitem__
    return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0
2022-11-27 15:38:47,944 - ERROR - [2022-11-27 15:38:47,943] ERROR in app: Exception on /api/analysis/data [POST]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
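The last frames of the traceback show `str(df.columns[0])` failing because the column index has size 0, which suggests the analysis request reached lightwood with a DataFrame that had no columns. A minimal sketch of the kind of guard that would turn this into a readable error is below. The helper `default_target` is hypothetical and is not part of MindsDB or lightwood, and it works on a plain list of names rather than a pandas index so that it runs without extra dependencies.

```python
from typing import Sequence


def default_target(column_names: Sequence[str]) -> str:
    """Pick the first column as the target, like `str(df.columns[0])`.

    Hypothetical helper: raises a descriptive error instead of an
    IndexError when there are no columns at all.
    """
    if len(column_names) == 0:
        raise ValueError("cannot analyze a dataset with no columns")
    return str(column_names[0])


# The empty case corresponds to the failure seen in the traceback.
try:
    default_target([])
except ValueError as exc:
    message = str(exc)

# With the column names from this issue, the first column is chosen.
first = default_target(["Product", "Unit_of_Quantity"])
```

Whether the empty column list comes from the web UI request or from the handler is not visible in the log, so this only illustrates the failing step, not its cause.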
Expected Behavior
The analysis should complete without errors and show the distribution for all columns. It should show 7728/7728 Rows complete.
Steps To Reproduce
1. Run MindsDB as a Docker image: `docker run -p 47334:47334 -p 47335:47335 mindsdb/mindsdb`
2. Go to localhost:47334 and upload the dataset via file import
3. Give it a suitable table name and run a SELECT on the table
4. Run Full Data Analysis
Anything else?
No response
Hi, thank you for reporting this. We will try to replicate it shortly.
@rparthas Did you pull the latest Docker image?
I am using mindsdb/mindsdb:22.10.2.1, which was updated 2 months ago. I think it was the latest when I started this work. I can try pulling the version that was updated 7 days ago.
@ZoranPandovski I tried with the latest tag. No error has come up so far. I will keep watching for a while and close this if the analysis completes.
There is no error now, but the analysis still does not complete for all 3 columns.
Hi @rparthas, I investigated this and, from what I can gather, it works fine (see attached image). The analysis takes a couple of seconds to complete.
If you are still having this issue, feel free to reopen, but I'll close this for the time being.