openml-python
get_dataset(), "The kernel appears to have died. It will restart automatically"
Description
Steps/Code to Reproduce
Expected Results
Actual Results
Hi, I'll move this to the openml-python issue tracker
I'm guessing you tried to download a large dataset? This is a known issue. The ARFF parser uses too much memory.
We have implemented parquet support, but this is not yet in the current release.
Small correction, it should be available in the current release as soon as the production server sends valid information on where the parquet file is located.
Thank you very much for your answer. Do you know approximately how long it will be until this new version is available?
On Mon, Jun 14, 2021 at 1:39 PM PGijsbers @.***> wrote:
We have implemented parquet support, but this is not yet in the current release.
Small correction, it should be available in the new release as soon as the production server sends valid information on where the parquet file is located.
@learsi1911 Could you please provide the ID of the dataset you were trying to download? And could you share how much memory was available to the kernel? That information would allow us to test whether the issue is resolved when the parquet support is fully operational.
@prabhant Do you have an estimate on when the parquet files are available from the production server?
Of course, the ID is 547. As I said, the problem is that the first time I used "get_dataset()" I had no problem, but if I try again I get the error.
The production server with parquet support will be ready in a week or two.
Dataset 547 is not really large and shouldn't result in any issues. Could you please run the failing snippet from within ipython and paste the output?
Yes, I have tried Python directly in the Windows console and it works; maybe it is something related to Jupyter.
On Tue, Jun 15, 2021 at 8:41 AM Matthias Feurer @.***> wrote:
Dataset 547 is not really large and shouldn't result in any issues. Could you please run the failing snippet from within ipython and paste the output?
Jupyter notebook kernels typically work with much less memory than a regular Python process. But as mfeurer said, the dataset isn't large and should not lead to a kernel dying. It would be helpful if you could post the code that led to the error and the full error output.
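For anyone trying to narrow this down, here is a minimal sketch of how the two consecutive calls could be compared for memory use. It relies on the standard-library `tracemalloc` module, which only sees allocations made through Python's memory allocators, so the numbers are a rough indication rather than the kernel's true footprint. The helper names `peak_memory_mib` and `load_dataset_547` are made up for this example, and `load_dataset_547` assumes `openml` is installed and the server is reachable.

```python
import tracemalloc


def peak_memory_mib(load, repeats=2):
    """Call load() `repeats` times and return the peak traced memory of each call in MiB."""
    peaks = []
    for _ in range(repeats):
        tracemalloc.start()
        try:
            load()
        finally:
            # get_traced_memory() returns (current, peak) in bytes
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
        peaks.append(peak / 2**20)
    return peaks


def load_dataset_547():
    """Hypothetical loader for the dataset from this thread (needs openml and network access)."""
    import openml  # imported here so the helper above works without openml installed

    return openml.datasets.get_dataset(547)
```

Running `print(peak_memory_mib(load_dataset_547))` once in a Jupyter cell and once in a plain Python console would show whether the second call behaves differently from the first in either environment.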
If the problem still occurs, please re-open this issue but provide a code example that reproduces the error.