openml-python get_dataset(), "The kernel appears to have died. It will restart automatically"

Description

Steps/Code to Reproduce

Expected Results

Actual Results

Jun 14 '21 11:06 learsi1911

Hi, I'll move this to the openml-python issue tracker.

I'm guessing you tried to download a large dataset? This is a known issue. The ARFF parser uses too much memory.

We have implemented parquet support, but this is not yet in the current release.

Jun 14 '21 11:06 joaquinvanschoren

> We have implemented parquet support, but this is not yet in the current release.

Small correction: it should be available in the current release as soon as the production server sends valid information on where the parquet file is located.

Jun 14 '21 11:06 PGijsbers

Thank you very much for your answer. Do you know approximately how long it will be until this new version is available?

Jun 14 '21 11:06 learsi1911

@learsi1911 Could you please provide the ID of the dataset you were trying to download? And could you share how much memory was available to the kernel? That information would allow us to test whether the issue is resolved when the parquet support is fully operational.

@prabhant Do you have an estimate of when the parquet files will be available from the production server?

Jun 14 '21 11:06 PGijsbers

Of course, the ID is 547. As I said, the problem is that the first time I call "get_dataset()" it works without any problem, but if I call it again I get the error.
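
A minimal sketch of the reproduction described here, not the exact code that was run: it calls a loader twice for the same dataset ID and reports the peak memory traced by Python's `tracemalloc` for each call. The helper names `load_repeatedly` and `reproduce_with_openml` are hypothetical. The sketch assumes the failing call was `openml.datasets.get_dataset(547)`; `tracemalloc` only sees allocations made through Python's allocator, so memory used inside native extensions may not show up.

```python
import tracemalloc
from typing import Any, Callable, List, Tuple


def load_repeatedly(
    loader: Callable[[int], Any], dataset_id: int, repeats: int = 2
) -> List[Tuple[int, int]]:
    """Call loader(dataset_id) `repeats` times.

    Returns one (call number, peak traced bytes) pair per completed call.
    """
    results = []
    for call_number in range(1, repeats + 1):
        tracemalloc.start()
        try:
            loader(dataset_id)
            # get_traced_memory() returns (current, peak) in bytes.
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        results.append((call_number, peak))
        print(f"call {call_number}: peak traced memory {peak / 1e6:.1f} MB")
    return results


def reproduce_with_openml() -> None:
    """Assumed reproduction; needs the openml package and network access."""
    import openml

    load_repeatedly(openml.datasets.get_dataset, 547)
```

Calling `reproduce_with_openml()` in the same Jupyter kernel that died, and again in a plain Python console, would show whether the second call behaves differently in the two environments.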

Jun 14 '21 11:06 learsi1911

The production server with parquet support will be ready in a week or two.

Jun 14 '21 11:06 prabhant

Dataset 547 is not really large and shouldn't result in any issues. Could you please run the failing snippet from within ipython and paste the output?

Jun 15 '21 06:06 mfeurer

Yes, I tried running Python directly in the Windows console and it works there. Maybe the problem is related to Jupyter.

Jun 15 '21 09:06 learsi1911

Jupyter notebook kernels typically work with much less memory than a regular Python process. But as mfeurer said, the dataset isn't large and should not cause a kernel to die. It would be helpful if you could post the code that led to the error and the full error output.
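
When a kernel dies there is often no Python traceback to paste, because the process ends before an exception is reported. One possible way to collect output anyway is to run the snippet in a fresh interpreter with the standard-library `faulthandler` enabled and capture its exit code and output. This is only a sketch: the helper `run_snippet` is hypothetical, and the contents of `SNIPPET` are an assumption about the failing code, which was not posted.

```python
import subprocess
import sys
from typing import Tuple


def run_snippet(code: str, timeout: float = 600.0) -> Tuple[int, str, str]:
    """Run `code` in a separate Python process with faulthandler enabled.

    Returns (exit code, stdout, stderr). A hard crash typically shows up as
    a non-zero exit code, with any faulthandler traceback in stderr.
    """
    completed = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout, completed.stderr


# Assumed failing snippet: two consecutive downloads of dataset 547.
SNIPPET = """
import openml
for attempt in (1, 2):
    openml.datasets.get_dataset(547)
    print("call", attempt, "finished")
"""
```

Running `returncode, out, err = run_snippet(SNIPPET)` and pasting all three values into the issue would give the exit status along with whatever the process printed before it stopped.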

Jun 21 '21 19:06 PGijsbers

If the problem still occurs, please re-open this issue and provide a code example that reproduces the error.

Nov 29 '22 09:11 PGijsbers
