Object arrays cannot be loaded when allow_pickle=False

Open anneum opened this issue 3 years ago • 7 comments

I get the following error during the automatic data exchange between two containers.

2021-03-03 13:14:23 Kale marshalling          [INFO]     Loading pandas file using Pandas backend: df_test_set_without_labels
2021-03-03 13:14:23 Kale marshalling          [INFO]     Loading pandas file using Pandas backend: df_train_set
2021-03-03 13:14:23 Kale marshalling          [INFO]     Loading numpy file using Numpy backend: X
2021-03-03 13:14:23 Kale marshalling          [ERROR]    During data passing, Kale could not load the following file:

  - name: 'X'
The error was:
Object arrays cannot be loaded when allow_pickle=False

Since numpy 1.16.3 (NumPy 1.16.3 Release Notes), np.load defaults to allow_pickle=False, and I need numpy > 1.17 as a dependency for another module.
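
For reference, the NumPy behaviour behind this message can be reproduced without Kale. The snippet below is only a minimal sketch: the array `X` is a hypothetical stand-in for the text column, and it uses an in-memory buffer instead of whatever file Kale writes during marshalling.

```python
import io

import numpy as np

# Hypothetical stand-in for the text data passed between pipeline steps.
X = np.array(["some text", "more text"], dtype=object)

buf = io.BytesIO()
np.save(buf, X)  # object arrays are pickled on save (np.save allows this by default)

# Loading with the default arguments fails on numpy >= 1.16.3.
buf.seek(0)
try:
    np.load(buf)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Opting in explicitly makes the same data load again.
buf.seek(0)
loaded = np.load(buf, allow_pickle=True)
print(loaded[0])
```

Note that allow_pickle=True unpickles arbitrary objects, so it should only be used on files from a trusted source.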

I tried a workaround like this:

import numpy as np

# Monkey-patch np.load so that every call passes allow_pickle=True.
old = np.load
np.load = lambda *a, **k: old(*a, **k, allow_pickle=True)

anneum avatar Mar 03 '21 13:03 anneum

Hi @anneum did using numpy > 1.17 solve the issue?

StefanoFioravanzo avatar Apr 01 '21 06:04 StefanoFioravanzo

@StefanoFioravanzo no, all numpy versions >= 1.16.3 cause the error.

anneum avatar Apr 06 '21 14:04 anneum

@anneum I just tested with numpy 1.19.5 and pickling worked fine. Can you confirm your Numpy version with a pip3 freeze?

StefanoFioravanzo avatar Apr 09 '21 09:04 StefanoFioravanzo

@StefanoFioravanzo I rebuilt my image and ran pip3 freeze as the last step of my Docker build. The numpy version inside the image is numpy==1.18.1.

I get the same error.

2021-04-13 11:55:20 Kale marshalling          [INFO]     Loading numpy file using Numpy backend: X
2021-04-13 11:55:20 Kale marshalling          [ERROR]    During data passing, Kale could not load the following file

  - name: 'X'

The error was:
Object arrays cannot be loaded when allow_pickle=False

Same behavior with numpy==1.19.5.

anneum avatar Apr 13 '21 12:04 anneum

@anneum can you provide the exact code or notebook you are using to create the pipeline? I'd like to try to reproduce this exactly. It would be great if you could provide the simplest notebook possible that reproduces this, along with a requirements.txt.

StefanoFioravanzo avatar Apr 14 '21 07:04 StefanoFioravanzo

@StefanoFioravanzo: I have scaled the notebook (as .txt) down as far as I could.

Import Cell:

import pandas as pd

Pipeline Step: data_preprocessing

df_train_set = pd.read_csv("/home/jovyan/train_v1.4.tsv", sep='\t')

X = df_train_set.text.values

Pipeline Step: encoding (depends on data_preprocessing)

print('Original: ', X[0])

Error:

2021-04-20 12:05:35 Kale marshalling          [INFO]     Loading numpy file using Numpy backend: X
2021-04-20 12:05:35 Kale marshalling          [ERROR]    During data passing, Kale could not load the following file:

  - name: 'X'

The error was:
Object arrays cannot be loaded when allow_pickle=False

anneum avatar Apr 20 '21 12:04 anneum

@anneum could you also provide the CSV file you are using? I tried to reproduce this with a very simple CSV containing only numbers, and everything works fine. I guess you are using some particular data format.
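
A possible explanation for the difference between numeric and text data, sketched with plain NumPy: numeric arrays are stored without pickling, while an object array of Python strings (which is what a pandas text column's .values produced in the pandas versions of that time, as far as I know) needs pickle. The `roundtrip` helper and the sample arrays are hypothetical, and the sketch assumes the Numpy backend ends up calling np.load with its default arguments, as the error message suggests.

```python
import io

import numpy as np


def roundtrip(arr):
    """Save and reload an array in memory using np.load's default arguments."""
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    return np.load(buf)


# Numeric data, similar to a CSV that contains only numbers.
numbers = np.array([1.0, 2.0, 3.0])

# Text data held as an object array of Python strings.
text_obj = np.array(["first text", "second text"], dtype=object)

# The same text converted to a fixed-width unicode dtype.
text_fixed = text_obj.astype(str)

numbers_back = roundtrip(numbers)   # loads without pickle
text_back = roundtrip(text_fixed)   # loads without pickle

try:
    roundtrip(text_obj)
    object_array_failed = False
except ValueError:
    object_array_failed = True      # object arrays need allow_pickle=True

print(text_fixed.dtype.kind, object_array_failed)
```

If this is indeed the cause, converting the column before it is passed to the next step (for example with astype(str)) may avoid the error without touching np.load. One caveat: astype(str) turns missing values into the literal string 'nan'.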

StefanoFioravanzo avatar May 07 '21 14:05 StefanoFioravanzo