Object arrays cannot be loaded when allow_pickle=False
I get the following error during the automatic data exchange between two containers.
2021-03-03 13:14:23 Kale marshalling [INFO] Loading pandas file using Pandas backend: df_test_set_without_labels
2021-03-03 13:14:23 Kale marshalling [INFO] Loading pandas file using Pandas backend: df_train_set
2021-03-03 13:14:23 Kale marshalling [INFO] Loading numpy file using Numpy backend: X
2021-03-03 13:14:23 Kale marshalling [ERROR] During data passing, Kale could not load the following file:
- name: 'X'
The error was:
Object arrays cannot be loaded when allow_pickle=False
Since NumPy 1.16.3 (see the NumPy 1.16.3 Release Notes), allow_pickle defaults to False, and I need numpy > 1.17 as a dependency for another module.
I tried a workaround like:
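import numpy as np
# Keep a reference to the original loader.
old = np.load
# Replace np.load with a wrapper that always passes allow_pickle=True.
np.load = lambda *a, **k: old(*a, **k, allow_pickle=True)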
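For reference, a slightly more defensive variant of this monkeypatch might look like the sketch below. The lambda above raises a TypeError if a caller already passes allow_pickle as a keyword, because the argument is then given twice; the wrapper here overrides it instead. The name load_allowing_pickle is made up for illustration. Whether such a patch takes effect inside Kale depends on how and where Kale calls np.load during marshalling, which this thread does not establish. Unpickling can execute arbitrary code, so this should only be used on files you trust.

```python
import functools
import io

import numpy as np

_original_load = np.load


@functools.wraps(_original_load)
def load_allowing_pickle(*args, **kwargs):
    # Force allow_pickle=True, overriding any value the caller passed as a
    # keyword, so the argument is never supplied twice.
    kwargs["allow_pickle"] = True
    return _original_load(*args, **kwargs)


np.load = load_allowing_pickle

# Round-trip an object array through an in-memory buffer to check the patch.
buffer = io.BytesIO()
np.save(buffer, np.array(["some text", None], dtype=object))
buffer.seek(0)
restored = np.load(buffer)
```

If the error persists with a patch like this in place, the load is probably happening in a process or at a point where the patch has not been applied.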
Hi @anneum, did using numpy > 1.17 solve the issue?
@StefanoFioravanzo no, all numpy versions >= 1.16.3 cause the error.
@anneum I just tested with numpy 1.19.5 and pickling worked fine. Can you confirm your NumPy version with pip3 freeze?
@StefanoFioravanzo I rebuilt my image and ran pip3 freeze as the last step of my Docker build. The numpy version inside the image is numpy==1.18.1.
I get the same error.
2021-04-13 11:55:20 Kale marshalling [INFO] Loading numpy file using Numpy backend: X
2021-04-13 11:55:20 Kale marshalling [ERROR] During data passing, Kale could not load the following file
- name: 'X'
The error was:
Object arrays cannot be loaded when allow_pickle=False
Same behavior with numpy==1.19.5.
@anneum can you provide the exact code or notebook you are using to create the pipeline? I'd like to try to reproduce this exactly. It would be great if you could provide the simplest notebook possible that reproduces this, together with a requirements.txt.
@StefanoFioravanzo: I have scaled the notebook (as .txt) down as far as I could.
Import Cell:
import pandas as pd
Pipeline Step: data_preprocessing
df_train_set = pd.read_csv("/home/jovyan/train_v1.4.tsv", sep='\t')
X = df_train_set.text.values
Pipeline Step: encoding
depends on data_preprocessing
print('Original: ', X[0])
Error:
2021-04-20 12:05:35 Kale marshalling [INFO] Loading numpy file using Numpy backend: X
2021-04-20 12:05:35 Kale marshalling [ERROR] During data passing, Kale could not load the following file:
- name: 'X'
The error was:
Object arrays cannot be loaded when allow_pickle=False
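The same error message can be reproduced outside Kale with NumPy alone, which may help narrow things down. The sketch below assumes that X is an object array (as .values on a text column typically is) and uses an in-memory buffer instead of Kale's marshalling files; it says nothing about how Kale itself saves or loads the data.

```python
import io

import numpy as np

# An object array, similar to what a text column's .values can return.
X = np.array(["first sentence", "second sentence"], dtype=object)

buffer = io.BytesIO()
np.save(buffer, X)  # np.save pickles object arrays (its allow_pickle defaults to True)

buffer.seek(0)
try:
    np.load(buffer)  # np.load's allow_pickle defaults to False since NumPy 1.16.3
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Loading succeeds once pickling is explicitly allowed.
buffer.seek(0)
restored = np.load(buffer, allow_pickle=True)
```

Here error_message should contain the same text as the Kale log, and restored holds the original strings.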
@anneum could you also provide the CSV file you are using? I tried to reproduce this with a very simple CSV containing only numbers, and everything works fine. I guess you are using some particular data format.
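One plausible explanation for the difference, not confirmed in this thread, is the column dtype: numeric columns become plain numeric NumPy arrays, which never need pickling, while text columns read with the pandas versions of that time come back as object arrays. A quick check, using a made-up two-row TSV as a stand-in for train_v1.4.tsv:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one text column, one numeric column.
tsv = "text\tlabel\nhello world\t0\nanother row\t1\n"
df = pd.read_csv(io.StringIO(tsv), sep="\t")

# Numeric column: an integer dtype, stored without pickling.
label_dtype = df["label"].values.dtype

# Text column: typically object dtype with pandas 1.x, which is what
# triggers the allow_pickle check when the array is loaded again.
text_dtype = df["text"].values.dtype

print("label:", label_dtype, "| text:", text_dtype)
```

If the text column does show up as object, that would be consistent with a numbers-only CSV working while X = df_train_set.text.values fails.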