
Can't use `od.fit` on batched datasets

Status: Open. Oliver-Chalkley opened this issue; 1 comment.

`od.fit(X_train, ...)` works fine when `X_train` is a NumPy array. However, it fails when I use `tf.keras.utils.image_dataset_from_directory` to load the dataset in batches:

```python
import tensorflow as tf

# Use a data loader to avoid memory issues
data_dir = "/path/to/training/data"

batch_size = 1
img_height = 112
img_width = 112

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
```

When `X_train` is this batched dataset:

```python
>>> print(X_train)
<BatchDataset element_spec=(TensorSpec(shape=(None, 112, 112, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
```

I get the following error:

```
ValueError: Slicing dataset elements is not supported for rank 0.
```

Can `od.fit` handle batched datasets like this, or am I doing something wrong?

Posted by Oliver-Chalkley on May 16 '22 18:05

Hi @Oliver-Chalkley. You aren't doing anything wrong! Unfortunately, the outlier detectors' `.fit()` methods currently accept only NumPy arrays. The closest thing to what you are looking for is the Alibi Detect `TFDataset`, a subclass of `tf.keras.utils.Sequence` that can be passed to drift detectors such as `ClassifierDrift` for use during training.
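To illustrate what being a `tf.keras.utils.Sequence` means, here is a minimal sketch of that protocol in plain NumPy. The class `BatchSequence` and its parameters are hypothetical; this is not alibi-detect's `TFDataset` implementation, and it does not make `od.fit` accept batched data. It only shows the interface such an object exposes: `__len__` gives the number of batches per epoch, `__getitem__(idx)` returns batch `idx` taken at the same positions from every array, and `on_epoch_end` can reshuffle between epochs.

```python
import math
from typing import Tuple

import numpy as np


class BatchSequence:
    """Hypothetical Sequence-style batcher (not alibi-detect's TFDataset).

    Mirrors the tf.keras.utils.Sequence protocol without depending on
    TensorFlow: __len__ is the number of batches, __getitem__ returns one batch.
    """

    def __init__(self, *indexables, batch_size: int = 32,
                 shuffle: bool = True, seed: int = 0) -> None:
        # Convert once so every batch can use NumPy fancy indexing.
        self.indexables = tuple(np.asarray(x) for x in indexables)
        n = len(self.indexables[0])
        if any(len(x) != n for x in self.indexables):
            raise ValueError("all indexables must have the same length")
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indices = np.arange(n)
        self.on_epoch_end()  # apply an initial shuffle if requested

    def __len__(self) -> int:
        # The last batch may be smaller than batch_size.
        return math.ceil(len(self.indices) / self.batch_size)

    def __getitem__(self, idx: int) -> Tuple[np.ndarray, ...]:
        start = idx * self.batch_size
        batch_idx = self.indices[start:start + self.batch_size]
        # Same positions from every array, so inputs and labels stay aligned.
        return tuple(x[batch_idx] for x in self.indexables)

    def on_epoch_end(self) -> None:
        # Keras calls this hook after each epoch; reshuffle instance order.
        if self.shuffle:
            self.rng.shuffle(self.indices)
```

For example, with `shuffle=False`, `BatchSequence(x, y, batch_size=4)` over 10 instances yields batches of sizes 4, 4 and 2.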

We'll soon be revamping our outlier detectors and trainers, and will certainly look to include support for batched datasets.
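Until then, since `od.fit` works with NumPy input, one workaround is to materialise the batched dataset as a NumPy array first. This is a minimal sketch, assuming the dataset yields `(images, labels)` batches as in the `element_spec` above; `batches_to_numpy` and `max_instances` are hypothetical names. It loads the collected images into memory, which is what the data loader was meant to avoid, so `max_instances` caps how many instances are kept (i.e. the detector is fit on a subset).

```python
from typing import Iterable, Optional, Tuple

import numpy as np


def batches_to_numpy(
    batches: Iterable[Tuple[np.ndarray, np.ndarray]],
    max_instances: Optional[int] = None,
) -> np.ndarray:
    """Stack the image part of (images, labels) batches into one array.

    Hypothetical helper: labels are discarded because the outlier detector
    is fit on inputs only. If max_instances is set, stop once that many
    instances have been collected.
    """
    chunks = []
    n_seen = 0
    for images, _labels in batches:
        images = np.asarray(images)
        if max_instances is not None:
            # Keep only as many instances as are still needed.
            images = images[: max_instances - n_seen]
        chunks.append(images)
        n_seen += len(images)
        if max_instances is not None and n_seen >= max_instances:
            break
    if not chunks:
        raise ValueError("no batches were provided")
    return np.concatenate(chunks, axis=0)
```

With TensorFlow, `tf.data.Dataset.as_numpy_iterator()` yields each batch as NumPy arrays, so something like `X_train = batches_to_numpy(train_ds.as_numpy_iterator(), max_instances=1000)` followed by `od.fit(X_train, ...)` should work, provided any preprocessing the detector expects (e.g. scaling) is applied as well.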

Posted by ascillitoe on May 24 '22 14:05