mlx-vlm
Match the output of scipy.ndimage.zoom on multimodality's vision module `resize_image()`
SciPy was deprecated in https://github.com/Blaizzy/mlx-vlm/pull/301 and replaced with PIL and cv2. However, the outputs of the replacements are neither numerically identical to SciPy's nor within the threshold.
The replacements work well and don't miss details, but I would like to approximate scipy.ndimage.zoom as closely as possible, both because the source implementation uses it and to avoid edge cases.
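To make "within the threshold" concrete, here is a minimal sketch of how the mismatch between a reference output and a candidate output could be measured. The helper name `compare_resize_outputs` and the default `atol=1e-4` are hypothetical, chosen only for illustration; they are not taken from the mlx-vlm test suite.

```python
import numpy as np


def compare_resize_outputs(reference, candidate, atol=1e-4):
    """Summarize how far `candidate` is from `reference`.

    `atol` is a hypothetical tolerance, not the project's actual threshold.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(
            f"shape mismatch: {reference.shape} vs {candidate.shape}"
        )

    abs_err = np.abs(reference - candidate)
    return {
        "max_abs_err": float(abs_err.max()),
        "mean_abs_err": float(abs_err.mean()),
        # Purely absolute comparison, so small pixel values are not favored.
        "within_tolerance": bool(
            np.allclose(reference, candidate, rtol=0.0, atol=atol)
        ),
    }
```

In practice `reference` would be the array produced by scipy.ndimage.zoom and `candidate` the array produced by the PIL or cv2 path for the same input.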
The closest implementation:
import mlx.core as mx
import numpy as np
from PIL import Image
from PIL.Image import Resampling


def resize_image(image_np, new_size=(96, 96), order=1):
    """
    Resize an image with multiple channels using PIL.

    Args:
        image_np (numpy.ndarray): The input image array of shape
            (batch, height, width, channels). Only the first batch item is resized.
        new_size (tuple): The target size as (height, width).
        order (int): The order of interpolation (used to determine resampling method).

    Returns:
        mx.array: The resized image of shape (new_height, new_width, channels).
    """
    # Drop the leading batch dimension; the rest is (height, width, channels)
    image_np = image_np[0]

    # Get dimensions
    height, width, channels = image_np.shape

    # Choose interpolation method based on order parameter
    resample_method = Resampling.BILINEAR  # Default to bilinear
    if order == 0:
        resample_method = Resampling.NEAREST
    elif order == 2 or order == 3:
        resample_method = Resampling.BICUBIC

    # Handle different channel configurations
    if channels == 1:
        # For single-channel images (grayscale)
        # Reshape to 2D array (height, width)
        image_2d = image_np.reshape(height, width)

        # Create PIL image - ensure proper mode and data type conversion
        pil_image = Image.fromarray(image_2d.astype(np.float32))

        # Resize using PIL (note: PIL takes width, height order)
        resized_pil = pil_image.resize(
            (new_size[1], new_size[0]), resample=resample_method
        )

        # Convert back to numpy array, reshape to add channel dimension
        resized_np = np.array(resized_pil).reshape((new_size[0], new_size[1], 1))
    else:
        # For multi-channel images, process each channel individually
        resized_channels = []
        for c in range(channels):
            channel_data = image_np[:, :, c]
            pil_channel = Image.fromarray(channel_data.astype(np.float32))
            resized_channel = pil_channel.resize(
                (new_size[1], new_size[0]), resample=resample_method
            )
            resized_channels.append(np.array(resized_channel))

        # Stack channels back together
        resized_np = np.stack(resized_channels, axis=2)

    # Convert to mx.array
    return mx.array(resized_np)
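One likely source of the mismatch for `order=1` is the sampling grid. As far as I can tell, scipy.ndimage.zoom with its default `grid_mode=False` maps output index `i` to input position `i * (n_in - 1) / (n_out - 1)`, so the corner pixels line up exactly, while PIL samples at pixel centers and widens its filter when downscaling. Below is a NumPy-only sketch of corner-aligned bilinear resizing under that assumption. The function name `resize_bilinear_align_corners` is hypothetical, it only covers the `order=1` case, and it has not been verified against SciPy here.

```python
import numpy as np


def resize_bilinear_align_corners(image, new_size):
    """Bilinear resize of an (H, W, C) array with corner-aligned sampling.

    Sketch only: assumes the mapping i -> i * (n_in - 1) / (n_out - 1),
    which is believed to match scipy.ndimage.zoom(order=1, grid_mode=False).
    """
    image = np.asarray(image, dtype=np.float64)
    in_h, in_w = image.shape[:2]
    out_h, out_w = new_size

    def axis_coords(n_in, n_out):
        # First and last output samples land exactly on the input corners.
        if n_in == 1 or n_out == 1:
            return np.zeros(n_out)
        return np.arange(n_out) * ((n_in - 1) / (n_out - 1))

    ys = axis_coords(in_h, out_h)
    xs = axis_coords(in_w, out_w)

    # Lower neighbor index, clipped so the upper neighbor stays in bounds.
    y0 = np.clip(np.floor(ys).astype(int), 0, max(in_h - 2, 0))
    x0 = np.clip(np.floor(xs).astype(int), 0, max(in_w - 2, 0))
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)

    # Fractional weights, shaped to broadcast over (out_h, out_w, C).
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]

    # Interpolate along width on the two bracketing rows, then along height.
    top = image[y0][:, x0] * (1 - wx) + image[y0][:, x1] * wx
    bottom = image[y1][:, x0] * (1 - wx) + image[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

If SciPy is available locally, the result can be checked against `scipy.ndimage.zoom(image, (out_h / in_h, out_w / in_w, 1), order=1)` on the same input, and wrapped in `mx.array(...)` if it turns out to be close enough.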