feat: Add image hashing functions with support for 5 algorithms

Open codekshitij opened this issue 3 months ago • 6 comments

Changes Made

Adds image hashing functionality to Daft with support for 5 algorithms: average, perceptual, difference, wavelet, and crop_resistant.
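For readers unfamiliar with these algorithms, here is a minimal sketch of the idea behind the average hash, written in pure Python. It is an illustration only and not the code in this PR: the function name `average_hash`, the `hash_size` parameter, and the list-of-rows grayscale input are all assumptions made for the example. The image is shrunk to a small grid by block averaging, and each cell contributes one bit depending on whether it is brighter than the grid mean.

```python
from typing import List


def average_hash(gray: List[List[float]], hash_size: int = 8) -> int:
    """Illustrative average hash of a grayscale image (rows of pixel intensities).

    Hypothetical helper for explanation only; not Daft's implementation.
    """
    height, width = len(gray), len(gray[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image must be at least hash_size pixels in each dimension")

    # Downscale to a hash_size x hash_size grid by averaging each block of pixels.
    cells = []
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))

    # One bit per cell: 1 if the cell is brighter than the grid mean, else 0.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


# A 16x16 image whose left half is black and right half is white.
half_and_half = [[0.0] * 8 + [255.0] * 8 for _ in range(16)]
print(hex(average_hash(half_and_half)))
```

Because each bit depends only on a comparison against the mean, uniformly dimming the image leaves the hash unchanged, which is the property that makes this family of hashes useful for near-duplicate detection.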

API Usage

from daft.functions import image_hash
from daft import col

# Default algorithm (average)
df = df.with_column("hash", image_hash(col("image")))

# Specific algorithm
df = df.with_column("hash", image_hash(col("image"), "perceptual"))
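Image hashes like these are typically compared by Hamming distance (the number of differing bits) rather than by equality. The sketch below assumes the hashes are available as Python integers; the PR description does not state what type `image_hash` returns, so `hamming_distance`, `is_near_duplicate`, and the `max_distance` threshold are hypothetical helpers for illustration.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits.

    The default threshold is an arbitrary value chosen for this example.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


original = 0x0F0F0F0F0F0F0F0F
slightly_changed = 0x0F0F0F0F0F0F0F0E  # differs from the original in one bit
print(hamming_distance(original, slightly_changed))
print(is_near_duplicate(original, slightly_changed))
```

A suitable threshold depends on the hash length and the algorithm, so it would need tuning on real data.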

Implementation

  • New daft.functions.image_hash() function
  • Rust backend implementation in daft-image
  • 12 comprehensive tests covering all algorithms
  • Proper error handling and type validation

Related Issues

https://github.com/Eventual-Inc/Daft/issues/4889

Checklist

  • [ ] Documented in API Docs (if applicable)
  • [ ] Documented in User Guide (if applicable)
  • [ ] If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • [ ] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

codekshitij avatar Sep 17 '25 23:09 codekshitij

Codecov Report

:x: Patch coverage is 80.89552% with 64 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 74.50%. Comparing base (70116a6) to head (3f993b0). :warning: Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| src/daft-image/src/ops.rs | 83.95% | 39 Missing :warning: |
| src/daft-image/src/functions/hash.rs | 73.21% | 15 Missing :warning: |
| src/daft-image/src/series.rs | 67.74% | 10 Missing :warning: |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5229      +/-   ##
==========================================
+ Coverage   74.48%   74.50%   +0.01%     
==========================================
  Files         969      970       +1     
  Lines      124225   124558     +333     
==========================================
+ Hits        92535    92803     +268     
- Misses      31690    31755      +65     

| Files with missing lines | Coverage Δ |
| daft/expressions/expressions.py | 97.05% <ø> (ø) |
| daft/functions/__init__.py | 100.00% <100.00%> (ø) |
| daft/functions/image.py | 93.10% <100.00%> (+0.51%) :arrow_up: |
| daft/series.py | 92.77% <ø> (ø) |
| src/daft-image/src/functions/mod.rs | 100.00% <100.00%> (ø) |
| src/daft-image/src/series.rs | 75.90% <67.74%> (-1.88%) :arrow_down: |
| src/daft-image/src/functions/hash.rs | 73.21% <73.21%> (ø) |
| src/daft-image/src/ops.rs | 75.15% <83.95%> (+8.48%) :arrow_up: |

... and 5 files with indirect coverage changes

codecov[bot] avatar Sep 18 '25 00:09 codecov[bot]

Hey @codekshitij, I added comments on the other PR that got closed. Could you address those issues?

https://github.com/Eventual-Inc/Daft/pull/5227

universalmind303 avatar Sep 18 '25 15:09 universalmind303

> hey @codekshitij i added comments on the other PR that got closed. Could you address these issues?
>
> #5227

Hey, I got all your suggestions and will work on them ASAP.

codekshitij avatar Sep 18 '25 17:09 codekshitij

Thank you all for the review. I'll fix everything ASAP. @rchowell @srilman @universalmind303

codekshitij avatar Sep 18 '25 17:09 codekshitij

Hi @codekshitij 👋 Just checking in -- how is this one going?

malcolmgreaves avatar Oct 21 '25 21:10 malcolmgreaves

> Hi @codekshitij 👋 Just checking in -- how is this one going?

Sorry for the delay, I'll commit the changes by the end of this week.

codekshitij avatar Oct 22 '25 02:10 codekshitij

Hey @codekshitij - Checking in on this.

madvart avatar Nov 25 '25 21:11 madvart

Hey @codekshitij - Checking to see if you are able to push this through? Thanks!

madvart avatar Dec 02 '25 20:12 madvart