roboflow-python

Feature: Query Search Images in a Dataset

Open Ashp116 opened this issue 5 months ago • 1 comment

Description

This PR adds query() and query_all() methods that search the images in a dataset through Roboflow's /search/v1 endpoint. Both methods accept flexible search queries (e.g., by filename, tags, or project name) and allow users to:

  • Retrieve specific metadata fields (e.g., tags, filename, dimensions).
  • Control pagination via pageSize and continuationToken.
  • Stream results in batches (query_all) or retrieve a single page (query).
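The relationship between the single-page and streaming variants can be sketched as a generator that keeps requesting pages until no continuation token comes back. This is only an illustration of the pagination pattern, not the code in this PR: `fetch_page`, `iter_pages`, and the in-memory `make_fake_fetcher` are hypothetical names, and the real request and response shapes of /search/v1 may differ.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a continuation token (None for the first
# page) and returns (results, next_token). It stands in for one call to the
# search endpoint; the actual request and response format is an assumption.
FetchPage = Callable[[Optional[str]], Tuple[List[Dict], Optional[str]]]


def iter_pages(fetch_page: FetchPage) -> Iterator[List[Dict]]:
    """Yield one list of results per page until no continuation token remains."""
    token: Optional[str] = None
    while True:
        results, token = fetch_page(token)
        if results:
            yield results
        if not token:
            break


def make_fake_fetcher(items: List[Dict], page_size: int) -> FetchPage:
    """In-memory stand-in for the search backend, used only for illustration."""

    def fetch_page(token: Optional[str]) -> Tuple[List[Dict], Optional[str]]:
        start = int(token) if token else 0
        end = start + page_size
        # Hand back a token only while more items remain after this page.
        next_token = str(end) if end < len(items) else None
        return items[start:end], next_token

    return fetch_page


images = [{"filename": f"4B-1K-{i}.jpg"} for i in range(25)]
pages = list(iter_pages(make_fake_fetcher(images, page_size=10)))
print([len(page) for page in pages])  # prints [10, 10, 5]
```

Because the generator yields each page as it arrives, a caller can start processing results before the last page has been requested.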

This enhancement improves dataset navigation and filtering by allowing more expressive, programmatic image searches within a project.

It addresses the issue discussed in #360 by providing a querying interface for image datasets.

List any dependencies that are required for this change:

This change does not introduce any new dependencies; it uses only existing libraries already required by the project.

Type of change

  • [ ✅ ] Bug fix (non-breaking change which fixes an issue)
  • [ ✅ ] New feature (non-breaking change which adds functionality)
  • [ ➖ ] This change requires a documentation update: Not really sure

How has this change been tested, please provide a testcase or example of how you tested the change?

The change was tested with the following example script, which searches for images by filename in a dataset using the new query and query_all methods. It loads the necessary environment variables, fetches a single page with query, then queries all matching images in pages with query_all, collects the results, and prints the total count along with sample entries:

import os
from dotenv import load_dotenv
import roboflow

load_dotenv(".env")

API_KEY = os.getenv("API_KEY")
WORKSPACE = os.getenv("WORKSPACE")
PROJECT_ID = os.getenv("PROJECT_ID")

rf = roboflow.Roboflow(api_key=API_KEY)
workspace = rf.workspace(WORKSPACE)
project = workspace.project(PROJECT_ID)

filename = "4B-1K"

# Test query() - single page of results
single_page_results = project.query(query_str=f'filename:"{filename}"', page_size=10)
print(f"Single page results count: {len(single_page_results)}")
for image in single_page_results[:5]:
    print(image)

print("\n---\n")

# Test query_all() - all pages, streamed
all_results = []
for page in project.query_all(query_str=f'filename:"{filename}"', page_size=10):
    all_results.extend(page)

print(f"Total results from query_all: {len(all_results)}")
for image in all_results[:5]:
    print(image)
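Since the script only prints the first five entries, a caller that needs just a sample could stop consuming pages early instead of collecting everything. The sketch below assumes, as the script above suggests, that query_all yields lists of image records; `first_n` and the stand-in generator `fake_pages` are hypothetical helpers and make no network calls.

```python
from itertools import chain, islice
from typing import Dict, Iterable, Iterator, List


def first_n(pages: Iterable[List[Dict]], n: int) -> List[Dict]:
    """Flatten pages lazily and stop once n results have been collected."""
    return list(islice(chain.from_iterable(pages), n))


def fake_pages(total: int, page_size: int, fetched: List[int]) -> Iterator[List[Dict]]:
    """Stand-in page generator that records which pages were actually requested."""
    for start in range(0, total, page_size):
        fetched.append(start // page_size)
        stop = min(start + page_size, total)
        yield [{"filename": f"4B-1K-{i}.jpg"} for i in range(start, stop)]


fetched_pages: List[int] = []
sample = first_n(fake_pages(total=100, page_size=10, fetched=fetched_pages), n=5)
print(len(sample), fetched_pages)  # prints 5 [0]
```

With a real generator in place of `fake_pages`, the same call would be expected to request only as many pages as are needed to fill the sample.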

Any specific deployment considerations

N/A

Docs

  • [ ❌ ] Docs updated? What were the changes: (query and query_all functions were added)

Ashp116 • Jun 15 '25 08:06

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Ashp116
❌ pre-commit-ci[bot]

CLAassistant • Jun 20 '25 09:06