
add function to combine all tasks metadata into tasks.json

Open zeke opened this issue 9 months ago • 5 comments

This PR updates the inference-codegen script to build a single JSON file, tasks.json, containing the metadata for every task.

The goal is to make this data more portable, so it can be easily consumed in different contexts outside of the huggingface.js codebase, e.g. for Replicate model classification.

To test:

$ pnpm run inference-codegen

Here's an example of the current output: https://gist.github.com/zeke/460a58b7aa50e305072415844a335209
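For illustration only, here is a minimal sketch of the kind of combining step described above. It is not the PR's actual code. It assumes each task lives in its own directory with a `spec/` subfolder holding `input.json`, `output.json`, and optionally `stream_output.json`; those file names and the `collectTaskSpecs` helper are assumptions made for this example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed layout (hypothetical): <tasksDir>/<task-name>/spec/<kind>.json
const SPEC_KINDS = ["input", "output", "stream_output"] as const;
type SpecKind = (typeof SPEC_KINDS)[number];
type TaskSpecs = Partial<Record<SpecKind, unknown>>;

// Walk every task directory and gather whatever spec files it has
// into one object keyed by task name.
function collectTaskSpecs(tasksDir: string): Record<string, TaskSpecs> {
  const combined: Record<string, TaskSpecs> = {};
  const entries = fs.readdirSync(tasksDir, { withFileTypes: true });
  for (const entry of entries.sort((a, b) => a.name.localeCompare(b.name))) {
    if (!entry.isDirectory()) continue;
    const specs: TaskSpecs = {};
    for (const kind of SPEC_KINDS) {
      const file = path.join(tasksDir, entry.name, "spec", `${kind}.json`);
      if (fs.existsSync(file)) {
        specs[kind] = JSON.parse(fs.readFileSync(file, "utf8"));
      }
    }
    // Skip directories that define no spec files at all.
    if (Object.keys(specs).length > 0) combined[entry.name] = specs;
  }
  return combined;
}
```

Writing the result out would then be a single `fs.writeFileSync("tasks.json", JSON.stringify(collectTaskSpecs(dir), null, 2))` call, where `dir` is wherever the task directories live.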

To Do

  • [x] Update inference-codegen script to output all task data as tasks.json
  • [ ] Dereference the JSON schema so everything is included.
  • [ ] Make sure the summary is included, e.g. "Keypoint detection is the task of identifying meaningful distinctive points or features in an image."
  • [ ] Git-ignore the generated tasks.json file.
  • [ ] ???

Apr 01 '25 15:04 zeke

@SBrandeis @julien-c @Wauplin 👋🏼

I finally got around to bumping this forward a bit. My goal is to use this to classify all the models on Replicate.

I could use a little help with dereferencing the JSON $refs. I'm not familiar with quicktype, and my initial attempts failed.
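To make the dereferencing step concrete, here is a small sketch that inlines `$ref` pointers without quicktype. It only handles same-document refs of the form `#/definitions/Name` that pass through object keys; refs pointing at other files, or through array indexes, would need extra handling. The `Json` type and both function names are made up for this example.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Look up a same-document JSON Pointer such as "#/definitions/Foo".
function resolvePointer(root: Json, ref: string): Json {
  if (!ref.startsWith("#")) {
    throw new Error(`Only same-document refs are handled in this sketch: ${ref}`);
  }
  let node: Json = root;
  for (const raw of ref.slice(1).split("/").filter((s) => s.length > 0)) {
    // Undo JSON Pointer escaping (RFC 6901): "~1" is "/", "~0" is "~".
    const key = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (node === null || typeof node !== "object" || Array.isArray(node)) {
      throw new Error(`Unresolvable $ref: ${ref}`);
    }
    const next: Json | undefined = node[key];
    if (next === undefined) throw new Error(`Unresolvable $ref: ${ref}`);
    node = next;
  }
  return node;
}

// Return a copy of `node` with every $ref replaced by its target.
// `seen` tracks the refs on the current path so cycles fail loudly.
function dereference(root: Json, node: Json = root, seen: string[] = []): Json {
  if (Array.isArray(node)) {
    return node.map((item) => dereference(root, item, seen));
  }
  if (node === null || typeof node !== "object") return node;
  const ref = node["$ref"];
  if (typeof ref === "string") {
    if (seen.includes(ref)) throw new Error(`Circular $ref: ${ref}`);
    // Keywords sitting next to $ref are dropped in this sketch.
    return dereference(root, resolvePointer(root, ref), [...seen, ref]);
  }
  const out: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = dereference(root, value, seen);
  }
  return out;
}
```

Calling `dereference(schema)` returns a new object and leaves the input untouched, so the original schema files can stay as they are.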

Apr 01 '25 15:04 zeke

Hey @zeke, thanks for looking into this! Just to be sure, the goal of this PR is to generate a single tasks.json with all the input/output/output_stream definitions from all the tasks defined in https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. Is that correct? No extra information added to it? (I just want to be sure about the goal.)

Apr 01 '25 16:04 Wauplin

Yep! That's the goal!

Apr 01 '25 21:04 zeke

Poking around the codebase, I see other code paths where more task types are mentioned. Here's one: https://github.com/huggingface/huggingface.js/blob/1aa1c3f4d2081b270517219c49c95c1d8d7fc682/packages/tasks/src/tasks/index.ts#L132

For example, image-to-video is not present in the code I'm generating here, but it is present in that file ☝🏼

Let me know if there's a better way to structure this so that it uses the most complete and up-to-date set of tasks.

Apr 01 '25 21:04 zeke

Ideally I also want the generated file to include these prose summary strings for each Task: https://github.com/huggingface/huggingface.js/blob/1aa1c3f4d2081b270517219c49c95c1d8d7fc682/packages/tasks/src/tasks/image-to-image/data.ts#L103

Apr 01 '25 21:04 zeke

I didn't need this after all.

This is good enough for my purposes:

import { TASKS_DATA } from '@huggingface/tasks'
import type { TaskData } from '@huggingface/tasks'
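As a rough sketch of how that export could cover the original goal, the helper below flattens a task record into a portable list of task names and summaries. To keep the example self-contained it defines a local `TaskLike` stand-in with only a `summary` field instead of importing the package; `toPortableTasks` and `PortableTask` are hypothetical names, and the assumption that some entries can be `undefined` should be checked against the package's own types.

```typescript
// Local stand-in for the one field used here; the real TaskData type
// exported by @huggingface/tasks carries more than this.
interface TaskLike {
  summary: string;
}

interface PortableTask {
  task: string;
  summary: string;
}

// Flatten a record keyed by task name into a sorted, JSON-friendly list,
// skipping entries that have no data (assumed possible, see above).
function toPortableTasks(
  tasksData: Record<string, TaskLike | undefined>
): PortableTask[] {
  const portable: PortableTask[] = [];
  for (const [task, data] of Object.entries(tasksData)) {
    if (data === undefined) continue;
    portable.push({ task, summary: data.summary });
  }
  return portable.sort((a, b) => a.task.localeCompare(b.task));
}
```

With the real package installed, the idea would be to pass `TASKS_DATA` to `toPortableTasks` and write the result out with `JSON.stringify(..., null, 2)`.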

Closing!

May 13 '25 21:05 zeke