add function to combine all tasks metadata into tasks.json
This PR updates the inference-codegen script to build a JSON file containing all the metadata of all the tasks.
The goal is to make this data more portable, so it can be easily consumed in different contexts outside of the huggingface.js codebase, e.g. for Replicate model classification.
To test:
$ pnpm run inference-codegen
Here's an example of the current output: https://gist.github.com/zeke/460a58b7aa50e305072415844a335209
To Do
- [x] Update
inference-codegenscript to output all task data astasks.json - [ ] Dereference the JSON schema so everything is included.
- [ ] Make sure
summaryis included. e.g. "Keypoint detection is the task of identifying meaningful distinctive points or features in an image." - [ ] git-ignore generated
tasks.jsonfile. - [ ] ???
@SBrandeis @julien-c @Wauplin 👋🏼
I finally got around to bumping this forward a bit. My goal is to use this to classify all the models on Replicate.
Could use a little help with the dereferencing the JSON $refs.. I'm not familiar with quicktype and my initial attempts failed.
Hey @zeke , thanks for looking into this! Just to be sure, the goal of this PR is to generate a single tasks.json with all definitions from all input/output/output_stream from all tasks defined in https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. Am I correct? No extra information added to it? (just want to be sure about the goal of it)
Hey @zeke , thanks for looking into this! Just to be sure, the goal of this PR is to generate a single
tasks.jsonwith all definitions from all input/output/output_stream from all tasks defined in https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. Am I correct? No extra information added to it? (just want to be sure about the goal of it)
Yep! That's the goal!
Poking around the codebase, I see other code paths where more task types are mentioned. Here's one: https://github.com/huggingface/huggingface.js/blob/1aa1c3f4d2081b270517219c49c95c1d8d7fc682/packages/tasks/src/tasks/index.ts#L132
For example, image-to-video is not present in the code I'm generating here, but it is present in that file ☝🏼
Let me know if there's a better way to structure this to be using the most complete and up-to-date set of Tasks.
Ideally I also want the generated file to include these prose summary strings for each Task: https://github.com/huggingface/huggingface.js/blob/1aa1c3f4d2081b270517219c49c95c1d8d7fc682/packages/tasks/src/tasks/image-to-image/data.ts#L103
I didn't need this after all.
This is good enough for my purposes:
import { TASKS_DATA } from '@huggingface/tasks'
import type { TaskData } from '@huggingface/tasks'
Closing!