Adding a new task to Hub: object tracking in videos

Open kadirnar opened this issue 2 years ago • 6 comments

kadirnar avatar Jan 21 '23 12:01 kadirnar

I wanted to bring this discussion over from Discord. @osanseviero, what is required for us to have a separate pipeline/task for object tracking in videos? Pinging @sgugger too!

merveenoyan avatar Jan 23 '23 10:01 merveenoyan

To have a pipeline in Transformers, we'd need pretrained models that do this. I don't think that is the case right now.

sgugger avatar Jan 23 '23 14:01 sgugger

> To have a pipeline in Transformers, we'd need pretrained models that do this. I don't think that is the case right now.

Models:

  • https://huggingface.co/kadirnar/osnet_x0_5_imagenet
  • https://huggingface.co/kadirnar/osnet_x0_25_imagenet
  • https://huggingface.co/kadirnar/osnet_x1_0_imagenet

Supported tracking algorithms:

Demo: https://huggingface.co/spaces/kadirnar/torchyolo

kadirnar avatar Jan 25 '23 19:01 kadirnar

Hey all! Let me copy-paste the template for tracking new tasks:

Note that you're not expected to do all of the following steps. This helps track all the steps required to get a new task fully supported in the Hub 🔥

  • [ ] Integration with Inference API. Select at least one of the following:
    • [ ] Added a transformers pipeline
    • [ ] Added to Community Inference API for 3rd party library
    • [ ] Added to Community Inference API for generic
  • [ ] Added basic UI elements (icon, order specification, etc)
  • [ ] Added a widget

Integration guide: https://hf.co/docs/hub/models-tasks

osanseviero avatar Jan 26 '23 13:01 osanseviero

@osanseviero @sgugger My question was more about whether there should be a separate task for this, or whether it could be covered under object detection as it exists in our ecosystem. That's why I asked the question above 🙂

merveenoyan avatar Jan 30 '23 16:01 merveenoyan

I think it's different: one task operates on video inputs (and so has temporal information), while the other operates only on static images.
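To make the distinction concrete, here is a minimal sketch of what tracking adds on top of per-frame detection: detections have to be linked across frames so that each object keeps a stable ID. It assumes the per-frame boxes already come from some detector, and it uses a simple greedy IoU match as a stand-in for a real tracking algorithm. The names `iou` and `track` are hypothetical and are not part of Transformers or any Hub library.

```python
from typing import Dict, List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track(frames: List[List[Box]], iou_threshold: float = 0.3) -> List[Dict[int, Box]]:
    """Assign a track ID to every box in every frame (hypothetical helper).

    Each box is greedily matched to the best-overlapping unmatched box of
    the previous frame; boxes with no sufficient overlap start a new track.
    Tracks that vanish for a frame are dropped, which a real tracker
    would handle more carefully.
    """
    next_id = 0
    previous: Dict[int, Box] = {}
    results: List[Dict[int, Box]] = []
    for boxes in frames:
        current: Dict[int, Box] = {}
        unmatched = dict(previous)  # previous-frame tracks not yet claimed
        for box in boxes:
            best_id, best_score = None, iou_threshold
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next_id  # no match: start a new track
                next_id += 1
            else:
                del unmatched[best_id]  # a track can be claimed only once
            current[best_id] = box
        results.append(current)
        previous = current
    return results


# Two objects drift slightly between frames 0 and 1; frame 2 has a new object.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (51, 51, 61, 61)],
    [(100, 100, 110, 110)],
]
tracks = track(frames)
```

An object detection pipeline would stop at the list of boxes for each frame. The output here is keyed by track ID, so the result depends on frame order, which is the temporal information a static-image task never sees.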

osanseviero avatar Jan 31 '23 11:01 osanseviero