Modality/task agnostic models in AutoTrain (tokens/embeddings in, tokens/embeddings out)

josephrocca opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe. For the last few years I've been improving my web businesses by integrating ML models in various places to add features, reduce moderation labor, and so on. But a big problem I often run into is that the existing "auto-ML" solutions (like Hugging Face's AutoTrain, Google Cloud's AutoML, etc.) have very "vertical" model/dataset types. That is, if your task doesn't conform closely to one of several discrete categories (e.g. text classification, image segmentation, etc.), then you're likely out of luck.

So for the last couple of years I've been waiting for an auto-ML service to release a "fallback" model/dataset type which can capture the "long tail" of possible tasks. It's possible that my situation is somewhat unique, and the longer I wait, the more plausible that seems, but I still doubt it. I think that "Software 2.0" is going to eat much more of "Software 1.0" than just the verticals available in most current auto-ML platforms, and my experience as a dev and business owner working to integrate ML seems to back that up.

Describe the solution you'd like I'm guessing that there are at least two problems with deploying a solution here:

  1. It's harder to lean on pretrained models when the auto-ML system doesn't know the task up-front, and from-scratch training is obviously much more costly.
  2. It's still early days for modality/task agnostic models.

Regarding point 1, the increased cost may put it out of reach for some use cases, but ML solutions can easily save a small business tens of thousands of dollars per year, so I think there are many use cases like mine where the high cost is easily bearable. Also, for something like Hierarchical Perceiver, if the user-provided inputs and outputs are of known formats (e.g. images, text), then they could be automatically preprocessed/embedded, which would make training a lot cheaper than going completely from scratch.
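To make that concrete, here's a minimal sketch of the pre-embedding idea. I'm assuming a pretrained CLIP vision encoder from the `transformers` library as the embedder for the "image" format; the encoder choice, file path and surrounding code are placeholders I made up for illustration, not anything AutoTrain exposes today:

```python
# Minimal sketch: pre-embed a known input format (an image) with a pretrained
# encoder, so a generic modality-agnostic model only has to learn the task on
# top of the embeddings. The CLIP encoder here is an arbitrary/assumed choice.
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPVisionModel

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> torch.Tensor:
    """Turn a raw image into a sequence of patch embeddings that a
    Perceiver-style model could consume directly."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs)
    return out.last_hidden_state  # (1, num_patches + 1, hidden_dim)

# These embeddings would replace raw pixels as the generic model's input,
# so only the (much smaller) task-specific part needs training from scratch.
tokens = embed_image("example_sketch.png")  # hypothetical file path
print(tokens.shape)
```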

Regarding point 2, it seems like Perceiver AR and Hierarchical Perceiver (code release soon) might be good enough to start experimenting here. I'm not technical enough to assess all the factors that make these models viable despite the generality trade-off they're making, but taking the paper results at face value, and noting that lucidrains was pleasantly surprised by the performance of his Perceiver AR implementation, I think we're at the point where it might be plausible to deploy a "general"/"fallback" auto-ML solution with models like these.

Describe alternatives you've considered Currently the only alternative for me is to train models manually in notebooks. This involves significantly more effort than a good auto-ML solution would require.

Additional context No additional context, but I'll just say that I'm at the stage where I'd happily throw money at anyone who can deploy something like this 😄. It's possible that there aren't many others with similar needs, though, and if that's the case then I of course understand that it may not be possible for y'all to prioritise it at this point.

Thanks!

josephrocca · Jul 21 '22 06:07

(thanks for the detailed writeup as usual @josephrocca 🔥)

Do you have maybe an example of a notebook you're using, so we can have a look at one concrete use case?

julien-c · Jul 21 '22 07:07

Thanks for the reply @julien-c! My most recent use case that doesn't fit common auto-ML model/task categories is vaguely similar to the sketch-based character search feature in Google Docs (similar: https://shapecatcher.com), except I need to map a sketch (either vector or raster - not sure which would work better yet) to a sequence of tokens from a very large vocabulary, rather than to a single Unicode character. The output sequence length is not fixed, but wouldn't be longer than ~50, so I think the task might be amenable to both Perceiver AR and Hierarchical Perceiver - I'd ideally like to try both.
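For concreteness, here's a very rough sketch of the kind of model setup I have in mind. Everything below (module choices, names, vocabulary size, image size) is my own illustrative assumption, not an existing AutoTrain model type, and something like Perceiver AR / Hierarchical Perceiver would replace the hand-rolled encoder/decoder here:

```python
# Hypothetical sketch: raster sketch in, variable-length token sequence out
# (<= ~50 tokens from a large vocabulary), trained with next-token prediction.
import torch
import torch.nn as nn

VOCAB_SIZE = 50_000   # "very large vocabulary" (assumed size)
MAX_OUT_LEN = 50      # output sequences won't be longer than ~50 tokens
D_MODEL = 512

class SketchToTokens(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny CNN stand-in for whatever encoder the auto-ML system picks.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv2d(64, D_MODEL, kernel_size=4, stride=4),
        )
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos_emb = nn.Embedding(MAX_OUT_LEN, D_MODEL)
        layer = nn.TransformerDecoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, sketch, target_tokens):
        # sketch: (batch, 1, H, W) raster image; target_tokens: (batch, T) token ids
        memory = self.encoder(sketch).flatten(2).transpose(1, 2)  # (batch, patches, D_MODEL)
        T = target_tokens.size(1)
        positions = torch.arange(T, device=target_tokens.device)
        tgt = self.tok_emb(target_tokens) + self.pos_emb(positions)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # no peeking ahead
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)  # (batch, T, VOCAB_SIZE) next-token logits

model = SketchToTokens()
logits = model(torch.randn(2, 1, 128, 128), torch.randint(0, VOCAB_SIZE, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 50000])
```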

I unfortunately don't have a notebook for this, since I've only just begun planning the feature. I did have a play with Perceiver AR training and inference on Colab (it's quite easy to get started using the inference.ipynb and the training instructions in their repo), but I haven't yet made any actual attempt at building this feature.

josephrocca · Jul 21 '22 09:07

FYI I transferred the issue to the corresponding repo

osanseviero · Jul 21 '22 21:07

Regarding the efficacy of domain-agnostic models: https://arxiv.org/abs/2112.03243 => putting the inductive biases (in their case: "cameras, projective ray incidence and epipolar geometry") in the input rather than in the model architecture (Perceiver IO in that paper) produced results that were competitive with bespoke models.
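As a toy illustration of what "inductive biases in the input" could look like (the details below are my own assumptions loosely inspired by that idea, not the paper's actual pipeline): each pixel's input token simply gets the direction of the camera ray through that pixel appended to it, and the model itself stays completely generic.

```python
# Toy sketch: augment per-pixel RGB tokens with camera ray directions so a
# generic set-to-set model (e.g. Perceiver IO) gets the 3D geometry as *input*
# rather than via 3D-specific layers. Intrinsics/values below are made up.
import numpy as np

def pixel_tokens_with_rays(image: np.ndarray, K: np.ndarray) -> np.ndarray:
    """image: (H, W, 3) RGB; K: (3, 3) camera intrinsics.
    Returns (H*W, 6) tokens of [r, g, b, ray_x, ray_y, ray_z]."""
    H, W, _ = image.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels_h = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixel coords
    rays = pixels_h @ np.linalg.inv(K).T                                  # back-project through the camera
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)                  # unit ray directions
    return np.concatenate([image.reshape(-1, 3), rays], axis=-1)

K = np.array([[500.0, 0, 64], [0, 500.0, 64], [0, 0, 1]])  # assumed intrinsics
tokens = pixel_tokens_with_rays(np.random.rand(128, 128, 3), K)
print(tokens.shape)  # (16384, 6)
```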

The point of auto-ML is of course to take the burden off the user, but in this "fallback" model case, I think it's pretty reasonable to expect the user to do some extra data pre-processing work if they want competitive performance (and help articles/docs could be provided to guide the user through this, of course) - especially if the alternative is to have no fallback at all. In some cases this pre-processing could be done automatically (with a checkbox or something in the UI) - e.g. by pre-embedding/tokenizing known data formats like images, audio, etc. (to augment the input or replace it), as mentioned in the original post.

josephrocca · Jul 30 '22 09:07