Port Haystack v1 DocumentClassifier node to Haystack v2

Open ms130 opened this issue 1 year ago • 6 comments

Is your feature request related to a problem? Please describe.

I've been using the DocumentClassifier node in Haystack v1 with a zero-shot classification model to label documents with categories, which are attached to their metadata. We have recently migrated our code to Haystack v2 but have discovered that this component does not yet exist in v2, so I'm currently unable to classify documents.

Describe the solution you'd like

It would be great if someone were able to port this very useful v1 node into a v2 component please! It would also be tremendously useful to add the multi_label argument (see here) to the new component so that the model can be run assuming multiple labels can be true. The existing v1 node doesn't provide this flexibility, so I created a custom node by subclassing it and modifying its behaviour.
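To illustrate what the multi_label argument changes, here is a minimal pure-Python sketch. It assumes the usual NLI-based zero-shot setup, where the model yields a contradiction logit and an entailment logit for each candidate label; the function names and the example logits are hypothetical and not taken from Haystack or Transformers.

```python
import math
from typing import Dict, List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(
    logits: Dict[str, Tuple[float, float]], multi_label: bool
) -> Dict[str, float]:
    """Turn per-label (contradiction, entailment) logits into scores.

    multi_label=False: entailment logits compete across labels, so the
    scores sum to 1 and exactly one label "wins".
    multi_label=True: each label is scored on its own (entailment vs
    contradiction), so several labels can score high at once.
    """
    labels = list(logits)
    if multi_label:
        return {label: softmax(list(logits[label]))[1] for label in labels}
    entailment = [logits[label][1] for label in labels]
    return dict(zip(labels, softmax(entailment)))


# Hypothetical logits for one document: label -> (contradiction, entailment)
example_logits = {
    "sports": (-1.0, 2.0),
    "politics": (0.5, 1.5),
    "cooking": (2.0, -2.0),
}

single = score_labels(example_logits, multi_label=False)
multi = score_labels(example_logits, multi_label=True)
```

With multi_label=False the scores form one distribution over all labels; with multi_label=True both "sports" and "politics" can be accepted for the same document, which is the flexibility requested here.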

Describe alternatives you've considered

I considered creating my own custom DocumentClassifier component in v2, but have not started this yet, and am unsure about how difficult it would be.
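For a rough idea of the shape such a custom component could take, here is a dependency-free sketch. It is not the Haystack implementation: `Doc` is a stand-in for Haystack's `Document`, the class name, the `scorer` callable, the `threshold` parameter, and the `classification` meta key are all hypothetical, and a real v2 component would additionally be decorated with `@component` and would call an actual zero-shot model instead of the toy scorer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Doc:
    """Stand-in for a Haystack Document (content plus metadata)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


# Hypothetical scorer signature: (text, labels, multi_label) -> {label: score}
Scorer = Callable[[str, List[str], bool], Dict[str, float]]


class ZeroShotDocumentClassifierSketch:
    """Attaches predicted labels to each document's metadata."""

    def __init__(self, labels: List[str], scorer: Scorer,
                 multi_label: bool = False, threshold: float = 0.5):
        self.labels = labels
        self.scorer = scorer
        self.multi_label = multi_label
        self.threshold = threshold  # only used when multi_label=True

    def run(self, documents: List[Doc]) -> Dict[str, List[Doc]]:
        for doc in documents:
            scores = self.scorer(doc.content, self.labels, self.multi_label)
            if self.multi_label:
                # Keep every label whose score clears the threshold.
                chosen = [l for l, s in scores.items() if s >= self.threshold]
            else:
                # Keep only the single best-scoring label.
                chosen = [max(scores, key=scores.get)]
            doc.meta["classification"] = {"labels": chosen, "scores": scores}
        # v2 components return a dict of named outputs from run().
        return {"documents": documents}


def keyword_scorer(text: str, labels: List[str], multi_label: bool) -> Dict[str, float]:
    """Toy scorer for the sketch: 1.0 if the label occurs in the text."""
    lowered = text.lower()
    return {label: (1.0 if label.lower() in lowered else 0.0) for label in labels}


classifier = ZeroShotDocumentClassifierSketch(
    labels=["football", "politics", "cooking"],
    scorer=keyword_scorer,
    multi_label=True,
)
result = classifier.run([Doc("Football and politics news")])
```

Injecting the scorer keeps the sketch runnable without a model; swapping in a real zero-shot classifier would only change that one callable.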

ms130 avatar May 08 '24 11:05 ms130

This is a legitimate request!

I would start with implementing a TransformersZeroShotDocumentClassifier, only focusing on zero-shot classification.

The code should not be difficult to migrate, starting from the 1.x version.

I will tag this issue as "contributions wanted" and see if any community members would like to address it.

anakin87 avatar May 09 '24 09:05 anakin87

Hi @anakin87, I would like to work on this. If I am not mistaken, this zero-shot document classifier should be ported here, in line with the Haystack 2.0 nomenclature?

srini047 avatar May 12 '24 13:05 srini047

Good to hear... Yes, I think it should be placed in classifiers.

anakin87 avatar May 12 '24 14:05 anakin87

This issue doesn't seem to have moved forward. I'd like to work on it.

Thanks,

arminnajafi avatar Jun 25 '24 07:06 arminnajafi

Hey @arminnajafi, Please confirm if you are still working on this. Otherwise, I'd like to pick it up.

nvzard avatar Jul 09 '24 12:07 nvzard

Hi @anakin87, I have made a PR for the zero-shot document classifier. Let me know if you find anything missing in the implementation. :)

jpatra72 avatar Aug 11 '24 20:08 jpatra72