
[DOCS] Create a tutorial about using SpanMarker and Argilla

Open sdiazlor opened this issue 2 years ago • 14 comments

Which page or section is this issue related to?

Create a tutorial on how to use SpanMarker and Argilla for NER.

sdiazlor, Oct 30 '23

https://huggingface.co/datasets?task_ids=task_ids%3Anamed-entity-recognition

davidberenstein1957, Nov 20 '23

https://docs.argilla.io/en/latest/tutorials/notebooks/ner_fine_tune_bert_beginners.html

davidberenstein1957, Nov 21 '23

https://www.numind.ai/blog/a-foundation-model-for-entity-recognition

davidberenstein1957, Nov 21 '23

https://huggingface.co/numind/generic-entity_recognition_NER-v1

davidberenstein1957, Nov 21 '23

Hi there! I'm Rami Ismael, the individual behind the GitHub issues initiative as discussed here. I'm currently enjoying my winter break and have some free time on my hands. I'm keen on offering my assistance to help finalize the documentation. Would that be possible?

Rami-Ismael, Dec 23 '23

Perhaps this tutorial will be more useful when we release the Spans Question for Feedback Datasets? Otherwise it will be outdated quite soon.

nataliaElv, Jan 16 '24

We're building https://github.com/DerwenAI/textgraphs, which leverages SpanMarker and other LLM-based tasks for KG construction. If you look at the project's "report", there is a very large Argilla-shaped puzzle piece missing at its center (which is why we needed the gradients for the extracted entity and relation streams). I'd like to offer help on the SpanMarker + Argilla tutorial too.

ceteri, Feb 14 '24

I'd also like to offer help on this tutorial, whether on designing it, writing it or maintaining it.

My notes on writing a tutorial:

  • the end of the tutorial must be meaningful and achievable for a beginner
  • having done the tutorial, the reader is in a position to make sense of the rest of the documentation and of Argilla itself
  • objective = turning learners into users: get the learner started on their Argilla journey, not to their destination
  • Tutorials need to be useful for the beginner, easy to follow, meaningful, extremely robust, and kept up-to-date
  • build from the simplest tools or operations to the most complex
  • be concrete and specific; don't explain anything the learner doesn't need to know to complete the tutorial (e.g. Argilla telemetry)
  • Note that a tutorial doesn't tell you what you will learn, just what you will do. The learning comes out of that doing.

Proposed promise for this Argilla + SpanMarker tutorial

If you have the basic knowledge required to follow this tutorial (e.g. spaCy?) and you follow its directions, you will end up with a working Argilla Server, complete with a FeedbackDataset that has Span Categorization Questions and NER label Suggestions machine-generated by SpanMarker, ready for Annotators to add Responses. Advanced readers will also be able to add Metadata or Vectors.
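The core step of that promise, turning SpanMarker predictions into span Suggestions, could look roughly like the minimal sketch below. It assumes that SpanMarker's `predict()` on a raw string returns dicts with `label`, `score`, `char_start_index` and `char_end_index`, and that a Feedback `SpanQuestion` (hypothetically named `entities` here) accepts a suggestion as a dict with `question_name`, `value` and `agent`, where each span carries `start`, `end`, `label` and `score` like `rg.SpanValueSchema`. The helper name, the `span-marker` agent string and the example predictions are made up; plain dicts keep the sketch runnable without `argilla` or `span_marker` installed.

```python
from typing import Any, Dict, List


def predictions_to_span_suggestion(
    predictions: List[Dict[str, Any]],
    question_name: str = "entities",  # hypothetical SpanQuestion name
    agent: str = "span-marker",       # hypothetical agent label for the suggestions
) -> Dict[str, Any]:
    """Map SpanMarker-style predictions to one suggestion dict for a span question.

    Assumes each prediction has the keys SpanMarker returns for raw-string input:
    "label", "score", "char_start_index" and "char_end_index".
    """
    spans = [
        {
            "start": p["char_start_index"],  # first character of the entity
            "end": p["char_end_index"],      # character right after the entity
            "label": p["label"],
            "score": float(p["score"]),
        }
        for p in predictions
    ]
    return {"question_name": question_name, "value": spans, "agent": agent}


if __name__ == "__main__":
    text = "Lionel Messi joined Inter Miami in 2023."
    # Made-up predictions in SpanMarker's output shape, not real model output.
    predictions = [
        {"span": "Lionel Messi", "label": "PER", "score": 0.98,
         "char_start_index": 0, "char_end_index": 12},
        {"span": "Inter Miami", "label": "ORG", "score": 0.95,
         "char_start_index": 20, "char_end_index": 31},
    ]
    suggestion = predictions_to_span_suggestion(predictions)
    for span in suggestion["value"]:
        # prints "Lionel Messi PER" then "Inter Miami ORG"
        print(text[span["start"]:span["end"]], span["label"])
```

In a real notebook, `predictions` would come from `SpanMarkerModel.from_pretrained(...).predict(text)`, and the returned dict would go into `rg.FeedbackRecord(fields={"text": text}, suggestions=[...])`. Depending on the Argilla version, the span dicts may need to be wrapped in `rg.SpanValueSchema` first, so this is worth checking against the docs.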

What do you think? What would be a reasonable level of prerequisite knowledge?

And we are waiting for Span Categorization to be released in the FeedbackDataset, right? Or did I miss it going live?

louisguitton, Mar 01 '24

Hi @louisguitton! Yes, we're working on releasing a Spans question for Feedback datasets, and once that's out, we can start working on the tutorial. I think it would be highly beneficial for the adoption of this feature if the tutorial were published soon after the release.

I'll leave it to @davidberenstein1957 and @sdiazlor to tell you if they need any help with this one or if it's something they prefer to do internally.

nataliaElv, Mar 01 '24

Very cool notes about what a tutorial should be, @louisguitton; fully agree!

We used to treat tutorials more like blog posts to promote Argilla and introduce it to new users on social media, but that has now created a bit of a mismatch. We'll take this into account for version 2.0 of the docs (planned but not yet started).

dvsrepo, Mar 01 '24

Hi, @louisguitton. Thanks for your notes! Any feedback is always welcome. Feel free to work on this tutorial and let us know if you have any questions.

sdiazlor, Mar 05 '24

Let's wait for the new SpanQuestion

dvsrepo, Mar 05 '24

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot], Jun 20 '24

Since we started this discussion, SpanQuestion has been released:

  • v1.26 (Mar 22, 2024): SpanQuestion is now part of FeedbackDataset (#4617, #4623, #4622)
  • v1.27 (Apr 18, 2024): overlapping spans are now possible (#4668, #4697)
  • v1.28 (May 9, 2024): span improvements (#4735, #4726)
  • NER on Argilla: meetup talk by Louis Guitton, with code (not cleaned up) available at https://github.com/louisguitton/mlops-talk-llm-kg/tree/main/notebooks/argilla_talk
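Overlapping spans only became possible in v1.27, and a tutorial may still target a span question that does not allow them. One way to handle that is sketched below, assuming span dicts with `start`, `end`, `label` and `score` and half-open character offsets: greedily keep the highest-scoring spans and drop any span that overlaps one already kept. The helper name and the greedy policy are illustrative choices, not Argilla behaviour. On the question side, the switch is assumed to be an `allow_overlapping` argument to `rg.SpanQuestion`, which should be checked against the v1.27 docs.

```python
from typing import Any, Dict, List


def drop_overlapping_spans(spans: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Greedily keep the highest-scoring spans so that no two kept spans overlap.

    Offsets are half-open [start, end), so [0, 5) and [5, 9) touch but do not overlap.
    """
    kept: List[Dict[str, Any]] = []
    # Visit spans from most to least confident; sorted() is stable, so ties keep input order.
    for span in sorted(spans, key=lambda s: s["score"], reverse=True):
        overlaps = any(
            span["start"] < other["end"] and other["start"] < span["end"]
            for other in kept
        )
        if not overlaps:
            kept.append(span)
    # Return the surviving spans in reading order.
    return sorted(kept, key=lambda s: s["start"])


if __name__ == "__main__":
    # Made-up spans over "Lionel Messi joined Inter Miami in 2023."
    spans = [
        {"start": 20, "end": 31, "label": "ORG", "score": 0.95},  # "Inter Miami"
        {"start": 26, "end": 31, "label": "LOC", "score": 0.60},  # "Miami", overlaps ORG
        {"start": 0, "end": 12, "label": "PER", "score": 0.98},   # "Lionel Messi"
    ]
    print([s["label"] for s in drop_overlapping_spans(spans)])  # prints ['PER', 'ORG']
```

Greedy selection by score is simple and easy to explain to beginners; preferring longer spans, or keeping overlaps once the question allows them, are equally valid policies depending on the label scheme.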

The football news dataset, the code snippets I contributed in the talk, and the structure of the talk can all be reused to create a tutorial. A part 2 of the talk was also discussed, to cover the parts I didn't have time for: training a model, using weak supervision with skweak, doing KG construction with the extracted entities, etc.

I think the scoping exercise for NER (i.e. splitting the work into parts and making sure we deliver small, incremental value) is key, so any input from user feedback, customer needs, or the product vision is welcome to help prioritise.

louisguitton, Jun 20 '24