argilla
Add the capability to annotate overlapping spans
Is your feature request related to a problem? Please describe. This is a key advantage of Prodigy, since it allows variable-length spans to be annotated. The annotated data can then be used for question answering or other prompt-based span prediction exercises.
Describe the solution you'd like https://prodi.gy/docs/span-categorization
Describe alternatives you've considered https://prodi.gy/docs/span-categorization
Additional context https://explosion.ai/blog/spancat
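To make the request concrete, here is a minimal sketch of what overlapping span annotations could look like as data. The `Span` class, the `overlapping_pairs` helper, and the labels are hypothetical names chosen for illustration; none of this is Argilla's or Prodigy's API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: character offsets plus a label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def overlapping_pairs(spans):
    """Return every pair of spans that share at least one character."""
    return [
        (a, b)
        for a, b in combinations(spans, 2)
        if a.start < b.end and b.start < a.end
    ]


text = "The Bank of England raised rates."
spans = [
    Span(4, 19, "ORG"),     # "Bank of England"
    Span(12, 19, "LOC"),    # "England", nested inside the ORG span
    Span(27, 32, "TOPIC"),  # "rates", overlaps nothing
]

for a, b in overlapping_pairs(spans):
    print(text[a.start:a.end], "<->", text[b.start:b.end])
```

A token classification scheme that assigns one tag per token cannot represent both the ORG and LOC spans at once, which is why a span-based representation like the one sketched here is needed.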
Dear Dhruv, I fully agree this is an important feature. Thanks for pointing it out!
cc @frascuchon @davidberenstein1957
Thanks, @dhruvsakalley
Totally agree with that. We need to adapt the UI to this kind of behaviour. Then we need to think about how this integrates with the prepare_for_training methods for token classification.
Again, thanks for your feedback!
cc @keithCuniah @leiyre
This issue is stale because it has been open for 30 days with no activity.
I believe this is already accepted as an enhancement, so the stale status is not applicable.
Hi all, very happy to hear that this feature has already been requested and that it is now on your roadmap. I will definitely be able to test this, even in its early stages.
I might even be able to contribute (free time pending 😅) somehow. Could anyone point me to the relevant bits of code that will need to be enhanced/modified? I imagine this will require creating a new SpanClassification class.
Thank you 🙏🏻
Hi @filippo82, first of all, thank you for the offer to contribute! However, given the complexity of the task and our other priorities, we have decided to move this further out on the roadmap, with no fixed date yet.
Hi @davidberenstein1957 👋🏻 thanks for the update.
Out of curiosity, why do you say "given the complexity of the task"? I only had a very brief look at the relevant portion of the Argilla code, of course (so my hunch could obviously be waaaaaay off 😅), but I had the feeling that implementing this new feature, while not trivial, should not be too complex, given that one could start from an existing XXXClassification class.
... but of course I was not thinking about the UI-related development 🙀
Hi @davidberenstein1957 @dvsrepo - do you have an idea of rough timescales for getting overlapping spans into Argilla? We want to use Argilla as our main labelling tool for the startup I work for, but as a lot of our tasks involve overlapping spans this might be a dealbreaker.
Hello @kdutia we do consider this as a key feature for Argilla but it sadly has lower priority than other things on our roadmap. Your contributions do help us to prioritize this better so they are more than welcome, and we will consider them during the planning for Q4 2023 and Q1 2024.
@kdutia we refined our plans for this internally a bit and want to take it into account when adding TokenClassification to the FeedbackTask, which will be handled in Q3 2023. We will keep you updated on that issue.
hi, thanks so much for taking the feedback on board and reprioritising @davidberenstein1957! looking forward to it and other future updates :)
@filippo82, @dhruvsakalley @kdutia @cceyda we are hard at work tackling this issue at the moment. Would any of you be interested in providing some feedback and pointers on what you would expect from the implementation? If so, could you ping me on Slack or send me an email at [email protected]?
Thank you, David, for all the hard work. Please send me a meeting invite for Saturday or Sunday; we can go over the design, and I would be happy to provide feedback.
Hi @davidberenstein1957 👋🏻 I'll send you a message on Slack 📫
@dhruvsakalley @filippo82 @kdutia and everyone else who upvoted this issue: we're finally shipping this feature later this month! If you'd like to try it before it comes out and leave some feedback, you can do so here: https://nataliaelv-beta-testing.hf.space/ You only need to log in with a Hugging Face account and play around 😃