Named Entity Recognizer

Open MaxAkbar opened this issue 6 years ago • 114 comments

Hello ML.NET,

Is there any way I can use ML.NET to create named entities?

Thanks, -Max

MaxAkbar avatar Aug 02 '18 03:08 MaxAkbar

Currently, there is no component in ML.NET for named entity recognition. @GalOshri may be able to comment further with respect to future plans.

Zruty0 avatar Aug 09 '18 16:08 Zruty0

Ping @GalOshri

Ivanidzo4ka avatar Oct 19 '18 00:10 Ivanidzo4ka

We don't have immediate plans to add this right now, but it is on the backlog.

Does anyone have a specific scenario they are trying to enable and are blocked on this?

GalOshri avatar Oct 23 '18 23:10 GalOshri

Hi Gal,

Yes, I am waiting on this and would love to have something I can use. I need to extract custom entities, dates, addresses, names, and blocks of text from documents.

Let me know if you want a more detailed explanation.

I know this is on your backlog; can you let me know which version it is planned for?

-Max

MaxAkbar avatar Oct 24 '18 03:10 MaxAkbar

Hi, I am currently using Stanford NLP (https://www.nuget.org/packages/Stanford.NLP.NER/), but it is just a Java wrapper and doesn't support .NET Core. I would like to have more NLP capabilities (POS tagger, NER, named entity linking) natively in C#.

nimasTT avatar Oct 30 '18 13:10 nimasTT

Any update on this? Stanford's NER is not a viable option considering its lack of support for .NET Core.

msamara avatar Nov 27 '18 12:11 msamara

+1

tmarman avatar Dec 25 '18 18:12 tmarman

+1

rohittidke avatar Feb 07 '19 16:02 rohittidke

I would really like to see this functionality.

garywoodfine avatar Feb 18 '19 10:02 garywoodfine

Thinking about it, it would probably be a bonus to have a premade NLP tool (like spaCy) for .NET in the future. As more NLP features are added, this would help with exploration.

ykafia avatar May 22 '19 18:05 ykafia

Please, guys, this is a highly anticipated feature that I would love to see. At the moment Stanford NER is the only decent library available, and it is not an option since it depends heavily on Java; either way, it has no support for .NET Core for now.

mayakfoury avatar May 22 '19 22:05 mayakfoury

Plus, Stanford NLP is good for personal use but needs a commercial licence, and the need for scale and recognition usually comes up in commercial use.

brykneval avatar Nov 11 '19 08:11 brykneval

@gvashishtha to drive this.

codemzs avatar Nov 11 '19 18:11 codemzs

Just an out-of-the-box idea: with TorchSharp coming to ML.NET, we could build a library on top of different models like ALBERT or GPT-2. We would only need an API around them to use in production.

ykafia avatar Nov 11 '19 18:11 ykafia

Hi all, I just joined the ML.NET team as a PM. I would appreciate understanding more about a) what scenarios you are trying to enable with Named Entity Recognition (NER) and b) what the impact of an ML.NET Named Entity Recognizer would be on your solution/business.

I notice that Stanford's NER primarily supports three classes: (PERSON, ORGANIZATION, LOCATION). Is this sufficient for all use cases?

gvashishtha avatar Nov 12 '19 19:11 gvashishtha

Hello @gvashishtha, the Stanford NER model you were looking at was probably trained on three entities. Go to this link and scroll down to Model, and notice that, depending on the model, there are several more entities. If you look at their test server and click on the classifier, you will see that it has more entities. You can also get more info from here, and if you follow other links from that page, you can get to a better API sample.

Azure does NER pretty well, but the problem with Azure, not to mention the cost :), is that there is a limit to the amount of text you can send.

I think what would be best is to allow the API to accept text with annotation. The annotation would describe the entity type, so it should not be static.

I hope this helps.

[Edit] Found this article that shows how to create custom named entities.

MaxAkbar avatar Nov 13 '19 05:11 MaxAkbar

@MaxAkbar You got it!

codemzs avatar Nov 13 '19 05:11 codemzs

Just to be clear, @MaxAkbar, when you say "Azure does NER," do you mean the Text Analytics API? https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking

Additionally, can you confirm for me which of the Stanford capabilities you need for your application: 3 class, 4 class, or 7 class?

Model type Included labels
3 class: Location, Person, Organization
4 class: Location, Person, Organization, Misc
7 class: Location, Person, Organization, Money, Percent, Date, Time

gvashishtha avatar Nov 13 '19 17:11 gvashishtha

Hi @gvashishtha,

Sorry I was not clear. I was referring to LUIS. At the time when I was searching for NER, Azure didn't have an NER feature (or I didn't look hard enough), just LUIS. That was a long time ago :).

Anyway, LUIS has a feature called Entities. You provide an utterance, then mark the word or words and then add a label to identify the entity.

For example: [image: Entities]

In the image above, we are providing utterances then labeling them with a custom entity. I think internally having known entities like Location, Person, Organization, Money, Percent, Date, Time is fine, but there should also be a feature to add custom entities.

[Edit] Forgot to note that in my application I need to extract names, but they must not simply be labeled "name". For example, I need the name of the insurer vs. the name of the insured, or seller vs. buyer.
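A minimal sketch of what such role-specific labels could look like as training data, assuming token-level spans and the common BIO tagging scheme. The labels INSURER and INSURED, the example sentence, and the helper `spans_to_bio` are hypothetical; this is not an ML.NET or LUIS API.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label); labels are user-defined.
Span = Tuple[int, int, str]

def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into BIO tags, one tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags

# Hypothetical sentence: two names, each labeled by role instead of a generic "name".
tokens = "Acme Insurance issued the policy to Jane Doe".split()
spans = [(0, 2, "INSURER"), (6, 8, "INSURED")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Because the label set is just data, adding SELLER or BUYER would need no code change, which is the kind of non-static entity typing asked for here.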

I hope this helps. Max

MaxAkbar avatar Nov 14 '19 04:11 MaxAkbar

Hi @gvashishtha,

I would love to see a functioning C# NER library that lets you train your own model with feature engineering, custom categories, and user-friendly parameterization. I found the RNNSharp library very helpful for NER development in C#. You might benefit from having a look at it. If I am not mistaken, it makes use of neural networks (bidirectional LSTM) for sequence labeling tasks such as NER.

Hope you can find that of use. Nicolás

njfm0001 avatar Nov 30 '19 19:11 njfm0001

@MaxAkbar @njfm0001 have you looked into this library? https://github.com/microsoft/Recognizers-Text/tree/master/.NET

gvashishtha avatar Mar 27 '20 17:03 gvashishtha

@gvashishtha As far as I can see, that library doesn't support PERSON, LOCATION or ORGANIZATION types, only dates, numbers, emails...

njfm0001 avatar Mar 28 '20 16:03 njfm0001

Hello @gvashishtha,

Thank you for providing the link to the text recognizers. I had looked at them when I was working with LUIS. I am using the recognizers in my current project.

The recognizers, in my opinion, are designed to extract written entities into numerical, date, and other formats. They identify a pattern and transform it, whereas NLP extracts entities based on grammar.

The underlying engine of the recognizers is regular expressions. For example, "I have two apples" when passed to the recognizer will return the number 2, whereas I would want to identify the entities "I = Person" and "Apple = Fruit."
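To illustrate the distinction, here is a minimal sketch of a pattern-based number recognizer. It is not the Recognizers-Text API; the small number-word lexicon and the function `recognize_numbers` are assumptions made for the example.

```python
import re

# Tiny number-word lexicon, an assumption for illustration only.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Match either a known number word or a run of digits, as a whole word.
NUMBER_PATTERN = re.compile(
    r"\b(" + "|".join(NUMBER_WORDS) + r"|\d+)\b", re.IGNORECASE
)

def recognize_numbers(text):
    """Find number expressions and normalize each one to an int."""
    results = []
    for match in NUMBER_PATTERN.finditer(text):
        raw = match.group(1)
        value = int(raw) if raw.isdigit() else NUMBER_WORDS[raw.lower()]
        results.append({"text": raw, "start": match.start(1), "value": value})
    return results

found = recognize_numbers("I have two apples")
print(found)
```

The pattern finds "two" and normalizes it to 2, but nothing in it can say that "I" refers to a person or that "apples" is a fruit; that needs a model that uses context, which is the gap being described.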

I hope this clarifies the requirements.

MaxAkbar avatar Mar 28 '20 18:03 MaxAkbar

I would also like to see this. My scenario is that I want to recognize rock-climbing-related names and locations in sentences. I have already "classified" some data like:

Bouldering in Central Park!!||Central Park
Not the best angle but check out that latch!!! Golden Bowl (V7) in Squamish||Golden Bowl||Squamish  
Does anyone have a used crash pad for sale?||

(where I have a sentence followed by || then all the names/locations separated again by ||)
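A minimal sketch of how lines in this format could be turned into character-offset annotations, which is the shape most NER training tools expect. The function name `parse_example` is hypothetical, and the sketch assumes each entity appears verbatim in its sentence (only the first occurrence is recorded).

```python
def parse_example(line):
    """Split 'sentence||entity||entity' into the sentence and (start, end, text) spans."""
    sentence, *entities = line.rstrip().split("||")
    spans = []
    for entity in entities:
        if not entity:
            continue  # a trailing '||' with nothing after it means "no entities"
        start = sentence.find(entity)
        if start != -1:
            spans.append((start, start + len(entity), entity))
    return sentence, spans

lines = [
    "Bouldering in Central Park!!||Central Park",
    "Not the best angle but check out that latch!!! Golden Bowl (V7) in Squamish||Golden Bowl||Squamish  ",
    "Does anyone have a used crash pad for sale?||",
]

for line in lines:
    sentence, spans = parse_example(line)
    print(sentence, spans)
```

Note that this format does not say whether "Golden Bowl" is a climb name or a location, so a type label per entity would still have to be added before training a multi-class recognizer.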

derekantrican avatar Mar 31 '20 19:03 derekantrican

Another vote for an ML.NET implementation of NER.

We have a commercial application that runs on the user's machine locally - no cloud processing yet. We would like to be able to do 7-class Named Entity extraction on large bodies of text.

keithrowe avatar Apr 03 '20 16:04 keithrowe

+1 to NER. Also, I think from a .NET perspective, something like spaCy would be the best model to follow. We use it now (because there is no .NET equivalent) and it works great.

  1. Start with POS tagging so it then becomes easier to understand tokens in context.
  2. Provide the ability to train custom tokens (a great example out there is the Go/Golang training videos on using spaCy)
  3. Provide out-of-the-box, pre-trained models for people, places, etc.

MS Video Indexer seems to have a great implementation of this for indexing videos and understanding topics, words, expressions, etc.

hobbsa avatar Apr 15 '20 11:04 hobbsa

+1 to NER. We are trying to recognize personal information in our training data, including:

  • AGE
  • CREDIT_CARD_NUMBER
  • DATE
  • EMAIL_ADDRESS
  • IP_ADDRESS
  • LOCATION
  • MAC_ADDRESS
  • PASSPORT
  • PERSON_NAME
  • PHONE_NUMBER
  • SWIFT_CODE
  • US_DRIVERS_LICENSE_NUMBER
  • US_SOCIAL_SECURITY_NUMBER
  • US_VEHICLE_IDENTIFICATION_NUMBER
  • US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER
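Some of these types have a fixed surface pattern, while others (PERSON_NAME, LOCATION) depend on context. A minimal sketch of the pattern-based part, assuming deliberately simplified regular expressions for three of the labels; the patterns and the function `tag_patterns` are illustrative and would miss or over-match real-world cases (for example, the IPv4 pattern does not check that each octet is at most 255).

```python
import re

# Simplified patterns for illustration only; not production-grade validators.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "MAC_ADDRESS": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
}

def tag_patterns(text):
    """Return (start, end, label, matched text) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), label, match.group()))
    return sorted(hits)

sample = "Contact jane@example.com from 192.168.0.1 (MAC 00:1A:2B:3C:4D:5E)."
hits = tag_patterns(sample)
for start, end, label, value in hits:
    print(label, value)
```

A name such as "Jane" in the same sentence would not be found by any of these patterns, which is why the context-dependent labels in the list call for a trained NER model.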

Looking forward to an NER feature in ML.NET.

moria97 avatar Apr 26 '20 09:04 moria97

+1 on this. I definitely need custom capabilities as I need to pull things like US district court information, trying to figure out who is the defendant and plaintiff, etc.

JaCraig avatar May 06 '20 02:05 JaCraig

@gvashishtha can you provide some feedback? Are there plans to do this?

chester89 avatar May 07 '20 08:05 chester89

I agree with the above comments. Here are the reasons to include NER:

  1. NER could be used in .NET Core and UWP apps.
  2. Stanford NLP uses IKVM, which does not support .NET Core at this time (only .NET Framework). As an example, this line of code from Stanford NLP will fail because of the lack of FileStream support: var classifier = CRFClassifier.getClassifierNoExceptions( classifiersDirecrory + @"\english.all.3class.distsim.crf.ser.gz");
  3. IKVM has reached end of life, so future support for Stanford NLP for .NET will be limited. It is highly unlikely that a future version of Stanford NLP will support the next framework or .NET Core releases. https://sergey-tihon.github.io/Stanford.NLP.NET/faq.html
  4. Azure Text Analytics is a good option, but at scale it is better suited to sending batches of text; managing wait times and user experience will be a hassle in an ASP.NET application that needs real-time NER.
  5. With PII extraction being important, it would make sense to include NER in ML.NET.
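The batching overhead mentioned in point 4 can be sketched as follows: before text can go to any service with a per-request size limit, it has to be split into chunks. This is a minimal sketch that packs whole words greedily; the function `chunk_text` and the limit used in the example are assumptions, not the documented limits of any Azure service.

```python
def chunk_text(text, max_chars):
    """Greedily pack whitespace-separated words into chunks of at most max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current = [], ""
    for word in text.split():
        if len(word) > max_chars:
            raise ValueError(f"word longer than limit: {word!r}")
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)   # current chunk is full; start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks

# Hypothetical limit, chosen only to make the example small.
chunks = chunk_text("Jane Doe met John Smith in Central Park", 16)
print(chunks)
```

Splitting on a fixed size can cut a multi-word entity across two chunks, so a real pipeline would more likely split on sentence boundaries; a local recognizer avoids the round trips and the splitting altogether.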

So @gvashishtha any updates on timelines or plans to include this in any future release?

kartikvega avatar Sep 16 '20 16:09 kartikvega