JsonGrinder.jl

Enhancement: support for pretrained word embeddings

Open simonmandlik opened this issue 1 year ago • 0 comments

Implement a new Extractor subtype, called WordEmbeddingExtractor, that tokenizes natural-language strings and represents each word by its pretrained embedding (using Embeddings.jl and WordTokenizers.jl?)

A rough sketch of a possible implementation can be found here, but it targets the old version of JsonGrinder.

A good starting point is the NGramExtractor implementation; the design should be very similar.
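
To make the idea concrete, here is a minimal, dependency-free sketch of what such an extractor might do. It is not the JsonGrinder API: `WordEmbeddingExtractorSketch` and `extract_words` are hypothetical names, the embedding table is a tiny hand-written matrix standing in for pretrained vectors, and whitespace splitting stands in for a real tokenizer. Mapping out-of-vocabulary words to zero vectors is also just one possible choice.

```julia
# Hypothetical sketch only; these names are not part of JsonGrinder.
struct WordEmbeddingExtractorSketch
    vocab::Dict{String,Int}      # word => column index in `embeddings`
    embeddings::Matrix{Float32}  # (dim × n_words), one column per word
end

# Turn a string into a (dim × n_tokens) matrix of word vectors.
# Assumption: unknown words become zero columns.
function extract_words(e::WordEmbeddingExtractorSketch, text::AbstractString)
    dim = size(e.embeddings, 1)
    tokens = split(lowercase(text))  # naive whitespace tokenization
    out = zeros(Float32, dim, length(tokens))
    for (j, tok) in enumerate(tokens)
        i = get(e.vocab, String(tok), 0)
        if i > 0
            out[:, j] = e.embeddings[:, i]
        end
    end
    return out
end

# Toy stand-in for a pretrained table: 3-dimensional vectors for two words.
words = ["hello", "world"]
table = Float32[1 0; 0 1; 2 3]
e = WordEmbeddingExtractorSketch(Dict(w => i for (i, w) in enumerate(words)), table)
x = extract_words(e, "Hello unknown world")
```

Each string yields a variable number of columns, so a real extractor would presumably wrap the result in a bag-like node, much as NGramExtractor has to deal with variable-length input.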

We might also want to update suggestextractor with a new kwarg governing when Strings are extracted as ngrams and when they are tokenized.
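
One way such a kwarg could behave, as a rough sketch: decide per field based on how many words its sample strings contain. Everything here is an assumption for illustration, including the function name `suggest_string_strategy`, the kwarg `min_words_for_tokens`, and the mean-word-count criterion itself. None of it exists in suggestextractor today.

```julia
# Hypothetical helper, not part of JsonGrinder: choose how to treat a String field.
# Assumption: fields whose samples average at least `min_words_for_tokens`
# whitespace-separated words look like natural language and get tokenized;
# everything else (ids, hashes, short codes) falls back to n-grams.
function suggest_string_strategy(samples::Vector{String}; min_words_for_tokens::Int = 3)
    isempty(samples) && return :ngrams
    mean_words = sum(length(split(s)) for s in samples) / length(samples)
    return mean_words >= min_words_for_tokens ? :tokenize : :ngrams
end

ids = suggest_string_strategy(["abc123", "x-42"])
sentences = suggest_string_strategy(["the quick brown fox", "hello there my friend"])
strict = suggest_string_strategy(["the quick brown fox"]; min_words_for_tokens = 10)
```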

simonmandlik • Jun 06 '24 17:06