
Brainstorming/experiment - word embedding for vocabulary

Open dobkeratops opened this issue 5 years ago • 6 comments

So there are precomputed “word embeddings” trained from text which turn words into high-dimensional vectors (e.g. 500d). Supposedly these can capture groupings, similarities, and overlapping concepts as directions in the space, the impressive example being “king - man + woman = queen”.
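For reference, this is roughly what that analogy query looks like with gensim’s pretrained vectors (the model name here is just one example of a downloadable embedding set):

```python
# Sketch of the classic analogy query using gensim's pretrained vectors.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # 100-dimensional GloVe vectors

# "king - man + woman" should land near "queen"
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# e.g. [('queen', 0.77...)]
```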

I imagine the technique has limits though, e.g. “speed” and “bump” may not combine well into “speed bump” (it’s not a fast-moving bump, it’s a speed-restriction bump). My suspicion is it would only work well for individual words.

But could we try to describe labels as combinations of single words, i.e. “speed bump” = “(slow + restriction + safety + bump + road)/5”? In other words, if you found the point between those words, would it have the right vector value for “speed bump”? (Let me check whether the combination should be an average, or just normalising the summed vector... it’s probably the latter, actually.)
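A minimal sketch of that composition, reusing the gensim model loaded above (the word list for “speed bump” is just a guess). One observation: averaging and normalising the sum differ only by a scalar factor, so under cosine similarity they find the same nearest neighbours.

```python
import numpy as np

def compose_label(model, words):
    """Average the word vectors, then L2-normalise the result.

    Averaging vs. normalising the sum differ only by a scalar factor,
    so the nearest neighbours under cosine similarity are identical.
    """
    vec = np.mean([model[w] for w in words], axis=0)
    return vec / np.linalg.norm(vec)

# Hypothetical decomposition of "speed bump" into related single words
speed_bump = compose_label(model, ["slow", "restriction", "safety", "bump", "road"])

# Does the composed point land anywhere sensible?
print(model.similar_by_vector(speed_bump, topn=5))
```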

This would be vaguer than graph links, which are supposed to be strict “is a...” relations. It’s more like “related ideas” or “word association”, akin to “see also” links in Wikipedia.

It’s another idea for “labelling the labels”, and a possible way of leveraging existing natural language processing resources.

Another idea would be to try to train a word embedding treating the combined labels as unique words (maybe preprocessing the training text such that “speed bump” is replaced with “speedBump”). See the sketch below.
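A sketch of that preprocessing plus training step, using gensim’s Word2Vec (the label list and one-sentence corpus are placeholders; a real run needs far more text):

```python
from gensim.models import Word2Vec

# Placeholder multi-word labels to merge into single tokens
multiword_labels = ["speed bump", "traffic light", "fire hydrant"]

def merge_labels(sentence, labels):
    """Replace each multi-word label with a single camelCase token."""
    for label in labels:
        parts = label.split()
        joined = parts[0] + "".join(p.capitalize() for p in parts[1:])
        sentence = sentence.replace(label, joined)
    return sentence

corpus = ["the car slowed down at the speed bump near the traffic light"]
tokenised = [merge_labels(s, multiword_labels).split() for s in corpus]

w2v = Word2Vec(tokenised, vector_size=100, min_count=1)
print(w2v.wv.most_similar("speedBump"))
```

Gensim also ships a Phrases model that learns such collocations automatically from co-occurrence statistics, which would avoid hand-listing the labels.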

It might even “just work”, i.e. the word embeddings may already encode something about how words behave as prefixes, but I have no experiments to back this up.

Could these embeddings be used to find similar labels? And could you just train a net to emit embeddings rather than labels (and then pick a label blend to approximate the recognised embedding)?
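That last idea resembles DeViSE (Frome et al., 2013), where a vision net regresses onto the embedding space and labels are recovered by nearest-neighbour lookup. A rough sketch of the decode step, reusing the model and compose_label helper from the sketches above (the label set and network output are stand-ins):

```python
import numpy as np

def decode_embedding(predicted, label_vectors):
    """Pick the label whose embedding is closest (by cosine) to the net's output."""
    predicted = predicted / np.linalg.norm(predicted)
    # label_vectors maps label name -> unit-normalised vector, so dot = cosine
    scores = {name: float(vec @ predicted) for name, vec in label_vectors.items()}
    return max(scores, key=scores.get)

# Hypothetical label set built from the composed vectors above
labels = {name: compose_label(model, name.split()) for name in ["dog", "cat", "speed bump"]}

net_output = np.random.randn(100)  # stand-in for the network's predicted embedding
print(decode_embedding(net_output, labels))
```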

dobkeratops · Jun 24 '19 09:06