Composite seed words (are they supported by default, or at least possible without modifications?)
Is it possible to use seed words with a composite meaning? For instance, "body dysphoria": "dysphoria" on its own can mean several things, but paired with "body" it is much closer to the topic I'm looking for.
If it is not supported by default, what would be the easiest way to implement this?
Would it work to pass "body dysphoria", vectorize it, and add the resulting vector directly to the seed list (with some extra logic in how seed words are handled)?
Or is this only possible with a custom tokenizer rule?
The seed words themselves are passed in their entirety to the embedding model, so from that perspective a multi-word seed such as "body dysphoria" is embedded as a whole phrase and will have a significant effect on how the topics are steered. For the word vectorizer, you would have to make sure that n-grams are enabled if you also want to increase the c-TF-IDF values of such phrases. That is not necessary if you do not care about the weighting of the seed words themselves.
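To illustrate the vectorizer side, here is a minimal, stdlib-only sketch of what an n-gram range changes. It assumes a tokenizer that mimics scikit-learn's `CountVectorizer` defaults (lowercasing and the token pattern `(?u)\b\w\w+\b`); the helper names `ngrams` and `vocabulary` and the example documents are hypothetical. With unigrams only, "body dysphoria" never becomes a term, so there is nothing for c-TF-IDF to weight; with a (1, 2) range it does.

```python
import re

# Mimics scikit-learn CountVectorizer's default token pattern:
# tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def ngrams(text, ngram_range=(1, 1)):
    """Lowercase, tokenize, then emit every n-gram with n in ngram_range."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    lo, hi = ngram_range
    return [
        " ".join(tokens[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(tokens) - n + 1)
    ]


def vocabulary(docs, ngram_range=(1, 1)):
    """Set of all terms a vectorizer with this ngram_range could score."""
    return {term for doc in docs for term in ngrams(doc, ngram_range)}


docs = [
    "Body dysphoria often starts in adolescence.",
    "Gender dysphoria and body image are discussed together.",
]

unigrams_only = vocabulary(docs, ngram_range=(1, 1))
with_bigrams = vocabulary(docs, ngram_range=(1, 2))

print("body dysphoria" in unigrams_only)  # False: only single words exist
print("body dysphoria" in with_bigrams)   # True: the phrase is now a term
```

In BERTopic this would correspond to passing something like `vectorizer_model=CountVectorizer(ngram_range=(1, 2))`; the embedding of the seed phrase itself is the same either way.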
I understand, so it should work out of the box as far as the combined semantic meaning of the words is concerned. However, in my case I might have one tracked topic with the seed term "dysphoria" alone and another with "body dysphoria". This applies even more to a topic built on "family" versus one built on "family issues". So in this regard, I suppose n-grams might be needed?
Yes, you would need n-grams for the topic representations themselves, but not for the assignment of documents to topics, since that is handled automatically through the embeddings.
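To show why bigrams matter for telling "family" apart from "family issues" in the representations, here is a simplified, stdlib-only c-TF-IDF sketch. It roughly follows the class-based weighting tf × log(1 + A / f) (A = average number of terms per topic, f = a term's frequency across all topics), but it is not BERTopic's actual `ClassTfidfTransformer` code, which works on sparse matrices and handles normalization and seed words its own way. The function `c_tf_idf`, its `seed_words`/`seed_multiplier` parameters, and the toy documents are hypothetical; recent BERTopic releases expose similarly named options, but check your version.

```python
import math
import re
from collections import Counter

# Mimics scikit-learn CountVectorizer's default token pattern.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def ngrams(text, ngram_range=(1, 2)):
    """Lowercase, tokenize, and emit all n-grams with n in ngram_range."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    lo, hi = ngram_range
    return [" ".join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]


def c_tf_idf(topic_docs, seed_words=(), seed_multiplier=2.0, ngram_range=(1, 2)):
    """Simplified class-based TF-IDF with a seed-word boost.

    Each topic's documents are merged into one 'class document'. A term's
    weight is (tf / terms in topic) * log(1 + A / f), where f is the term's
    frequency over all topics and A the average number of terms per topic.
    Terms listed in seed_words have their weight multiplied by seed_multiplier.
    """
    counts = {topic: Counter(t for doc in docs for t in ngrams(doc, ngram_range))
              for topic, docs in topic_docs.items()}
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    avg_terms = sum(overall.values()) / len(counts)
    seeds = set(seed_words)

    scores = {}
    for topic, c in counts.items():
        n_terms = sum(c.values())
        scores[topic] = {}
        for term, tf in c.items():
            weight = (tf / n_terms) * math.log(1 + avg_terms / overall[term])
            if term in seeds:
                weight *= seed_multiplier  # boost seed terms
            scores[topic][term] = weight
    return scores


topic_docs = {
    "family": ["My family visited at the weekend.", "A big family dinner."],
    "family issues": ["We have family issues at home.", "Family issues caused stress."],
}
scores = c_tf_idf(topic_docs, seed_words=["family issues"])

for topic, weights in scores.items():
    print(topic, "->", max(weights, key=weights.get))
# family -> family
# family issues -> family issues
```

With `ngram_range=(1, 1)`, "family issues" would never exist as a term, so both topics could only be described by the separate words "family" and "issues"; the bigram is what lets the second topic carry (and boost) the composite phrase.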