
email domain analysis: list top/bottom working groups by PCA dimension

Open sbenthall opened this issue 3 years ago • 2 comments

Illustrate the principal components with top/bottom working group tags.

sbenthall avatar Mar 03 '21 21:03 sbenthall

Could you explain this a bit more? I don't know what is meant by top/bottom and tags. Is what you describe here partially covered by the Multi-dimensional scaling section of ./bigbang/examples/organizations/Full Archive Study.ipynb?

Thx :-)

Christovis avatar May 21 '21 23:05 Christovis

Each working group can be seen as a document. For each email sent to the working group, treat the domain of the sender's email address as a word.
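To make the document/word analogy concrete, here is a minimal sketch of building a working-group-by-domain count matrix. The `emails` list, the group names, and the `domain_counts` helper are made up for illustration; they are not taken from the bigbang codebase or the notebook.

```python
from collections import Counter

# Hypothetical input: one (working_group, sender_address) pair per email.
emails = [
    ("wg-a", "alice@example.com"),
    ("wg-a", "bob@example.org"),
    ("wg-a", "carol@example.com"),
    ("wg-b", "dave@example.org"),
    ("wg-b", "erin@example.net"),
]


def domain_counts(emails):
    """Map each working group to a Counter of sender domains ("words")."""
    counts = {}
    for group, address in emails:
        # Everything after the last "@" is treated as the domain.
        domain = address.rsplit("@", 1)[-1].lower()
        counts.setdefault(group, Counter())[domain] += 1
    return counts


counts = domain_counts(emails)
groups = sorted(counts)
domains = sorted({d for c in counts.values() for d in c})

# One row per working group (document), one column per domain (word).
matrix = [[counts[g][d] for d in domains] for g in groups]
```

Each row of `matrix` is then the "word count" vector for one working group.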

PCA on the set of documents will produce a set of dimensions expressed as weights on each of the email domains.
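As a rough sketch of that step, assuming a small count matrix with working groups as rows and email domains as columns, the components can be computed with a plain NumPy SVD. The numbers and the `principal_components` helper are illustrative only; the notebook may use a different PCA implementation.

```python
import numpy as np


def principal_components(matrix, n_components=2):
    """Return an (n_components, n_domains) array of component weights."""
    X = np.asarray(matrix, dtype=float)
    # PCA operates on mean-centered columns.
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]


# Hypothetical counts: 4 working groups x 3 email domains.
matrix = [
    [2, 0, 1],
    [0, 1, 1],
    [3, 0, 0],
    [0, 2, 1],
]
components = principal_components(matrix, n_components=2)
```

Each row of `components` holds one weight per email domain, which is what the highest/lowest-weight domain summary reads off.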

In the Multi-dimensional scaling section of that notebook, each dimension is summarized by the email domains with the highest and lowest weights.

This issue asks for an additional, alternative way of summarizing the principal components.

Given a principal component and a working group as a document, the dot product of the principal component weights and the working group's "word" counts gives that working group a scalar score.

So it is possible, for each component, to show the top five/bottom five working groups according to that score.
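A minimal sketch of that ranking, assuming the component weights and per-group domain counts are already available as plain lists in the same domain order. The `top_bottom_groups` name, the weights, and the counts below are all hypothetical.

```python
def top_bottom_groups(component, counts_by_group, k=5):
    """Rank working groups by the dot product of component weights and counts.

    Returns (top_k, bottom_k): top_k is highest score first,
    bottom_k is lowest score first.
    """
    scores = {
        group: sum(w * c for w, c in zip(component, counts))
        for group, counts in counts_by_group.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Hypothetical weights for one component over three email domains.
component = [0.8, -0.6, 0.0]

# Hypothetical domain counts per working group, same domain order.
counts_by_group = {
    "wg-a": [2, 0, 1],
    "wg-b": [0, 1, 1],
    "wg-c": [3, 0, 0],
    "wg-d": [0, 2, 1],
}

top, bottom = top_bottom_groups(component, counts_by_group, k=2)
print("top:", top)
print("bottom:", bottom)
```

Scoring the mean-centered counts instead of the raw counts would shift every group's score on a component by the same constant, so the top/bottom ordering would not change.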

Is that a clear explanation?

sbenthall avatar May 27 '21 19:05 sbenthall