dataprep
Text Analysis
Is your feature request related to a problem? Please describe. The goal of this issue is to enrich the text analysis of dataprep.eda.
Describe the solution you'd like
- plot(df, x): method="tfidf" shows the top-k keywords in column x. Time: 2021/01/19 - 2021/02/28
- plot(df, x): method="ngram" shows the top-k n-grams in column x. Time: 2021/01/19 - 2021/02/28
- plot(df, x, y): method="pca" reduces the dimensionality of the data using PCA and shows a scatter plot. Time: 2021/03/01 - 2021/03/19
All of the above functions will work if the x column contains text data.
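To make the "tfidf" item concrete, here is a minimal sketch of how top-k keywords could be ranked for a text column. It is not dataprep's implementation: `tokenize` and `top_k_tfidf` are hypothetical helper names, the column is assumed to be a plain list of strings, and the smoothed IDF formula and the choice to sum scores across rows are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_k_tfidf(docs, k=3):
    """Return the k terms with the highest TF-IDF score summed over all docs.

    Hypothetical helper: `docs` stands in for the values of column x.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue  # skip empty rows
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed IDF (an assumption; other variants exist).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


docs = ["apple banana apple", "banana cherry", "apple apple apple"]
for term, score in top_k_tfidf(docs, k=2):
    print(f"{term}: {score:.3f}")
```

Summing per-row scores favours terms that are both frequent and spread over few rows; averaging or taking the maximum per term would be equally defensible choices.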
@jinglinpeng @dovahcrow
Describe alternatives you've considered N/A
Additional context N/A
A reference from a previous issue: https://docs.google.com/document/d/1EQUBEgU_khNl51Z2FPv_4rUGziiZz8fYzXKvqsVXGFU/edit?usp=sharing
@jinglinpeng n-gram frequency
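For reference, n-gram frequency over a text column could be counted roughly as follows. This is only a sketch under assumptions: `top_k_ngrams` is a hypothetical name, not an existing dataprep function, the column is treated as a list of strings, and tokenization is a simple lowercase word split.

```python
import re
from collections import Counter


def top_k_ngrams(docs, n=2, k=3):
    """Return the k most frequent word n-grams across all docs.

    Hypothetical helper: `docs` stands in for the values of column x.
    """
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z0-9']+", doc.lower())
        # Slide a window of n tokens over each row separately,
        # so an n-gram never spans two rows.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


docs = ["the quick brown fox", "the quick red fox", "a quick brown dog"]
print(top_k_ngrams(docs, n=2, k=2))
```

Rows with fewer than n tokens contribute nothing, which keeps short or empty cells from raising errors.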