Preserve order of categorical groups

ernohanninen opened this issue • 1 comment

Hi! Thanks for the great package. Is it possible to modify the biplot so that the order of the categories in df_subset.treatment is preserved in the legend? Currently, I can't assign colors to my groups the way I'd like. Additionally, the groups in the PCA plot overlap quite a bit, partly because the groups are plotted on top of each other. It would be great if the groups could be drawn more interleaved to improve the visualization. Shuffling my dataframe before passing it to pca didn't help.

model.biplot(labels=df_subset.treatment, title='')

Jan 12 '25 12:01 ernohanninen

For the colors, you can use the c parameter. However, when labels is provided, the colors are set automatically based on the class labels, which may not have been entirely clear. In the new version, I added an info message when this happens.

Example of coloring each sample as you wish:

from pca import pca
import colourmap as cm

# Initialize pca
model = pca(n_components=2, verbose='info', n_std=2)
# Load example data set
df = model.import_example(data='iris')
# Create one RGB colour per sample from the class labels
colours = cm.fromlist(df['label'].values, cmap='Set1', scheme='rgb')[0]
# Fit transform
out = model.fit_transform(df)

# No labels are given, so the colors in c are used
ax = model.biplot(s=200, SPE=True, HT2=True, c=colours)

# In the line below, the colors in c are overridden by the class labels, which use the (default) cmap.
ax = model.scatter(s=200, SPE=True, HT2=True, labels=df['label'], c=colours)
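A minimal sketch of assigning colors in a fixed, user-chosen category order, so that each group always gets the same color regardless of the order in which categories appear in the data. The helpers colors_in_order and legend_entries, the example labels, and the hex palette are hypothetical illustrations, not part of pca or colourmap. Because c is overridden whenever labels is passed, the resulting list would be passed via c alone, assuming your pca version accepts matplotlib color specs there (colourmap produces RGB values, so matplotlib.colors.to_rgb may be needed to convert).

```python
def colors_in_order(labels, order, palette):
    """Return one color per sample; category order[i] always gets palette[i]."""
    missing = set(labels) - set(order)
    if missing:
        raise ValueError(f"labels missing from order: {sorted(missing)}")
    if len(palette) < len(order):
        raise ValueError("palette has fewer colors than there are categories")
    # Map each category to its color following the user-chosen order
    lookup = dict(zip(order, palette))
    return [lookup[label] for label in labels]


def legend_entries(order, palette):
    """Return (category, color) pairs in the chosen order, e.g. for legend handles."""
    return list(zip(order, palette))


# Hypothetical treatment labels, in the order they happen to appear in the data
labels = ["treated", "control", "treated", "vehicle"]
# The order in which groups should be colored and listed in the legend
order = ["control", "vehicle", "treated"]
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]

colours = colors_in_order(labels, order, palette)
print(colours)  # ['#2ca02c', '#1f77b4', '#2ca02c', '#ff7f0e']
print(legend_entries(order, palette)[0])  # ('control', '#1f77b4')
```

On a matplotlib Axes, each (category, color) pair could become a proxy handle via matplotlib.lines.Line2D([0], [0], marker='o', linestyle='', color=color, label=category), passed to ax.legend(handles=...) to fix the legend order.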

You can use the jitter parameter to spread out scatter points that would otherwise be drawn exactly on top of each other. Choose the amount of jitter based on the range of your values.

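# Import library
from pca import pca
# Initialize
model = pca()
# Load example data set
df = model.import_example(data='iris')
# Fit model using PCA
model.fit_transform(df)
# Make biplot; jitter adds small random offsets so overlapping points are separated
model.biplot(jitter=0.1)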
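A minimal sketch of what jitter does in general, assuming it adds small uniform random noise to each plotted coordinate. This is a generic NumPy illustration, not necessarily how pca implements its jitter parameter; the helper add_jitter is hypothetical.

```python
import numpy as np


def add_jitter(xy, amount, seed=None):
    """Return a copy of xy (n_samples x 2) with uniform noise in [-amount, amount) added."""
    rng = np.random.default_rng(seed)
    xy = np.asarray(xy, dtype=float)
    # Independent noise for every coordinate of every sample
    return xy + rng.uniform(-amount, amount, size=xy.shape)


# Three samples that sit exactly on top of each other
xy = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
jittered = add_jitter(xy, amount=0.1, seed=0)

# The points are now slightly apart but stay within 0.1 of the original position
print(np.abs(jittered - xy).max() <= 0.1)  # True
```

Since the noise does not depend on the data, a value like 0.1 may be invisible for PC scores spanning hundreds of units and too large for scores near zero, which is why the amount should follow the range of your values.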

Jun 21 '25 10:06 erdogant