Autometa
PCA behavior
Would it be okay to change:
https://github.com/KwanLab/Autometa/blob/0d9028cf7bad20d6e28667aaba9d3889a15ace09/autometa/common/kmers.py#L601-L607
so that it adapts to a lower PCA dimension when there aren't enough contigs/k-mers? Something like:
    if n_components > pca_dimensions and pca_dimensions != 0:
        if n_samples < pca_dimensions:
            # Too few contigs for the requested number of dimensions: lower the target.
            logger.warning(
                f"n_samples ({n_samples}) is less than pca_dimensions ({pca_dimensions}), "
                f"lowering pca_dimensions to {min(n_samples, n_components)}."
            )
            pca_dimensions = min(n_samples, n_components)
        logger.debug(
            f"Performing decomposition with PCA (seed {seed}): {n_components} to {pca_dimensions} dims"
        )
        X = PCA(n_components=pca_dimensions, random_state=random_state).fit_transform(X)
        n_samples, n_components = X.shape
To be clear: as written, this would only happen when there are fewer "samples" (contigs) than PCA dimensions.
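For illustration, here is the clamping rule on its own as a minimal sketch with no scikit-learn dependency. The helper name `effective_pca_dimensions` is hypothetical and not part of Autometa; it only mirrors the condition proposed above, assuming PCA is skipped when `pca_dimensions` is 0 or when the matrix already has no more columns than `pca_dimensions`.

```python
import logging

logger = logging.getLogger(__name__)


def effective_pca_dimensions(n_samples: int, n_components: int, pca_dimensions: int) -> int:
    """Hypothetical helper: return the number of PCA dims to use, or 0 to skip PCA.

    n_samples is the number of contigs (rows) and n_components is the number
    of k-mer features (columns) in the matrix before decomposition.
    """
    # PCA is skipped when disabled (0) or when there is nothing to reduce.
    if pca_dimensions == 0 or n_components <= pca_dimensions:
        return 0
    # The requested dimensions cannot exceed the smaller side of the matrix.
    limit = min(n_samples, n_components)
    if pca_dimensions > limit:
        logger.warning(
            f"n_samples ({n_samples}) is less than pca_dimensions "
            f"({pca_dimensions}), lowering pca_dimensions to {limit}."
        )
        return limit
    return pca_dimensions


# A tiny test dataset: 10 contigs, 256 k-mer features, 50 requested dims.
dims = effective_pca_dimensions(n_samples=10, n_components=256, pca_dimensions=50)
```

With 10 contigs and 50 requested dimensions, `dims` is lowered to 10; with 100 contigs it stays at 50. The returned value would then be passed as `n_components` to `PCA(...)` whenever it is non-zero.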
What would be the point of running PCA on a dataset with fewer than 50 contigs before some other dimension-reduction technique? Before making this change, I think we should gather some data on whether it is useful or makes a difference.
The main reason is testing: a minimal dataset that runs quickly should not cause the workflows to fail.