scanpy
Why make feature names unique instead of aggregating them?
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
Hi scanpy team! I have a rather conceptual question. Since the early days of single-cell analysis, a standard preprocessing step has been to make feature names unique (e.g. with `adata.var_names_make_unique()`) by appending suffixes to duplicated names. This is recommended in the scanpy tutorial and in the best practices book. It is clear that identical feature names make downstream processing difficult, but why do we handle it this way? Wouldn't it make more sense to aggregate features with identical names by summing their counts? From a biological point of view, the same gene name means the same feature, so why split it into several features and corrupt their names?
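To make the two options concrete, here is a minimal pure-Python sketch on a toy count matrix (rows are cells, columns are features). The helpers `make_names_unique` and `aggregate_by_name` are hypothetical names used only for illustration and are not scanpy or anndata functions. The suffix scheme is an assumption that mimics my understanding of `var_names_make_unique()` (first occurrence unchanged, later ones get `-1`, `-2`, ...).

```python
def make_names_unique(names, join="-"):
    """Keep the first occurrence of each name; suffix later duplicates.

    Assumption: roughly mirrors the suffixing behaviour of
    var_names_make_unique(); collisions with existing names are not handled.
    """
    seen = {}
    out = []
    for name in names:
        n = seen.get(name, 0)
        out.append(name if n == 0 else f"{name}{join}{n}")
        seen[name] = n + 1
    return out


def aggregate_by_name(matrix, names):
    """Sum the columns of `matrix` that share the same feature name."""
    unique = list(dict.fromkeys(names))  # unique names in first-seen order
    col = {name: j for j, name in enumerate(unique)}
    summed = [[0] * len(unique) for _ in matrix]
    for i, row in enumerate(matrix):
        for value, name in zip(row, names):
            summed[i][col[name]] += value
    return summed, unique


# Toy data: "GeneA" appears twice among the feature names.
names = ["GeneA", "GeneB", "GeneA"]
counts = [
    [1, 0, 2],  # cell 1
    [0, 3, 4],  # cell 2
]

# Option 1 (current practice): keep all columns, rename the duplicate.
unique_names = make_names_unique(names)

# Option 2 (proposed): collapse duplicates into one column by summing.
agg_counts, agg_names = aggregate_by_name(counts, names)

print(unique_names)
print(agg_names, agg_counts)
```

With the first option the matrix keeps three columns and one of them gets an altered name; with the second it shrinks to two columns whose names are untouched, which is the behaviour I am asking about.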