scanpy
log1p warns that adata.X is already logged when it may not be (when only other layers were logged)
When I call sc.pp.log1p(adata) and then sc.pp.log1p(adata, layer='other'), it warns me that the data has already been logged, even though I am logging a layer rather than adata.X.
It would be nice to track logging separately for each layer, instead of setting one flag whenever anything is logged.
import scanpy as sc
adata = sc.datasets.pbmc3k_processed()
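adata.layers['other'] = adata.X  # same array object as adata.X, not a copy
sc.pp.log1p(adata, layer='other')  # log-transform only the 'other' layer
sc.pp.log1p(adata)  # then log-transform adata.X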
WARNING: adata.X seems to be already log-transformed.
Versions:
scanpy==1.5.2.dev5+ge5d246aa anndata==0.7.3 umap==0.3.10 numpy==1.18.5 scipy==1.5.0 pandas==1.0.5 scikit-learn==0.23.1 statsmodels==0.11.1 python-igraph==0.7.1 louvain==0.6.1 leidenalg==0.7.0
The warning is raised here:
https://github.com/theislab/scanpy/blob/3558a42e747856cbf55c4d118566a155c6717178/scanpy/preprocessing/_simple.py#L286-L287
Where does .uns['log1p'] get set other than there?
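To make the reported sequence concrete, here is a stdlib-only sketch of the flag handling as this thread describes it: log1p checks whether a 'log1p' key exists in .uns and then sets uns['log1p'] = {'base': base}, with no record of which layer was transformed. A plain dict stands in for adata.uns, and simulate_log1p_flag is a hypothetical name; it mimics only the bookkeeping, not the transform itself.

```python
def simulate_log1p_flag(uns, layer=None, base=None):
    """Mimic only the .uns bookkeeping of a log1p call; return emitted warnings."""
    messages = []
    # Presence check: it does not look at which layer was logged before.
    if "log1p" in uns:
        messages.append("adata.X seems to be already log-transformed.")
    # One shared flag; `layer` is deliberately ignored, as in the report.
    uns["log1p"] = {"base": base}
    return messages


uns = {}  # stands in for adata.uns
print(simulate_log1p_flag(uns, layer="other"))  # []
print(simulate_log1p_flag(uns))  # ['adata.X seems to be already log-transformed.']
```

Because the layer argument never reaches the flag, logging any layer first is enough to trigger the warning on a later call for adata.X.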
Hi @gheimberg,
In your example you are not copying adata.X when you assign it to adata.layers['other'], so the layer is just a reference to the same array: when you log-transform the layer, adata.X is log-transformed as well. That said, this is still a bug, because the warning is given even with adata.X.copy().
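The aliasing point can be illustrated with plain NumPy, independently of scanpy: binding an array to a second name shares the buffer, so an in-place log1p through one name is visible through the other, while .copy() produces an independent array. The variables x and layer are stand-ins for adata.X and adata.layers['other']; that scanpy's log1p writes in place for dense float matrices is an assumption consistent with the reply above, not something this sketch verifies.

```python
import numpy as np

x = np.array([[0.0, 1.0], [3.0, 7.0]])  # stands in for adata.X

layer = x  # like adata.layers['other'] = adata.X: same buffer, no copy
np.log1p(layer, out=layer)  # in-place transform through the "layer" name
print(np.shares_memory(x, layer))  # True
print(x)  # x has been transformed too

y = np.array([[0.0, 1.0], [3.0, 7.0]])
layer_copy = y.copy()  # like adata.X.copy(): an independent buffer
np.log1p(layer_copy, out=layer_copy)
print(np.shares_memory(y, layer_copy))  # False
print(y)  # y is unchanged
```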
Guys, we should just keep the layer info in log1p. Instead of
data.uns['log1p'] = {'base': base}
store it per layer, like
data.uns['log1p'][layer] = {'base': base}
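A rough sketch of what the per-layer bookkeeping proposed above might look like, again with a plain dict standing in for adata.uns. The helper names mark_logged and is_logged, and the convention of storing layer=None (adata.X) under the string key "X" so that every key stays a string, are hypothetical choices for illustration, not scanpy's implementation. setdefault avoids a KeyError when no 'log1p' entry exists yet.

```python
def _layer_key(layer):
    # Hypothetical convention: layer=None means adata.X.
    return "X" if layer is None else layer


def mark_logged(uns, layer=None, base=None):
    """Record that `layer` (None = adata.X) was log1p-transformed."""
    uns.setdefault("log1p", {})[_layer_key(layer)] = {"base": base}


def is_logged(uns, layer=None):
    """True only if this particular layer was marked as logged."""
    return _layer_key(layer) in uns.get("log1p", {})


uns = {}  # stands in for adata.uns
mark_logged(uns, layer="other")
print(is_logged(uns, layer="other"))  # True
print(is_logged(uns))  # False: adata.X itself was never logged
```

With this shape, a warning keyed on is_logged(uns, layer) would fire only when the same layer is transformed twice.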
I've come across some strange behavior related to this issue: depending on whether or not I save the object, I get the same warning as the OP.
This works as it should:
import scanpy as sc
adata=sc.read_h5ad(data_dir+'scanpy_QC_sexchrom.h5ad')  # data_dir: path to the user's data
adata.raw=adata.copy()  # keep the untransformed data in .raw
sc.pp.log1p(adata)  # log-transform
### Test 1, no saving, works as it should
adata=adata.raw.to_adata()
sc.pp.log1p(adata)
##>>> no warning
If I save mid-way, however, I cannot avoid the warning, even when I restart the kernel before reading the data back in:
import scanpy as sc
## same as above
adata=sc.read_h5ad(data_dir+'scanpy_QC_sexchrom.h5ad')  # data_dir: path to the user's data
adata.raw=adata.copy()  # keep the untransformed data in .raw
sc.pp.log1p(adata)  # log-transform
### Test 2, saving and re-assigning from raw
### saving object, reading, testing again
### Doesn't work
adata.write_h5ad(tmp+'scanpy_test.h5ad')  # tmp: path to a scratch directory
adata=sc.read_h5ad(tmp+'scanpy_test.h5ad')
adata=adata.raw.to_adata()
sc.pp.log1p(adata)
###>>>WARNING: adata.X seems to be already log-transformed.
I'm on scanpy 1.9.1, if it matters.
I should also mention that, upon reading in the data:
- running adata.uns['log1p'] returns {};
- setting adata.uns['log1p']["base"] = None after reading doesn't help;
- running del adata.uns['log1p'] solves the problem.
Visual inspection of the expression values in adata.X suggests they are not log-transformed.
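These observations are consistent with a check that tests only whether the 'log1p' key exists, not what it contains: an empty dict still counts, setting "base" changes nothing, and only removing the key clears the warning. A stdlib sketch of that distinction follows, with a plain dict standing in for adata.uns after reading the file. would_warn and clear_log1p_flag are hypothetical helpers; on a real AnnData one would pass adata.uns, which is dict-like and should support the same pop call.

```python
def would_warn(uns):
    # Presence check: the value stored under "log1p" is never inspected.
    return "log1p" in uns


def clear_log1p_flag(uns):
    """Hypothetical helper: drop a stale log1p marker if present."""
    uns.pop("log1p", None)  # no KeyError when the key is absent


uns = {"log1p": {}}  # what adata.uns['log1p'] looked like after reading
print(would_warn(uns))  # True, even though the dict is empty

uns["log1p"]["base"] = None  # setting base does not change the check
print(would_warn(uns))  # True

clear_log1p_flag(uns)
print(would_warn(uns))  # False
```

Removing the key is only appropriate once you have confirmed, as above, that adata.X really is not log-transformed; otherwise the warning is doing its job.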