sc.pp.scale changes adata.raw.X
Please make sure these conditions are met
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.
What happened?
Following this common workflow:
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
adata.layers['lognorm'] = adata.X.copy()
adata.raw = adata # full dimension lognormalized data
sc.pp.scale(adata, max_value=10)
adata
Comparing adata.X, adata.layers['counts'], adata.layers['lognorm'], and adata.raw.X after running this shows that adata.X and adata.raw.X are identical. The desired behavior would probably be for adata.raw.X to match adata.layers['lognorm']. It appears that sc.pp.scale is changing adata.raw. Why is that?
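One quick way to test the comparison described above is to ask NumPy whether the two arrays occupy the same memory. This is only a sketch: `raw_aliases_x` is a hypothetical helper (not part of scanpy or anndata), it assumes dense NumPy arrays as in the output shown in this issue, and it uses a `SimpleNamespace` stand-in rather than a real AnnData object so it runs on its own.

```python
import numpy as np
from types import SimpleNamespace


def raw_aliases_x(adata):
    """Hypothetical helper: True if adata.raw.X and adata.X overlap in memory.

    Assumes both attributes are dense NumPy arrays.
    """
    return np.shares_memory(adata.raw.X, adata.X)


# Stand-in objects that only mimic the .X / .raw.X attribute layout.
x = np.array([[0.0, 1.4], [0.0, 0.0]], dtype=np.float32)
shared = SimpleNamespace(X=x, raw=SimpleNamespace(X=x))         # same buffer
copied = SimpleNamespace(X=x, raw=SimpleNamespace(X=x.copy()))  # separate buffer

print(raw_aliases_x(shared))
print(raw_aliases_x(copied))
```

On a real object, the same check would be `np.shares_memory(adata.raw.X, adata.X)`, run before calling sc.pp.scale.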
Minimal code sample
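adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
adata.layers['lognorm'] = adata.X.copy()  # needed for the lognorm output below
adata.raw = adata  # full dimension lognormalized data
sc.pp.scale(adata, max_value=10)
adata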
adata.layers['lognorm']
array([[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0. , 0. , 1.4028237, ..., 0. , 0. ,
0. ],
[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
...,
[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0. , 0. , 0. , ..., 0. , 0. ,
0. ]], shape=(70499, 309), dtype=float32)
adata.X
array([[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, 8.037066 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
...,
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ]], shape=(70499, 309), dtype=float32)
adata.raw.X
array([[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, 8.037066 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
...,
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ],
[-0.2976397 , -0.35878736, -0.2131979 , ..., -0.14714538,
-0.32566202, -0.3301082 ]], shape=(70499, 309), dtype=float32)
Versions
scanpy: 1.11.4
Hi, you checked “I have confirmed this bug exists on the latest version of scanpy.”, but 1.10.2 is far from the latest version of scanpy. Can you please check with 1.11.4?
Sorry about that, @flying-sheep. I installed the latest version of scanpy and double-checked, and the issue still exists (I pasted the output above). My workaround for now is to use layers for everything, but I wanted to post this because the behavior is unexpected and could affect analysis results without being noticed.
Hello! I am looking to contribute to scanpy and thought this might be an active and useful issue to resolve. I've used scanpy quite a bit in the past. @flying-sheep, please let me know if this issue is a good one to tackle as a first contribution to scanpy and, if not, which other issue would be better. I'm looking to start on this today or tomorrow.
I think this is actually not a scanpy bug but an anndata one. When you assign adata.raw = adata, raw only holds a reference to the same array/matrix as adata.X rather than a copy. So when .X is updated in place, raw updates too, because both point to the same array.
To avoid this behavior:
adata.raw = adata.copy()
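The difference between a stored reference and a copy can be illustrated with plain NumPy. This is a minimal sketch and not anndata's actual code: `raw_alias` and `raw_copy` are illustrative names, and the in-place centering and scaling is only a stand-in for an in-place scaling step on a dense array.

```python
import numpy as np

# Small dense matrix standing in for log-normalized data.
lognorm = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)

x = lognorm.copy()
raw_alias = x        # a second name for the same buffer (reference)
raw_copy = x.copy()  # an independent snapshot

# In-place per-column centering and scaling.
x -= x.mean(axis=0)
x /= x.std(axis=0)

print(np.array_equal(raw_alias, x))       # True: the alias followed the in-place update
print(np.array_equal(raw_copy, lognorm))  # True: the copy kept the original values
```

The copy costs extra memory for a second full matrix, which is the trade-off of the adata.copy() workaround.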