chemCPA
Clarifications about LINCS data / evaluation
Hi, I would be grateful if you could clarify the following points regarding LINCS data:
- The README file mentions that you used data from Phase I and Phase II. What level (1-5)?
- Apart from cell line and compound information (perturbation ID and dose), did you consider any other variables when modeling this data?
- How did you compute the differentially expressed genes? Did you compute separate ranks for each cell line?
- In terms of evaluation, the baseline in the paper employs the expression decoded from the basal state (i.e. excluding perturbation information). Does this baseline preserve the ground-truth cell line information when decoding the basal state or is this information also removed?
- Did you consider computing the R2 scores between the raw control data (i.e. no autoencoder involved) and the ground-truth post-perturbation profiles predicted by chemCPA?
Any clarification on these points would be greatly appreciated.
@MxMstrmn I would appreciate your answers to the questions above
Hi @rvinas,
- We use level two (the GEX equivalent)
- No, for the LINCS data, we only considered compounds, dosage, and cell line information
- We approximated the differentially expressed genes with this part of the notebook in 1_lincs.py, L93-L110
- The baseline is the composition of basal state + cell line information. Effectively, we simply check how similar the control distribution is to the perturbed state
- No, I did not make this check myself but relied on the original analysis in the Sci-Plex data
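For anyone who wants to try the raw-control check asked about above, here is a minimal sketch of one way it could be done. It is not taken from the chemCPA code base: the helper names (`r2_score`, `control_baseline_r2`), the synthetic data, and the choice to compare per-gene means are all assumptions made for illustration. The idea is to compute R2 between the mean expression of control cells and the mean expression of perturbed cells for the same cell line, optionally restricted to a set of differentially expressed genes.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def control_baseline_r2(control, perturbed, gene_idx=None):
    """R2 between per-gene means of raw control and perturbed cells.

    control, perturbed: arrays of shape (n_cells, n_genes) for one cell line.
    gene_idx: optional indices of genes to score (e.g. a DE gene set).
    Hypothetical helper for illustration, not part of chemCPA.
    """
    ctrl_mean = np.asarray(control).mean(axis=0)
    pert_mean = np.asarray(perturbed).mean(axis=0)
    if gene_idx is not None:
        ctrl_mean = ctrl_mean[gene_idx]
        pert_mean = pert_mean[gene_idx]
    # The control mean plays the role of the "prediction" for the perturbed mean.
    return r2_score(pert_mean, ctrl_mean)


# Synthetic example (assumed data): 50 genes, the first 5 respond to the drug.
rng = np.random.default_rng(0)
gene_means = rng.normal(5.0, 2.0, size=50)
shift = np.zeros(50)
shift[:5] = 3.0
control = gene_means + rng.normal(0.0, 0.1, size=(200, 50))
perturbed = gene_means + shift + rng.normal(0.0, 0.1, size=(200, 50))

r2_all = control_baseline_r2(control, perturbed)
r2_de = control_baseline_r2(control, perturbed, gene_idx=np.arange(5))
print(f"R2 on all genes: {r2_all:.3f}")
print(f"R2 on DE genes:  {r2_de:.3f}")
```

In this toy setup the score over all genes stays well above the score on the responding genes, because most genes do not move. That is why a control-only baseline can look strong on all genes and is more informative when reported on the differentially expressed subset as well.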