proteomics-sample-metadata
LFQBenchmark experiment - multiple organisms
Hi everyone,
I generated an updated LFQbenchmark dataset, similar to the one from Navarro et al. (https://pubmed.ncbi.nlm.nih.gov/27701404/). I was wondering how I could best annotate the mixtures (as pooled samples). Can I mention more than one organism in the characteristics[organism] column? Additionally, would it be beneficial to add an additional comment section to define the ratios of the three proteomes?
Looking forward to your suggestions!
Best,
Bart Van Puyvelde
Hi Bart, did you get any help with this?
I suspect you could use the field characteristics[pooled sample] and list in it all the samples that are pooled (SN=sample 1,sample 2, … sample 9, where "sample n" is the value of the corresponding sample in source name).
For the relative quantities I am not sure. Others may have better ideas. Maybe you could use the key QY= to indicate relative quantity (like in characteristics[spiked compound]), but I am not sure how to make the sample names correspond to the quantities.
Also, I don't know what to do if one of the pooled samples is not analysed on its own (so there is no .raw file associated with one of the sample names).
Hi @mlocardpaulet @brvpuyve :
First, my apologies for the late reply, I was off for a couple of weeks. Some weeks ago I was discussing with @anjaf how to represent multiplexed samples in an experiment.
We have two options here:
1- Represent each sample as an independent sample, adding a characteristic called characteristics[concentration of] to each one, and link each sample to the same data file. The characteristics[organism] will be different for each sample. This is actually a clean representation because each sample has its own row and can be described with more characteristics. It differs from the current pooled approach mentioned by @mlocardpaulet, because in the pooled approach samples are used multiple times: in their corresponding channel and in the pool.
It will be something like:
source name | characteristics[organism] | characteristics[organism part] | characteristics[biological replicate] | characteristics[concentration of] | assay name | comment[technical replicate] | comment[fraction identifier] | comment[label] | comment[data file] | characteristics[concentration of] |
---|---|---|---|---|---|---|---|---|---|---|
Sample-1 | homo sapiens | heart | 1 | 70% | ms_run 1 | 1 | 1 | label free sample | file1.raw | 70% |
Sample-2 | e coli | liver | 1 | 60% | ms_run 1 | 1 | 1 | label free sample | file1.raw | 60% |
As you can see, the assay name is the same, meaning that the file and the label conditions are the same.
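To make option 1 a bit more concrete, here is a minimal Python sketch that writes one row per organism for a single label-free run. The column names come from the table above, but the column set is deliberately reduced (so this is not a complete, valid SDRF), and the helper names, organisms and percentages are hypothetical placeholders rather than the real composition of the dataset.

```python
import csv
import io

# Reduced set of columns from the example table above.
# A real SDRF needs more columns than this.
HEADER = [
    "source name",
    "characteristics[organism]",
    "characteristics[concentration of]",
    "assay name",
    "comment[label]",
    "comment[data file]",
]


def mixture_rows(file_name, assay_name, composition):
    """Return one row per organism, all pointing at the same data file."""
    rows = []
    for index, (organism, percent) in enumerate(composition, start=1):
        rows.append([
            f"Sample-{index}",
            organism,
            f"{percent}%",
            assay_name,
            "label free sample",
            file_name,
        ])
    return rows


def to_sdrf_text(rows):
    """Serialise header and rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical composition, for illustration only.
composition = [
    ("homo sapiens", 50),
    ("saccharomyces cerevisiae", 30),
    ("escherichia coli", 20),
]
rows = mixture_rows("file1.raw", "ms_run 1", composition)
sdrf_text = to_sdrf_text(rows)
print(sdrf_text)
```

Every row shares the same assay name and comment[data file], which is what marks the three samples as measured together in one run.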
2- @anjaf mentioned before the idea of having a characteristics[organism] value called mixed; we could then represent all the species in the sample in characteristics[pooled sample] as key-value pairs with concentrations.
Would be great to have your opinion @anjaf @jgriss @mvaudel @mlocardpaulet @all @bigbio/collaborators
Hi @ypriverol thanks a lot. I like option 1- very much. So to be clear: there will be duplicated file names?
Option 1 is maybe the best approach, although it will be some work for me to add the extra lines :-) Let me know what is decided and I will create the SDRFs.
Thanks for the comments!
> Hi @ypriverol thanks a lot. I like option 1- very much. So to be clear: there will be duplicated file names?

Yes. We have the same case when multiple samples are multiplexed in the same RAW file.
I guess option one is fine if the python client can identify such a case?
- each raw file, if not unique, is a mixture?
- should concentration add to 100%? (to be valid?)
Hi all,
We already have this case covered to some extent for isobarically labelled experiments (see PXD017799 as an example). Here, we also have mixtures of multiple, independent samples in one raw file.
I therefore strongly suggest staying consistent with the design approach that was chosen there, which essentially is what @ypriverol mentioned as option 1.
In the case of isobarically labelled experiments, this could even be extended to have multiple rows referencing the same channel in the raw file.
@enryH
- Personally, I think that characteristics[concentration of] should be optional, but if provided must add up to 100% to be valid
- In isobarically labelled experiments we also refer to each raw file multiple times, indicating that it's a mixture. But we might not always have / need e.g. the individual sample concentrations - just to keep this case in mind as well
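The two checks discussed here (a raw file referenced by several rows is a mixture, and concentrations are optional but must add up to 100% when given) could be sketched roughly as below. This is only an illustration and not how the existing python client works: the function names are made up, rows are assumed to be plain dicts keyed by column name, and treating an empty cell or "not available" as "no concentration given" is my assumption.

```python
from collections import defaultdict

FILE_COLUMN = "comment[data file]"
CONCENTRATION_COLUMN = "characteristics[concentration of]"


def parse_percent(value):
    """Turn '70%' into 70.0; return None when no concentration is given."""
    if value is None:
        return None
    text = value.strip()
    # Assumption: empty cells and 'not available' both mean "not provided".
    if not text or text.lower() == "not available":
        return None
    return float(text.rstrip("%"))


def check_mixtures(rows, tolerance=0.01):
    """Group rows by data file and report a status string per file."""
    by_file = defaultdict(list)
    for row in rows:
        concentration = parse_percent(row.get(CONCENTRATION_COLUMN))
        by_file[row[FILE_COLUMN]].append(concentration)

    report = {}
    for file_name, values in by_file.items():
        if len(values) == 1:
            report[file_name] = "single sample"
        elif all(value is None for value in values):
            # Concentrations are optional, so this is not an error.
            report[file_name] = "mixture, no concentrations"
        elif any(value is None for value in values):
            report[file_name] = "mixture, incomplete concentrations"
        elif abs(sum(values) - 100.0) <= tolerance:
            report[file_name] = "mixture, valid"
        else:
            report[file_name] = "mixture, does not sum to 100%"
    return report


# Hypothetical rows, reduced to the columns the check needs.
rows = [
    {"source name": "Sample-1", FILE_COLUMN: "file1.raw", CONCENTRATION_COLUMN: "70%"},
    {"source name": "Sample-2", FILE_COLUMN: "file1.raw", CONCENTRATION_COLUMN: "30%"},
    {"source name": "Sample-3", FILE_COLUMN: "file2.raw", CONCENTRATION_COLUMN: ""},
    {"source name": "Sample-4", FILE_COLUMN: "file3.raw", CONCENTRATION_COLUMN: "70%"},
    {"source name": "Sample-5", FILE_COLUMN: "file3.raw", CONCENTRATION_COLUMN: "60%"},
]
report = check_mixtures(rows)
for file_name, status in report.items():
    print(file_name, "->", status)
```

Note that this groups by data file only. For isobarically labelled experiments a real check would probably also have to look at comment[label], since there several rows per file are expected even when no concentrations apply.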
Hello again, sorry it took me so long to come back to this.
I am looking at the headers that have been used in the SDRFs generated to date, and I see that characteristics[concentration of] is used to define the concentration of compounds defined in characteristics[compound]. So if we go with option 1 (if I understood correctly: one row per sample in the pool, with the respective quantities annotated in characteristics[concentration of]), can you distinguish the 2 usages of characteristics[concentration of]?
Could this be an issue?
Hmm. If there is both characteristics[organism] and characteristics[compound], then I guess the columns have to be ordered, but I am not 100% sure about this:
characteristics[organism] | characteristics[concentration of] | characteristics[compound] | characteristics[concentration of] |
---|---|---|---|
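If the ordering idea holds, a parser could attach each characteristics[concentration of] column to the nearest characteristics column before it. The sketch below only illustrates that guess: the "nearest preceding column" rule and the function name are assumptions from this thread, not something the SDRF specification is known to define.

```python
CONCENTRATION = "characteristics[concentration of]"


def pair_concentration_columns(header):
    """Pair each concentration column index with the preceding characteristics column."""
    pairs = []
    last_subject = None
    for index, name in enumerate(header):
        if name == CONCENTRATION:
            # last_subject stays None if no characteristics column came before.
            pairs.append((index, last_subject))
        elif name.startswith("characteristics["):
            last_subject = name
    return pairs


header = [
    "characteristics[organism]",
    "characteristics[concentration of]",
    "characteristics[compound]",
    "characteristics[concentration of]",
]
pairs = pair_concentration_columns(header)
print(pairs)
# [(1, 'characteristics[organism]'), (3, 'characteristics[compound]')]
```

Working with column positions rather than names matters here, because loading the header into a dict keyed by column name would silently merge the two identically named columns.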
Could you explain the type of experiment where this is an issue?
But I agree that this could be an issue if it leads to ambiguous interpretations.
Hello,
I guess you are right, I cannot see an example where it would be used.