DeepChrome

How to get the predictions for each gene?

Open dgarrimar opened this issue 2 years ago • 9 comments

Hi,

I ran the pipeline on my data smoothly and got the ROC AUC for the train and test sets. However, I am not very familiar with torch/lua. How could I obtain the final predictions for each gene in the test set (either the 0/1 label or, better, the probability in [0,1])? I guess this just means adding or modifying a couple of lines of code.

thanks!

PS. It'd be great too if I could obtain the accuracy and confusion matrix for the test set (not only the ROC AUC).

dgarrimar avatar Oct 06 '22 11:10 dgarrimar

The unnormalized outputs will be in the output variable here

You can append a nn.SoftMax module to the model in order to get normalized probabilities.

btw - have you tried the AttentiveChrome pytorch code in the repository? It's likely much easier to follow.

jacklanchantin avatar Oct 06 '22 14:10 jacklanchantin

I am still not sure how to do this in lua. Do you mean something like local ex = nn.SoftMax(output), and then print ex to a file? I am not familiar with lua objects. What kind of object is output? It seems it is not just a number, and the same goes for ex. Could you please give me some more hints on how to store the actual numbers in a file? Thanks a lot! I also had a look at the pytorch code, but it seems to be much slower on the same dataset (I am using CPUs for now).

dgarrimar avatar Oct 06 '22 17:10 dgarrimar

you can do normalized_output = nn.SoftMax()(output)

output is a torch tensor:
normalized_output[:,0] = p(x=true)
normalized_output[:,1] = p(x=false)

you can write each of these to a csv file using standard lua write to file methods.
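
To make the two steps above concrete (normalize the raw scores, then write one row per gene), here is a minimal, language-agnostic sketch in plain Python rather than the repository's Lua. The gene IDs, scores, column names, and the helper names softmax and write_predictions are all hypothetical and only illustrate the idea; which column corresponds to "expressed" depends on how the labels were encoded.

```python
import csv
import io
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw (unnormalized) scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def write_predictions(rows, handle):
    """Write one CSV row per gene: id, then the two class probabilities."""
    writer = csv.writer(handle)
    writer.writerow(["gene_id", "p_class1", "p_class2"])
    for gene_id, scores in rows:
        probs = softmax(scores)
        writer.writerow([gene_id] + ["%.6f" % p for p in probs])


# Hypothetical gene IDs and raw two-class scores, for illustration only.
example_rows = [("geneA", [2.0, 0.0]), ("geneB", [-1.0, 1.0])]

# An in-memory buffer keeps the sketch side-effect free; to write a real file,
# pass open("predictions.csv", "w", newline="") instead.
buffer = io.StringIO()
write_predictions(example_rows, buffer)
print(buffer.getvalue())
```

The same structure carries over to Lua: apply the softmax once per test example, then append the gene identifier and the resulting values to a file opened with the standard io library.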

jacklanchantin avatar Oct 06 '22 18:10 jacklanchantin

Uhm, for some reason it complains: unexpected symbol near ':'. (I just copy/pasted)

dgarrimar avatar Oct 06 '22 18:10 dgarrimar

I don't remember what dimension normalized_output would be. Can you try removing the :, ?

jacklanchantin avatar Oct 06 '22 18:10 jacklanchantin

same :( (')' expected near '=')

dgarrimar avatar Oct 06 '22 18:10 dgarrimar

oh you shouldn't include the = p(x=true) part, i was explaining what those will give you - i.e. the probability that the input has expression=true

jacklanchantin avatar Oct 06 '22 18:10 jacklanchantin

I see haha, but still normalized_output:nDimension() is 1

dgarrimar avatar Oct 06 '22 18:10 dgarrimar

OK I think I got it :), normalized_output[1] should give the probability
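
For the accuracy and confusion matrix asked about in the first post, once the per-gene probabilities are saved they can be computed outside of torch. The following is a rough sketch in plain Python, assuming binary 0/1 labels, a probability of the positive class per gene, and a 0.5 decision threshold; the labels, probabilities, and function names are made up for illustration.

```python
def confusion_matrix(labels, probs, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels given p(label == 1) per example."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of examples predicted correctly (0.0 if there are none)."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Hypothetical test-set labels and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7]

counts = confusion_matrix(labels, probs)
print(counts, accuracy(counts))
```

The threshold is a free choice here; 0.5 is only a common default, and the ROC AUC already reported by the pipeline summarizes performance across all thresholds.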

dgarrimar avatar Oct 06 '22 18:10 dgarrimar