How do the data files line up?

Open jmschrei opened this issue 6 years ago • 3 comments

Howdy!

Thanks for putting all of your data online.

I can see that the regions file contains all of the regions in the data file, but does the first region correspond to the first sample in the data file? It looks like the regions file is ordered by genomic coordinate, but that the label file is also grouped by label type. Is it the case that the label file and the data file are aligned, but that the regions aren't similarly aligned? If so, how can I align the regions to the data points and corresponding labels?

Thanks!

jmschrei avatar Jun 05 '18 03:06 jmschrei

I think I figured it out. I can use the {celltype}_Regions.bed file to line everything up.
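A minimal sketch of that alignment, assuming the rows of `{celltype}_Regions.bed` are in the same order as the rows of the data and label files (the thread does not confirm this). The names `Region`, `parse_bed`, and `align_by_row` are hypothetical helpers for illustration, and the toy BED lines and labels below are made up rather than taken from DECRES.

```python
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three BED columns, skipping blank and header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append(Region(fields[0], int(fields[1]), int(fields[2])))
    return regions


def align_by_row(regions: Sequence[Region],
                 labels: Sequence[str]) -> List[Tuple[Region, str]]:
    """Pair region i with label i.

    ASSUMPTION: both files share the same row order. A length mismatch
    proves they cannot be aligned this way; equal lengths do not prove
    that the order is correct.
    """
    if len(regions) != len(labels):
        raise ValueError(
            f"{len(regions)} regions but {len(labels)} labels; "
            "the files cannot be aligned row by row"
        )
    return list(zip(regions, labels))


# Toy stand-ins for the contents of a regions file and a label file.
bed_text = """chr1\t1000\t1200
chr1\t5000\t5200
chr2\t300\t500
"""
toy_labels = ["A", "B", "A"]

toy_regions = parse_bed(bed_text.splitlines())
pairs = align_by_row(toy_regions, toy_labels)
for region, label in pairs:
    print(region.chrom, region.start, region.end, label)
```

With the real files, the lines would come from `open(...)` on the BED file and the label file; the length check is only a sanity test, not a confirmation of the ordering.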

I have another question about that, though. In Additional File 1 of "Genome-wide prediction of cis-regulatory regions using supervised deep learning methods" it says that there are 235k regions considered in K562, but the K562_Regions.bed file has only 18894 regions. Is this the data from that paper?

Thanks!

jmschrei avatar Jun 06 '18 06:06 jmschrei

The complete data set is too big to share on GitHub. If you need it, I will try to send you a link.

Yifeng

yifeng-li avatar Jun 06 '18 13:06 yifeng-li

The full data set would be great. All I need is the region file and the 200 bp data file. Also to confirm, is the data in the data files lined up with the regions in the region file?

jmschrei avatar Jun 06 '18 17:06 jmschrei