PopLDdecay

RAM Usage and other

Open GabrieleNocchi opened this issue 5 years ago • 2 comments

From my first tries, it seems that what this tool saves in storage (compared to other software) it spends in RAM.

Do you have any tests showing this tool's RAM usage? On a 71 GB VCF file (360 individuals, about 7,000,000 SNPs) it cannot run with 8 GB of RAM: the jobs get killed because they run out of memory. I am now trying with 64 GB. Any suggestions?

Also, I had a look at your paper, and Table 1 says that Plink can't do this on VCF files directly. This is not true. Plink's --r2 function does exactly this, and you can then plot the result in R. Plink also has the --blocks function, which lets you calculate the LD block size distribution of your dataset. And yes, Plink has been able to run directly on VCF files for a good few years now (flag --vcf).
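For readers who want to try the Plink route described here, the following is a minimal sketch of how such a command line could be assembled. It assumes PLINK 1.9 is installed as `plink` on the PATH; the function name, file name, and output prefix are placeholders, not anything taken from this thread. The window flags are included because, by default, PLINK 1.9 only reports pairs up to 10 variants apart with r2 of at least 0.2.

```python
from typing import List


def build_plink_r2_command(vcf_path: str, out_prefix: str,
                           window_kb: int = 300) -> List[str]:
    """Assemble a PLINK 1.9 argument list for pairwise r^2 from a VCF.

    All paths are hypothetical placeholders supplied by the caller.
    """
    return [
        "plink",                            # assumed to be on PATH
        "--vcf", vcf_path,                  # read the VCF directly
        "--r2",                             # pairwise r^2 report
        "--ld-window-kb", str(window_kb),   # max distance between a pair, in kb
        "--ld-window", "99999",             # lift the default 10-variant limit
        "--ld-window-r2", "0",              # keep pairs below the default 0.2 cutoff
        "--out", out_prefix,                # report is written to <out_prefix>.ld
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_plink_r2_command("input.vcf", "ld_report")))
```

The list can be passed to `subprocess.run` as-is. On a dense VCF the resulting `.ld` file can be very large.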

Finally, sorry for posting here even though this is not a bug. I tried to join your QQ group and downloaded the app, but it is not ideal for Europeans: the app is not in English, and I could not create an account because I do not understand it.

GabrieleNocchi avatar Nov 11 '19 15:11 GabrieleNocchi

Just reporting my results. With 64 GB of RAM it worked. Checking RAM usage, I saw that it used slightly less than 10 GB of RAM for the job on my 71 GB VCF file (360 individuals, 7,000,000 SNPs).
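The thread does not say how the RAM usage was checked. One possible way to get a comparable number on a Unix system is to run the job as a child process and read its peak resident set size afterwards. This is a sketch under that assumption; the helper name is made up for illustration, and the `resource` module is not available on Windows.

```python
import resource
import subprocess
import sys
from typing import List


def peak_child_rss_gb(cmd: List[str]) -> float:
    """Run cmd to completion and return the peak RSS of child processes in GB.

    RUSAGE_CHILDREN covers all children this process has waited for, so call
    this from a fresh interpreter if a clean per-job number is needed.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 ** 3 if sys.platform == "darwin" else 1024 ** 2
    return peak / divisor


if __name__ == "__main__":
    # Placeholder job: replace with the command line you normally run.
    usage = peak_child_rss_gb([sys.executable, "-c", "x = bytearray(10**7)"])
    print(f"peak RSS: {usage:.3f} GB")
```

On a cluster, the scheduler's own accounting is usually the more reliable source for the same figure.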

That is actually quite good, and the files generated are very small (as advertised). I like this tool.

As I pointed out before, Plink can run directly on VCF files and do this. The real advantage of this tool over Plink, in my personal experience, is this:

In Plink, if you use the --r2 function to calculate linkage between all SNP pairs on a big, dense VCF file like mine, you will still produce a huge output file with all the pairwise distances and LD values, even if you restrict the calculation to a maximum distance of 300 kb. This becomes very tedious to plot in R, because loading huge files in R is not ideal. A workaround when using Plink is to thin your VCF, keeping only one SNP every 500 bases or so. But this is, of course, a random thinning, and you lose information. With this tool, PopLDdecay, you can run straight on a dense VCF without thinning it and without having to bin the output before you can plot it, which saves a lot of time and effort.

  1. Also, the plot looks quite good.
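For anyone who stays with Plink output, the binning step mentioned above does not require loading the whole file. The sketch below streams a pairwise LD report line by line and keeps only a running mean of r2 per distance bin. It assumes the default PLINK 1.9 `--r2` column names (`BP_A`, `BP_B`, `R2`); the function name and the bin size are illustrative choices, and this is not how PopLDdecay itself computes its statistics.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable


def bin_r2_by_distance(lines: Iterable[str], bin_size: int = 1000) -> Dict[int, float]:
    """Return mean r^2 per distance bin, keyed by the bin's start in bp.

    Only two small dictionaries are held in memory, so the input can be
    arbitrarily large as long as it is read as a stream.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    columns = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if columns is None:
            # First non-empty line is the header; map column names to indexes.
            columns = {name: i for i, name in enumerate(fields)}
            continue
        distance = abs(int(fields[columns["BP_B"]]) - int(fields[columns["BP_A"]]))
        bin_start = (distance // bin_size) * bin_size
        sums[bin_start] += float(fields[columns["R2"]])
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    # Tiny made-up example in the assumed column layout.
    demo = io.StringIO(
        " CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2\n"
        " 1 100 s1 1 600 s2 0.8\n"
        " 1 100 s1 1 1600 s3 0.2\n"
    )
    for start, mean_r2 in bin_r2_by_distance(demo).items():
        print(start, mean_r2)
```

With a real report, open the file and pass the file object directly, for example `bin_r2_by_distance(open("ld_report.ld"))`, where the file name is again a placeholder. The resulting table has one row per bin and is small enough to plot in R or any other tool.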

GabrieleNocchi avatar Nov 12 '19 11:11 GabrieleNocchi

Thank you very much for your recognition of this tool.

Sorry, we have not tested a 71 GB VCF file (360 individuals, about 7,000,000 SNPs) on a computer with 8 GB of RAM.

Yes, newer versions of Plink can read VCF. At that time we used an old version of Plink and did not know that it could read VCF directly. We are very sorry that the article is misleading to some readers.

The QQ group is mainly for communication among bioinformatics researchers in China, and it is really not friendly to users in Europe, sorry about that. If you have any questions, you can email me.

Your satisfaction is my motivation for writing these tools and software. Thanks a lot.

hewm2008 avatar Nov 15 '19 02:11 hewm2008