Re-use bzipped links dump file

Open Benja1972 opened this issue 3 years ago • 5 comments

Hello, thank you for this nice tool. I have one question: how can I run danker on a links file that has already been downloaded and processed by danker?

I ran "./danker.sh ALL --bigmem" and after a few hours it crashed with a memory issue, but the bzipped links file had been created. How can I reuse this file to calculate only the PageRank?

Thank you! Sergei

Benja1972 avatar Apr 07 '21 06:04 Benja1972

Hi Sergei,

Thanks a lot for your question! So the best option would be to run the following:

 bunzip2 filename.links.bz2
 python3 -m danker filename.links 0.85 40 0.1 -i | sed "s/\(.*\)/Q\1/" > output.rank
 sort -k 2,2nr -T . -S 50% -o output.rank output.rank

You would need to make sure that the machine has enough main memory available this time.
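To see what the last two steps do without running danker itself, here is a minimal sketch on made-up data. The file names `toy.rank` and `toy.prefixed.rank` are hypothetical, and the tab-separated "numeric id, score" layout is an assumption about danker's raw output, not something confirmed in this thread.

```shell
# Toy stand-in for danker's raw output.
# Assumed format (not confirmed here): "<numeric id><TAB><score>".
printf '42\t0.5\n7\t2.25\n1000\t1.0\n' > toy.rank

# Prefix every line with "Q" so that 42 becomes Q42
# (same sed expression as in the commands above).
sed "s/\(.*\)/Q\1/" toy.rank > toy.prefixed.rank

# Sort in place by the second column, numerically, highest score first.
# The -T and -S options from the original command only tune temp space
# and buffer size, so they are left out for this tiny file.
sort -k 2,2nr -o toy.prefixed.rank toy.prefixed.rank

cat toy.prefixed.rank
```

The entry with the highest score should end up on the first line, each with its "Q" prefix.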

An alternative would be the following:

 bunzip2 filename.links.bz2
 sort -k 2,2n -T . -S 50% -o filename.links.right filename.links
 python3 -m danker filename.links -r filename.links.right 0.85 40 0.1 -i | sed "s/\(.*\)/Q\1/" > output.rank
 sort -k 2,2nr -T . -S 50% -o output.rank output.rank

This takes a bit longer but the memory footprint should be less than 8GB.
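As a small illustration of the extra sort step in this alternative, the sketch below builds a second copy of a links file ordered by its second column. The toy file names and the tab-separated "source id, target id" layout are assumptions for illustration only.

```shell
# Toy links file; assumed format (not confirmed here):
# "<source id><TAB><target id>", one link per line.
printf '1\t3\n3\t2\n2\t1\n' > toy.links

# Write a copy sorted numerically by the second column,
# analogous to creating filename.links.right above.
sort -k 2,2n -o toy.links.right toy.links

cat toy.links.right
```

The original `toy.links` stays untouched, so both orderings are available on disk, which is presumably what lets the second variant trade run time for a smaller memory footprint.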

Let me know which option worked for you!

athalhammer avatar Apr 07 '21 13:04 athalhammer

Hi Andreas, thank you for your answer. I will try it out. It would be nice to have a predefined bash script that does the same for any output of the link collector, just in case.

Best Sergei

Benja1972 avatar Apr 07 '21 14:04 Benja1972

Hmm, let me think about the best way to separate this out from the workflow script... https://github.com/athalhammer/danker/blob/20cc2b7b1fe5d937ea5204d214a074baf3400c93/script/dank.sh#L106

athalhammer avatar Apr 08 '21 11:04 athalhammer

Thank you @athalhammer! It works for me. I have tested the lines of code you provided earlier. Sergei

Benja1972 avatar Apr 12 '21 10:04 Benja1972

So I thought about it and I came to the conclusion that wrapping these three lines in a designated script would be overkill.

athalhammer avatar Apr 30 '21 15:04 athalhammer