bitcannon
Progress when importing dump
When importing a torrent dump, this is the output:
[~/bitcannon]$ ./bitcannon_linux_amd64 ./dump.txt.gz
[OK!] Connecting to Mongo at 127.0.0.1
[OK!] Attempting to parse ./dump.txt.gz
[OK!] File opened
[OK!] Extension is valid
[OK!] GZip detected, unzipping enabled
[OK!] Reading initialized
[OK!] Reading completed
X torrents imported
Y torrents skipped
It's not an issue for small dumps, but when they're larger, there's no progress indication. It'd be nice if BitCannon could display a progress indicator, like this:
[OK!] Added torrents: 100/35000
or
[OK!] Added torrents: 23% (ETA: 03:15)
It's not indispensable, of course, but I think it'd be good to know how far along the import is.
I've been thinking about this for a while and wanted to use the Go progress-bar library called pb, but with the way I'm currently reading files through a scanner, I don't think I can get my position in the file (how many bytes have been read so far) without totalling up the bytes as it goes along. Even then it would be way off, because that would measure progress in uncompressed bytes, and I can only get the size of the compressed gz file.
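For what it's worth, one workaround would be to track progress in compressed bytes instead: wrap the file in a small io.Reader that tallies what gzip pulls from it, and compare that against the known compressed file size. A minimal sketch (not BitCannon's actual code; the file name and reporting interval are illustrative, and the percentage is approximate since gzip's internal buffering reads slightly ahead):

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"os"
)

// countingReader wraps an io.Reader and tallies how many bytes
// have been read through it so far.
type countingReader struct {
	r io.Reader
	n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err
}

func main() {
	f, err := os.Open("dump.txt.gz") // hypothetical dump file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		log.Fatal(err)
	}
	total := info.Size() // compressed size, known up front

	// gzip pulls compressed bytes through the counter as it inflates.
	cr := &countingReader{r: f}
	gz, err := gzip.NewReader(cr)
	if err != nil {
		log.Fatal(err)
	}
	defer gz.Close()

	scanner := bufio.NewScanner(gz)
	for i := 0; scanner.Scan(); i++ {
		// ... import the torrent on this line ...
		if i%10000 == 0 {
			fmt.Printf("\r[OK!] Progress: %d%%", cr.n*100/total)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Println()
}
```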
I really want to add this though, and I hope I'll be able to find some method to do it. I think I would need a way to get the uncompressed size of a gz without decompressing it; since the reader decompresses as it reads, I'm pretty sure it only knows the uncompressed size after it's done.
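As it happens, the gzip format (RFC 1952) does record this: the last four bytes of the file hold the uncompressed size modulo 2^32, little-endian, so for single-member archives under 4 GiB it can be read straight off the trailer without inflating anything. A sketch (the file name is hypothetical):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"log"
	"os"
)

// uncompressedSize reads the ISIZE trailer of a gzip file: per
// RFC 1952, the last four bytes hold the uncompressed size modulo
// 2^32 (little-endian). Only reliable for single-member archives
// whose uncompressed size is under 4 GiB.
func uncompressedSize(path string) (uint32, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}

	var trailer [4]byte
	if _, err := f.ReadAt(trailer[:], info.Size()-4); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(trailer[:]), nil
}

func main() {
	size, err := uncompressedSize("dump.txt.gz") // hypothetical dump file
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Uncompressed size (mod 2^32): %d bytes\n", size)
}
```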
Makes sense, yep. It would probably be easier to add it for uncompressed dumps first, but most people are going to use compressed ones, so... Well, anyway, glad to hear it's being considered, thanks!
Just importing some dailies and thought of this too..
:D
Of course! Unfortunately I still have to finish some core functions like auto-scraping and perhaps category management, but this should be up next.
If you're brave, a public copy of BitCannon
Do you know of any public instances yet? (I'm interested in starting one.) I only just got around to reading your thread on /r/DataHoarder (I used to mod there and started the IRC). You should stop by! We're #DataHoarder on Freenode.
I think I saw someone put the IP of a public instance in the comments of some article about it, but I think he just wanted people to be able to see what the UI was like. I'm not aware of any public instance that anybody's put out to be used, and I think it would be great if you wanted to make one!
In the interest of making it production-ready, I have tried to make all the changes I needed to the database as early as possible, so I hope clearing the database between releases won't be necessary (I haven't changed anything so far, and I'm getting close to the next release).
Let me know if you do make a public instance, perhaps I can put a link somewhere!
Still playing around, but I've spoken to a few people about hosting it. Just researching site APIs and finding what's available while you bash away making this project ever more sexy!
Tip: I would advise against downloading full backups, as files larger than 500 MB can take half an hour or more to import, even on fast computers.
Improving performance is a current issue that is being worked on.
What is the limiting factor when importing these files?
Importing is slow, and searches have been slowing down because of the addition of sorting by seeders. My benchmarks are as follows:
Importing: 1,000 torrents per second from a gz file (1 hr 20 min for 6.3 million)
Page loads: searches and the browse categories run in roughly 1 second; everything else is instant
Scraping: currently 100 torrents per second (~12 hours for 6 million)
My hardware is as follows:
CPU: AMD FX-4300 quad-core at 4 GHz
Storage: Apple 120 GB SSD
RAM: 32 GB DDR3
Network: 110 Mbit/s (~12 MB/s)
It could vary depending on specs and how many torrents you import. After scraping the 6 million torrents from Kickass, I found that 75% of them have 0 seeders, so if you were only interested in active torrents, speeds could increase a lot.
That would make more sense for a hosted instance I think, hmm. But locally I just want to get all the torrents.
I can definitely do this with: http://stackoverflow.com/questions/24562942/golang-how-do-i-determine-the-number-of-lines-in-a-file-efficiently
Counting lines that way gets through 1.5 GB in 5-10 seconds, which should be a very bearable pause at the beginning of the import; then I should be able to show a progress bar.
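Roughly what that would look like, as a sketch: chunked newline counting per the Stack Overflow answer, run over the decompressed stream up front. The file name is hypothetical, and the pb calls are shown only in comments:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"os"
)

// lineCount reads the stream in 32 KB chunks and counts newlines,
// following the approach in the linked Stack Overflow answer.
func lineCount(r io.Reader) (int, error) {
	buf := make([]byte, 32*1024)
	count := 0
	for {
		n, err := r.Read(buf)
		count += bytes.Count(buf[:n], []byte{'\n'})
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
	}
}

func main() {
	f, err := os.Open("dump.txt.gz") // hypothetical dump file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}
	total, err := lineCount(gz)
	gz.Close()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Torrents to import:", total)

	// Then rewind and re-read for the real import, feeding `total`
	// into a progress bar such as cheggaaa/pb:
	//   bar := pb.StartNew(total)
	//   for scanner.Scan() { ...; bar.Increment() }
	//   bar.Finish()
}
```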