
Progress when importing dump

Tailszefox opened this issue 9 years ago · 10 comments

When importing a torrent dump, this is the output:

[~/bitcannon]$ ./bitcannon_linux_amd64 ./dump.txt.gz
[OK!] Connecting to Mongo at 127.0.0.1
[OK!] Attempting to parse ./dump.txt.gz
[OK!] File opened
[OK!] Extension is valid
[OK!] GZip detected, unzipping enabled
[OK!] Reading initialized
[OK!] Reading completed
      X torrents imported
      Y torrents skipped

It's not an issue for small dumps, but when they're larger, there's no progress indication. It'd be nice if BitCannon could display a progress indicator, like this: [OK!] Added torrents: 100/35000 or [OK!] Added torrents: 23% (ETA: 03:15)

It's not indispensable of course, but I think it'd be good to know.

Tailszefox avatar Jan 25 '15 13:01 Tailszefox

I've been thinking about this for a while, and I wanted to use the Go loading bar library called pb. But with the way I'm currently reading files with a scanner, I don't think I can track my progress through the file (how many bytes it has read so far) without totalling up the bytes as it goes along, and even then it would be way off, because that would measure progress in uncompressed bytes and I can only get the size of the compressed gz file.

I really want to add this, though, and I hope I'll be able to find some method to do it. I think I would need a way to get the uncompressed size of a gz file without uncompressing it; since the reader decompresses as it goes, I'm pretty sure it only knows the uncompressed size after it's done.
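One possible workaround, sketched below purely as an illustration (the file path, names, and print interval are made up, not BitCannon's actual code): measure progress in compressed bytes instead. Wrapping the file in a counting reader before gzip sees it gives a running count to compare against the known on-disk size, and the same count could drive pb. The percentage is approximate, since gzip pulls compressed bytes in buffered chunks.

package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"time"
)

// countingReader counts compressed bytes as gzip pulls them from the file.
type countingReader struct {
	r io.Reader
	n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err
}

func main() {
	f, err := os.Open("dump.txt.gz") // illustrative path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		panic(err)
	}
	total := info.Size() // compressed size is known up front

	cr := &countingReader{r: f}
	gz, err := gzip.NewReader(cr)
	if err != nil {
		panic(err)
	}
	defer gz.Close()

	start := time.Now()
	scanner := bufio.NewScanner(gz)
	for lines := 0; scanner.Scan(); lines++ {
		// ... import scanner.Text() here ...
		if lines%10000 == 0 && cr.n > 0 {
			pct := cr.n * 100 / total
			eta := time.Duration(float64(time.Since(start)) * float64(total-cr.n) / float64(cr.n))
			fmt.Printf("\r[OK!] Added torrents: %d%% (ETA: %s)", pct, eta.Round(time.Second))
		}
	}
	fmt.Println()
}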

stephen304 avatar Jan 25 '15 16:01 stephen304

Makes sense, yep. It would probably be easier to add it for uncompressed dumps first, but most people are going to use compressed ones, so... Well, anyway, glad to hear it's being considered, thanks!

Tailszefox avatar Jan 25 '15 16:01 Tailszefox

Just importing some dailies and thought of this too...

glad to hear it's being considered, thanks!

:D

ohhdemgirls avatar Feb 05 '15 23:02 ohhdemgirls

Of course! Unfortunately I still have to finish some core functions like auto scraping and perhaps category management, but this should be up next.

stephen304 avatar Feb 06 '15 02:02 stephen304

if you're brave, a public copy of BitCannon

Do you know of any public instances yet? (I'm interested in starting one.) I only just got around to reading your thread on /r/DataHoarder (used to mod there, started the IRC). You should stop by! We're #DataHoarder on Freenode

ohhdemgirls avatar Feb 06 '15 10:02 ohhdemgirls

I think I saw someone post the IP of a public instance in the comments of some article about it, but I think he just wanted people to be able to see what the UI was like. I'm not aware of any public instance that anybody's put out for general use, and I think it would be great if you wanted to make one!

In the interest of making it production ready, I have tried to make all the changes I needed to the database as early as possible, so I hope clearing the database between releases won't be necessary (I haven't changed anything so far, and I'm getting close to the next release).

Let me know if you do make a public instance, perhaps I can put a link somewhere!

stephen304 avatar Feb 06 '15 21:02 stephen304

Let me know if you do make a public instance, perhaps I can put a link somewhere!

Still playing around, but I've spoken to a few people about hosting it. Just researching site APIs and finding what's available while you bash away making this project ever more sexy!

Tip: I would advise against downloading full backups, as files larger than 500 MB can take half an hour or more to import, even on fast computers

Improving performance is a current issue that is being worked on

What is the limiting factor when importing these files?

ohhdemgirls avatar Feb 07 '15 13:02 ohhdemgirls

Importing is slow, and searches have been slowing down because of the addition of sorting by seeders. My benchmarks are as follows:

Importing: 1,000 torrents per second from a gz file (1 hr 20 min for 6.3 million)
Page loads: searches and browsing categories run in roughly 1 second; everything else is instant
Scraping: currently at 100 torrents per second (~12 hours for 6 million)

My hardware is as follows:

CPU: AMD FX-4300 quad-core at 4 GHz
Storage: Apple 120 GB SSD
RAM: 32 GB DDR3
Network: 110 Mbit/s (~12 MB/s)

It could vary depending on specs and how many torrents you import. After scraping the 6 million torrents from Kickass, I found that 75% of the torrents have 0 seeders, so if you were only interested in active torrents, speeds could increase a lot.
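On the seeder-sort slowdown: one possible mitigation, sketched below with guessed names (the mgo driver, a bitcannon database, a torrents collection, and a seeders field, none of which are confirmed to match the real schema), is a descending index on seeders, so Mongo can serve the sort from the index instead of sorting every match.

package main

import (
	"log"

	"gopkg.in/mgo.v2"
)

func main() {
	session, err := mgo.Dial("127.0.0.1")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Hypothetical names: adjust "bitcannon"/"torrents"/"seeders" to the real schema.
	torrents := session.DB("bitcannon").C("torrents")
	if err := torrents.EnsureIndex(mgo.Index{
		Key:        []string{"-seeders"}, // descending, matching sort-by-seeders queries
		Background: true,                 // build without blocking other operations
	}); err != nil {
		log.Fatal(err)
	}
}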

stephen304 avatar Feb 07 '15 20:02 stephen304

I found that 75% of the torrents have 0 seeders, so if you were only interested in active torrents, speeds could increase a lot.

That would make more sense for a hosted instance, I think, hmm. But locally I just want to get all the torrents.

ohhdemgirls avatar Feb 10 '15 18:02 ohhdemgirls

I can definitely do this with: http://stackoverflow.com/questions/24562942/golang-how-do-i-determine-the-number-of-lines-in-a-file-efficiently

It counts 1.5 gigs in 5-10 seconds, which should be a very bearable pause at the beginning of the import; then I should be able to show a progress bar.
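For reference, a sketch of the chunked newline-counting approach from that answer; for a gz dump this assumes decompressing twice, once to count and once to import.

package main

import (
	"bytes"
	"io"
)

// lineCounter reads in large chunks and counts '\n' bytes with bytes.Count,
// which is much faster than scanning the stream line by line.
func lineCounter(r io.Reader) (int, error) {
	buf := make([]byte, 32*1024) // 32 KB chunks
	count := 0
	for {
		n, err := r.Read(buf)
		count += bytes.Count(buf[:n], []byte{'\n'})
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
	}
}

With the total known up front, each imported line can advance a progress bar toward an exact 100%.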

stephen304 avatar Feb 16 '15 14:02 stephen304