Invalid categories appearing on 'browse' page

Open MightyRufo opened this issue 8 years ago • 62 comments

[Screenshot: capture]

I have over 5 million torrents imported so far, and I wish to import another 20 million. I'm on a mission to import a full database of... almost all torrents. Call me crazy! But I have got it working just fine. In fact, it's fast! But the browse page has broken for some reason. Any ideas?

MightyRufo avatar Oct 08 '15 05:10 MightyRufo

Oh... heh... I'd love to help fix it but I currently only have a Chromebook for development, which only has a measly Celeron and 120GB of HD space. I'm not sure what could be causing this. Looks like some encoding issue; maybe Go is freaking out for some reason and spitting out random bytes. I don't really expect to have a resolution for this anytime soon.

stephen304 avatar Oct 09 '15 01:10 stephen304

Alright, thanks for your input. Overnight I decided to set up my own server and use that to store torrents. The only thing I miss is the UI. Your design actually looks nice.

MightyRufo avatar Oct 09 '15 01:10 MightyRufo

I'm glad you like the design! If you have any ideas on the issue it might help me fix it. It's just a bit tedious without a QA or testing team. I also have lots of other pending tasks on bitcannon as well. Someday I'd like to use it to keep a torrent archive on a server in my closet or something, but I don't have the budget or the space and building a NAS for backups is my top priority. You can tell I'm a bit of a data hoarder, hence this project.

stephen304 avatar Oct 09 '15 01:10 stephen304

Me and you both, which is why I'm doing this in the first place. Just in case any major torrent site goes down, I'll still have access to those torrents. I've been collecting for a while now, and I am just getting started ;)

MightyRufo avatar Oct 09 '15 01:10 MightyRufo

Can you check the database and make sure the torrent titles aren't actually corrupted? Try a program like robomongo or something and see if the data comes out okay. Some of my code is questionable because I wasn't sure if demand for an integrated database (sqlite) would force me to rewrite it all.

stephen304 avatar Oct 09 '15 01:10 stephen304

Already have that set up, but I haven't actually looked at that. I'm at work at the moment, but I get off in 2 hours. I'll look then and post back. Also, when I try to browse a specific category of torrents, it gives me an API error, but bitcannon says '404 not found'. Can this be caused by an incompatible dump that I imported?

MightyRufo avatar Oct 09 '15 01:10 MightyRufo

I'm not really sure. I remember the import system is a bit dumb and just makes assumptions about the format. I think it just checks for the correct number of fields and validates the btih.

stephen304 avatar Oct 09 '15 01:10 stephen304
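
The importer behaviour described above (count the fields, validate the btih) could look roughly like this. This is a hypothetical sketch, not bitcannon's actual Go code: the pipe delimiter, the five-field layout, and the names `parseDumpLine`, `EXPECTED_FIELDS` and `BTIH_PATTERN` are all assumptions for illustration.

```javascript
// Hypothetical sketch, not bitcannon's real importer.
// Assumes pipe-delimited lines with the hex info hash (btih) in the first field.
const EXPECTED_FIELDS = 5; // assumed field count, for illustration only
const BTIH_PATTERN = /^[0-9a-fA-F]{40}$/; // 40 hex characters

function parseDumpLine(line) {
  const fields = line.trim().split('|');
  if (fields.length !== EXPECTED_FIELDS) {
    return { ok: false, reason: 'wrong field count' };
  }
  const [btih, title, category] = fields;
  if (!BTIH_PATTERN.test(btih)) {
    return { ok: false, reason: 'invalid btih' };
  }
  // Normalise the hash so the same torrent is not stored twice under different casing.
  return { ok: true, torrent: { btih: btih.toLowerCase(), title, category } };
}

console.log(parseDumpLine('not-a-hash|Some title|Movies|http://example.invalid/t|1024'));
```

A check like this only proves that a line has the right shape. A line with the right number of fields but junk in the category column would still pass, which would be consistent with the symptom in this issue.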

And auto-correct is destroying me right now, forgive me.

MightyRufo avatar Oct 09 '15 01:10 MightyRufo

I see, any way I can send you some data to look at? I know that helps. Just tell me what you need. Looking at it yourself may help.

MightyRufo avatar Oct 09 '15 01:10 MightyRufo

Basically I need to know whether the data is corrupted or if bitcannon is making it all funky after reading perfectly normal data from the database. So if you can get an info hash that has a garbled title and look at the entry in robomongo then we can see if bitcannon is doing something really weird or not.

stephen304 avatar Oct 09 '15 01:10 stephen304

Alright, will do.

MightyRufo avatar Oct 09 '15 01:10 MightyRufo

Weird, I started MongoDB and went into Robomongo, but now MongoDB is running an index build. I have to wait. I'm not familiar with Robomongo, how do I get it to show information about what I select on the file tree?

MightyRufo avatar Oct 09 '15 04:10 MightyRufo

Alright, what am I looking for here? I see bitcannon and under that I see System and Torrents. Double clicking on Torrents gives me a tab with what look like info hashes. Viewing the list (it only shows 50 different hashes?), I can expand each one of them and I can see the title. Titles seem to be intact.

MightyRufo avatar Oct 09 '15 04:10 MightyRufo

Ah, I just realized the part that you showed that was broken is the categories. Are the categories showing up fine as well in robomongo? What bitcannon does is try to get a unique list of categories, then lists those categories on browse. If you imported something with weird data in categories, then a bunch of new categories would get created, possibly flooding the categories page with junk. I think that might be what happened.

stephen304 avatar Oct 09 '15 21:10 stephen304
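
To make the "unique list of categories" idea concrete, here is a small in-memory sketch of how one bad record turns into a new browse entry. The sample data and the helper name `categoryCounts` are made up for illustration. Against the real database, the mongo shell's `db.getCollection('torrents').distinct('category')` returns the same kind of unique list.

```javascript
// In-memory stand-in for the torrents collection; the real data lives in MongoDB.
function categoryCounts(torrents) {
  const counts = new Map();
  for (const t of torrents) {
    counts.set(t.category, (counts.get(t.category) || 0) + 1);
  }
  return counts;
}

const sample = [
  { title: 'a', category: 'Movies' },
  { title: 'b', category: 'Movies' },
  { title: 'c', category: 'Pictures' },
  { title: 'd', category: '\ufffd\ufffdx9#' }, // junk from a malformed dump line
];

// Every distinct value becomes its own entry, including the junk one.
for (const [category, count] of categoryCounts(sample)) {
  console.log(JSON.stringify(category), count);
}
```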

How do I view that in robomongo? I see the category section but it doesn't let me view it.

MightyRufo avatar Oct 10 '15 04:10 MightyRufo

It should just be a field that you can view. I can't really remember exactly how it's stored but it should show something

— Reply to this email directly or view it on GitHub https://github.com/Stephen304/bitcannon/issues/80#issuecomment-147034475 .

stephen304 avatar Oct 10 '15 04:10 stephen304

This is what I'm seeing: http://puu.sh/kEMUB/0e8fb6f8a2.png

MightyRufo avatar Oct 10 '15 04:10 MightyRufo

Can you expand an entry with the arrow on the left?

stephen304 avatar Oct 10 '15 04:10 stephen304

Yeah, here: http://puu.sh/kEN4E/73123e549c.png

MightyRufo avatar Oct 10 '15 04:10 MightyRufo

Do you get a pictures category on your browse page?

stephen304 avatar Oct 10 '15 04:10 stephen304

Seems so, yes: http://puu.sh/kENgf/5f532ca965.png

MightyRufo avatar Oct 10 '15 04:10 MightyRufo

To me it looks like all the garbled categories have only 1 torrent in them. In that case it would mean you imported a bad file.

stephen304 avatar Oct 10 '15 04:10 stephen304

Yup, I imported the openbay database LOL. I think I'll roll the database back and then avoid that dump. What are the 404 errors caused by? I mean obviously they mean 'not found'. But why might that be happening?

MightyRufo avatar Oct 10 '15 04:10 MightyRufo

I'm not sure yet. I need sleep first

stephen304 avatar Oct 10 '15 04:10 stephen304

Alright, well. Thanks for the help so far.

MightyRufo avatar Oct 10 '15 04:10 MightyRufo

Alright, so. I rolled back my database. I noticed there are a couple of torrents that have invalid categories. How do I find them and correct them?

MightyRufo avatar Oct 10 '15 04:10 MightyRufo

If you click on them and get the btih and do a query with robomongo (Look up some guides) then you should be able to change the category in robomongo.

stephen304 avatar Oct 10 '15 13:10 stephen304

Hmm, not so sure I can. Clicking on any of the bad categories just brings me to a page showing zero torrents, such as this: http://puu.sh/kG0C4/9bf4dbf8d4.png

MightyRufo avatar Oct 11 '15 05:10 MightyRufo

That's tough... The only 2 options I see are either we add code to bitcannon that checks the database on startup for weird categories and handles them somehow, or you try to manually find the offending entries. You might be able to do a query for categories that are longer than 20 characters, but I'm a bit rusty on mongo queries.

stephen304 avatar Oct 11 '15 20:10 stephen304

I wouldn't code bitcannon to do that on start-up every time. I'd code it so I can execute a command.

MightyRufo avatar Oct 11 '15 21:10 MightyRufo

Well, if I remember correctly, I set it up to query for all unique values of category and then query each category for a total count, so the information is almost there. I would need to implement the category whitelist so it can print a warning or do something when there are weird categories.

I'd also be interested in seeing what the data for the offending torrents looks like. I wonder if adding stricter btih validation would prevent it from happening.

stephen304 avatar Oct 11 '15 21:10 stephen304
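
A whitelist check along those lines might look like the sketch below. It is only an illustration: the category names in `KNOWN_CATEGORIES` are guesses, and `findUnknownCategories` is a hypothetical helper that takes the per-category counts the browse page already computes.

```javascript
// Hypothetical whitelist; bitcannon's real category names may differ.
const KNOWN_CATEGORIES = new Set(['Movies', 'TV', 'Music', 'Games', 'Books', 'Pictures', 'Other']);

// counts: Map of category name -> number of torrents in that category.
function findUnknownCategories(counts) {
  const unknown = [];
  for (const [category, count] of counts) {
    if (!KNOWN_CATEGORIES.has(category)) {
      unknown.push({ category, count });
    }
  }
  return unknown;
}

const observed = new Map([['Movies', 120], ['Pictures', 4], ['\ufffd\ufffd1|x', 1]]);
for (const { category, count } of findUnknownCategories(observed)) {
  // Warn rather than delete, so a person can decide what to do with the records.
  console.warn(`Unexpected category ${JSON.stringify(category)} (${count} torrents)`);
}
```

Running this as a separate command instead of on every start-up would only change where the function is called from, not the check itself.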

That would be nice. Last night I imported 7 million torrents. Out of those 7 million, about 10 are messing up the categories page. Otherwise, it works very well!

MightyRufo avatar Oct 11 '15 21:10 MightyRufo

The problem is that the import system is so dumb. I would love to have something where it prompts you to verify the first torrent's info before running the import, then saves the profile for each auto import source as some kind of format string.

stephen304 avatar Oct 11 '15 21:10 stephen304
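
One way to picture the "profile per source" idea is a delimiter plus an ordered list of field names, with a preview step before the full import. Everything here is hypothetical (the profile shape, the field order, and the names `applyProfile` and `previewFirstTorrent`); it is a sketch of the idea, not a design that exists in bitcannon.

```javascript
// Hypothetical per-source profile: a delimiter plus the ordered field names.
const exampleProfile = { delimiter: '|', fields: ['btih', 'title', 'category', 'url', 'size'] };

function applyProfile(line, profile) {
  const parts = line.trim().split(profile.delimiter);
  if (parts.length !== profile.fields.length) return null; // line does not fit the profile
  const record = {};
  profile.fields.forEach((name, i) => { record[name] = parts[i]; });
  return record;
}

// Return the first record that parses, so a person can confirm the field
// mapping looks right before the rest of the dump is imported.
function previewFirstTorrent(lines, profile) {
  for (const line of lines) {
    const record = applyProfile(line, profile);
    if (record) return record;
  }
  return null;
}

console.log(previewFirstTorrent(['abc|My Title|Movies|http://example.invalid|10'], exampleProfile));
```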

That does sound pretty good. But it also takes time. And coding everything on your own takes even more time.

MightyRufo avatar Oct 11 '15 22:10 MightyRufo

Just got this from kat's hourly import. I'm guessing I'll have to turn off hourly updates for now. [Screenshot: 2015-10-12, 3:49:28 pm]

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis

Here's the mongo entry, I've removed it for now.

Here's the hourly dump that had the broken entry in it.

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis

Are kat dumps fully importing? For me, it seems to skip most of them. And yes, this is what I have too.

MightyRufo avatar Oct 12 '15 05:10 MightyRufo

@MightyRufo

2015/10/12 15:57:53 [OK!] Starting to import from url:
2015/10/12 15:57:53       https://kat.cr/api/get_dump/hourly/?userhash=USER_HASH_HERE
2015/10/12 15:57:55 [!!!] I was given a URL that doesn't end in .txt or .txt.gz.
2015/10/12 15:57:55       I'll assume it's regular text.
2015/10/12 15:57:55 [OK!] Compression detection complete
2015/10/12 15:57:55 [OK!] Reading initialized
2015/10/12 15:57:58 [OK!] Reading completed
2015/10/12 15:57:58       0 torrents imported
2015/10/12 15:57:58       3866 torrents skipped
2015/10/12 15:57:58 [OK!] Starting to import from url:
2015/10/12 15:57:58       http://www.demonoid.pw/api/demonoid24h.txt.gz
2015/10/12 15:57:59 [OK!] Compression detection complete
2015/10/12 15:57:59 [OK!] GZip detected, unzipping enabled
2015/10/12 15:57:59 [OK!] Reading initialized
2015/10/12 15:57:59 [OK!] Reading completed
2015/10/12 15:57:59       0 torrents imported
2015/10/12 15:57:59       199 torrents skipped
2015/10/12 15:57:59 [OK!] Starting to import from url:
2015/10/12 15:57:59       http://bitsnoop.com/api/latest_tz.php?t=all
2015/10/12 15:58:00 [!!!] I was given a URL that doesn't end in .txt or .txt.gz.
2015/10/12 15:58:00       I'll assume it's regular text.
2015/10/12 15:58:00 [OK!] Compression detection complete
2015/10/12 15:58:00 [OK!] Reading initialized
2015/10/12 15:58:00 [OK!] Reading completed
2015/10/12 15:58:00       9 torrents imported
2015/10/12 15:58:00       16 torrents skipped
2015/10/12 15:58:00 [OK!] Finished auto importing.

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis
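
The log above suggests that compression is guessed at least partly from the URL suffix. As a rough sketch of an alternative, the content itself can be checked for the gzip magic bytes. This uses Node's built-in `zlib` module; `isGzip` and `decodeDump` are illustrative names, not functions from bitcannon.

```javascript
const zlib = require('zlib');

// gzip streams start with the two magic bytes 0x1f 0x8b (RFC 1952).
function isGzip(buffer) {
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Decompress only when the content itself looks like gzip, whatever the URL says.
function decodeDump(buffer) {
  const raw = isGzip(buffer) ? zlib.gunzipSync(buffer) : buffer;
  return raw.toString('utf8');
}

const plain = Buffer.from('hash|title|category\n', 'utf8');
const zipped = zlib.gzipSync(plain);
console.log(isGzip(plain), isGzip(zipped));
console.log(decodeDump(zipped) === decodeDump(plain));
```

This would not explain torrents being skipped, but it would rule out one way of misreading a compressed dump as plain text.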

How did you find the entry and remove it?

MightyRufo avatar Oct 12 '15 05:10 MightyRufo

It was a new install of bitcannon so I just opened Robomongo and looked through the results.

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis

Ahh, I have over 7 million torrents imported, most of them work just fine. But I have a few pesky ones that I obviously cannot locate out of 7 million.

MightyRufo avatar Oct 12 '15 05:10 MightyRufo

Maybe try searching for torrents with a category field longer than 10 - 20 chars? http://docs.mongodb.org/manual/reference/operator/query/size/

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis

Funny enough, executing the command in robomongo doesn't bring me any results when specifying any number for size. Either it's not working or I'm not doing it right. http://puu.sh/kHfnw/20f0ae47ed.png

MightyRufo avatar Oct 12 '15 05:10 MightyRufo

Try this, may take a while. I've been running it on mine with 1.6M records and it's still going after 5 minutes.

db.getCollection('torrents').find({$where:"this.category.length > 20"})

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis

You'll either get the broken records or something like this.

Fetched 0 record(s) in 41779ms

If you get this you need to lower the length by a few and try again.

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis
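
Some background on why the two attempts behaved differently: `$size` matches arrays by their number of elements, so it finds nothing when `category` is a string, while `$where` runs JavaScript against every document, which is why it takes minutes on millions of records. A `$regex` filter is another option that avoids the per-document JavaScript. The sketch below builds such a filter and tries it on local data; `longCategoryFilter` and `matchesFilter` are illustrative helpers, and whether the regex is actually faster on a given database is an assumption worth testing.

```javascript
// Build a MongoDB filter matching string categories longer than maxLength characters.
// [\s\S] is used instead of . so that stray newlines in junk data are counted too.
function longCategoryFilter(maxLength) {
  return { category: { $regex: `^[\\s\\S]{${maxLength + 1},}$` } };
}

// Local stand-in for the server-side match, for trying the pattern without a database.
function matchesFilter(doc, filter) {
  return typeof doc.category === 'string' &&
    new RegExp(filter.category.$regex).test(doc.category);
}

const filter = longCategoryFilter(20);
console.log(JSON.stringify(filter));
// In the mongo shell this would be passed as: db.getCollection('torrents').find(filter)
```

As with the `$where` version, lowering the length and re-running is the way to catch shorter junk values.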

Right, I never specified what data to look at lmao. One sec, let me run it.

EDIT: Yes, it's taking a bit, BUT it is running.

MightyRufo avatar Oct 12 '15 05:10 MightyRufo

Brilliant, it just found 14 bad entries. I shall keep this command on hand. Thank you very much kind sir!

MightyRufo avatar Oct 12 '15 05:10 MightyRufo

*ma'am

You're welcome. @Stephen304 maybe for now this could be added to the wiki under troubleshooting?

OmgImAlexis avatar Oct 12 '15 05:10 OmgImAlexis

Oh! My apologies ma'am. And yes, this should be added to the wiki. I'm sure it can come in handy for many people

MightyRufo avatar Oct 12 '15 05:10 MightyRufo

Thanks for the useful troubleshooting, I'll add it to the wiki when I get a chance. The weird thing is it doesn't look like the entry has an info hash. I'm not sure how that happened.

stephen304 avatar Oct 12 '15 10:10 stephen304

@Stephen304 I've noticed the demonoid hourly update is also causing this, maybe it's something to do with it being gzipped? Maybe there's a library update or something that fixes this issue?

OmgImAlexis avatar Oct 12 '15 11:10 OmgImAlexis

I can try recompiling after reinstalling the go packages, but I won't be able to do that until probably tomorrow. Maybe I'll have time to start maintaining this again since it's getting traffic for some reason.

stephen304 avatar Oct 12 '15 15:10 stephen304

@Stephen304 this article (https://torrentfreak.com/bitcannon-download-torrent-sites-to-use-offline-140118/) was shared on Hacker News or Reddit, can't remember which one though.

OmgImAlexis avatar Oct 13 '15 00:10 OmgImAlexis

Ah, back in January it was on both because my friend decided to post it. It's nice to get more traffic and people to give feedback on this.

stephen304 avatar Oct 13 '15 00:10 stephen304

Also since we added support for it to SickRage there's now another 10k+ potential users. If this wasn't mainly in go I'd help, is there any chance you'd rewrite the go parts in node or are they in go for a reason?

OmgImAlexis avatar Oct 13 '15 00:10 OmgImAlexis

I could maybe get behind a node rewrite. I did it in Go for mainly hipster reasons: it's new, has nice package management (node probably wins though), compiles natively to machine code, has nice scoping that I enjoy working with, and has a nifty type system that I think makes coding easier. Node is also very good though, and most of the heavy lifting is done in mongo, so a node rewrite is an option.

stephen304 avatar Oct 13 '15 01:10 stephen304

Well if you could help even with getting most of it into node I would definitely be up to helping fix issues that come along and adding enhancements that people suggest.

OmgImAlexis avatar Oct 13 '15 03:10 OmgImAlexis

I might have some time to work on it this weekend. I'll probably make a node-bitcannon repo or something. One concern with a node version is that people will probably have to install node to run it. With Go I just ran gox and I got a binary for every platform that people can download. Idk if node has a thing for packaging node programs into a bundled executable + node environment. On the plus side, I'm pretty sure there are node GUI things that run a web page in the window of the node app. Something like electron. So that would be good for usability.

stephen304 avatar Oct 13 '15 03:10 stephen304

Like you said, there are projects like electron that help with packaging it up; there's also nexe and a few others. There's also the option of just providing the source and forgetting about the binary, then people just install node and do something like this. That'll install nvm, then download and set up node v4.0.0, then clone the repo, install the npm dependencies and start the project with forever using the uid of bitcannon. To stop it they'd just run forever stop bitcannon.

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.29.0/install.sh | bash
# Load nvm into the current shell (the installer only updates the shell profile)
. ~/.nvm/nvm.sh
nvm install v4.0.0
git clone https://github.com/Stephen304/bitcannon && cd bitcannon && npm install && npm install -g forever && forever --uid bitcannon start app.js

OmgImAlexis avatar Oct 13 '15 04:10 OmgImAlexis

@OmgImAlexis I've started a new issue for the rewrite discussion on #89

stephen304 avatar Oct 17 '15 00:10 stephen304

@stephen304 Hi! Is the project still alive? I'm trying to use it but a couple of bugs (including this one) make it impossible.

Thanks!

Issam2204 avatar May 18 '16 11:05 Issam2204

Not really, I haven't had much time to put into this project and I don't have the kind of unmetered internet connection I used to have which made testing this easier.

Try making sure the data is clean and in the right format before importing.

stephen304 avatar May 18 '16 11:05 stephen304