dctk

Save share to file rather than in memory and allow share index update

Open drd33m opened this issue 5 years ago • 4 comments

Hi, I know I already have another issue open but I am just putting this here for when you get time.

I would like the files.xml.bz2 to be stored on disk in the directory the bot is running in, as it's not viable to share and index larger shares every time the bot starts up. Another useful feature would be a way to force a re-index for added files.

drd33m avatar Feb 23 '20 07:02 drd33m

UPDATE: I thought I might explain this more since what I have is very vague.

When indexing large folders (>40G), indexing takes a long time, and the memory usage is not viable for very large indexes: at 20G indexed, the memory usage was already at 1.5G, so there would be no way for people to index large folders without running out of memory. I am not too sure how to solve the indexing taking up large amounts of memory, as I have little experience in Go.

But to avoid having to reindex the share every time the bot starts, saving the filelist to disk in the filelist.xml.bz2 format would solve it. Loading this file into memory at startup (and then calling a ShareUpdate if a filelist.xml.bz2 is detected), and again after a share refresh, would also allow us to not thrash the disk.

From what I see and understand in the code, there is no way to refresh the filelist; I only see ShareAdd and ShareDel. ShareAdd causes the whole filelist to be reindexed, which is not viable for large shares. I propose the addition of a ShareUpdate function which looks for file updates. This function could be called manually by the user, giving them control over when share updates happen.

I know this is a large addition and might be a bit hard to follow; it is hard for me to convey. But in general I would like to see the filelist functionality of normal DC clients added.

UPDATE:

This could also allow you to run multiple bots that share a local files.xml.bz2, instead of every instance indexing their own.

drd33m avatar Mar 25 '20 03:03 drd33m

Hello, I'll handle this feature when I have some free time again, but in the meantime, if you want to help, feel free to try to write a patch yourself.

From what I remember, file indexing can consume a lot of RAM for two reasons:

  • the file list, which can have a size of 100MB and over and is kept in RAM, but this isn't the biggest piece of the cake
  • the hash algorithm, whose usage depends on the file size and requires 3 bytes (24 bits) of RAM for each 1024 bytes, so if you have a 100M file, the consumed RAM is 100 * 1024 * 1024 / 1024 * 3 = 307,200 bytes (about 307K)
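The arithmetic in the second point can be checked with a few lines of Go. This is only a worked example that takes the figures quoted above (3 bytes of RAM per 1024-byte block) at face value; they have not been verified against the dctk source, and `hashRAM` is a hypothetical helper name.

```go
package main

import "fmt"

const (
	blockSize     = 1024 // bytes of file data per hashed block, as quoted above
	bytesPerBlock = 3    // RAM per block, as quoted above (assumption, not verified)
)

// hashRAM estimates the RAM needed while hashing a file of the given size.
func hashRAM(fileSize uint64) uint64 {
	blocks := (fileSize + blockSize - 1) / blockSize // round up a partial last block
	return blocks * bytesPerBlock
}

func main() {
	fmt.Println(hashRAM(100 * 1024 * 1024)) // prints 307200
}
```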

Then there's the question of progressive share updates, a feature that isn't implemented at all.

So, I'd start with implementing progressive share updates, by writing a function ShareRescan() that

  • scans the share directories
  • detects changed files by size and edit time
  • recomputes their hashes and adds them to the list

aler9 avatar Mar 25 '20 10:03 aler9

Thank you! I will try to give the ShareRescan() function a go, but my knowledge of how TTH and the filelist structure work is limited, so no promises :).

One thing I forgot to mention before is that I am running multiple client instances in goroutines. If all 4 of them rescan at the same time, what's to stop them adding the same file twice? I thought maybe adding an option to specify the filelist name per client would also be handy, so that ShareRescan could be called via a channel and staggered.

drd33m avatar Mar 25 '20 12:03 drd33m

Hey, just bumping this, as it is still a feature I would love to see here, since this is the only NMDC/ADC lib out there. Maybe the ncdc sources might help: https://g.blicky.net/ncdc.git/tree/src I see you are a busy person, so I hope that you can find some room for this. 👍

The main idea is to have a files.xml.bz2 like all DC++ clients. I had rigged together loading a files.xml.bz2 from ncdc, and sending it was fine; it was just sending the files in that list that I stopped at. This TTH stuff is well beyond my level of knowledge.

drd33m avatar Aug 17 '20 19:08 drd33m