
Cannot add many dat files

mitar opened this issue on Feb 21 '18 · 10 comments

I get the following exception:

Error: EMFILE: too many open files, uv_interface_addresses
    at Object.networkInterfaces (os.js:126:30)
    at allInterfaces (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:163:21)
    at Timeout.that.update [as _onTimeout] (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:123:63)
    at ontimeout (timers.js:475:11)
    at tryOnTimeout (timers.js:310:5)
    at Timer.listOnTimeout (timers.js:270:5)

mitar · Feb 21 '18

Can you give me the steps to reproduce this issue?

soyuka · Feb 21 '18

I have a directory with 1000 dat repositories. I have run the following script:

#!/usr/bin/env python

import json
import os
import subprocess
import sys

# Walk the tree and register every directory containing a dat.json
# with the daemon.
for dirpath, dirnames, filenames in os.walk('.', followlinks=True):
    if 'dat.json' not in filenames:
        continue

    directory_path = os.path.abspath(dirpath)
    dat_json_path = os.path.join(directory_path, 'dat.json')

    print("Adding '{dat_json_path}'.".format(dat_json_path=dat_json_path))

    with open(dat_json_path, 'r') as dat_json_file:
        dat_json = json.load(dat_json_file)

    # subprocess.run blocks until the datdaemon CLI process exits.
    subprocess.run(
        ['datdaemon', 'add', dat_json['url'], directory_path],
        encoding='utf8', stdout=sys.stdout, stderr=sys.stderr,
    )
    sys.stdout.flush()
    sys.stderr.flush()

After running a few of those, I got this error. The dat archives had just been created, but no files had been imported yet.

mitar · Feb 21 '18

Really interesting use case.

I won't have time to dig into this right now, but at first glance it looks like we're spamming the daemon with DNS requests, which leads to this issue.

For now, try waiting for the datdaemon add command to respond before issuing a new request. It might be worth adding some sort of "bulk insert" to the client, though! Maybe in the future!
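
A minimal sketch of that workaround on top of the script above, pausing between adds (the one-second delay is an illustrative guess, not a tested threshold):

#!/usr/bin/env python

import json
import os
import subprocess
import time

for dirpath, dirnames, filenames in os.walk('.', followlinks=True):
    if 'dat.json' not in filenames:
        continue

    directory_path = os.path.abspath(dirpath)
    with open(os.path.join(directory_path, 'dat.json'), 'r') as dat_json_file:
        dat_json = json.load(dat_json_file)

    # subprocess.run already waits for the CLI to exit; the sleep adds
    # extra breathing room before the next request hits the daemon.
    subprocess.run(['datdaemon', 'add', dat_json['url'], directory_path])
    time.sleep(1)  # assumed delay between adds

As the later comments show, throttling alone only postpones the failure, since the open descriptors accumulate in the daemon itself rather than in the client.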

soyuka · Feb 21 '18

Yes, it seems that if I add files a few at a time, it works. (Does not die.)

I have some other problems, but I will open separate issues for those.

mitar · Feb 21 '18

Even with one second between adds it still dies. (And I am waiting for the datdaemon add process to finish before adding another file.)

mitar · Feb 21 '18

Now I have added some slowly, but I cannot run datdaemond anymore. It dies on startup with:

os.js:126
  const interfaceAddresses = getInterfaceAddresses();
                             ^

Error: EMFILE: too many open files, uv_interface_addresses
    at Object.networkInterfaces (os.js:126:30)
    at allInterfaces (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:163:21)
    at Timeout.that.update [as _onTimeout] (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:123:63)
    at ontimeout (timers.js:475:11)
    at tryOnTimeout (timers.js:310:5)
    at Timer.listOnTimeout (timers.js:270:5)

mitar · Feb 21 '18

I think the issue is simply that for every dataset, 8 files are opened (4 for metadata, 4 for content), and this piles up.
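
That arithmetic is consistent with the failures: with a common default soft limit of 1024 descriptors per process, roughly 1024 / 8 = 128 datasets are enough to exhaust it, well short of 1000. A quick sketch for inspecting (and, where the hard limit allows, raising) a process's limit, assuming the 8-descriptors-per-dataset estimate above:

import resource

FDS_PER_DATASET = 8  # estimate above: 4 for metadata + 4 for content

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit: %d, hard limit: %d" % (soft, hard))
print("datasets before EMFILE: ~%d" % (soft // FDS_PER_DATASET))

# An unprivileged process may raise its soft limit up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

Raising the limit only buys headroom: at 8 descriptors per dataset, 1000 datasets still need roughly 8000, so the limit (ulimit -n) would have to be raised accordingly before starting the daemon.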

mitar · Feb 25 '18

I need to investigate this, will do when I have some spare time ;)

soyuka · Feb 26 '18

I was looking around a bit for how to address this; maybe Node's cluster module could help. It would also let the daemon use all the cores on the system, so you would both get more CPU and be able to shard the work so that each worker opens only a subset of the files and stays under the descriptor limit.
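
dat-daemon itself is Node, but the sharding idea translates directly. Here is an illustrative Python sketch (the worker count, the round-robin split, and the serve_shard placeholder are all assumptions, not dat-daemon code): each worker handles only its own share of archives, so no single process approaches the descriptor limit.

import multiprocessing
import os

WORKERS = os.cpu_count() or 4  # one worker per core

def serve_shard(directories):
    # Placeholder for the real work: each worker would open the archives
    # for its shard only, keeping its descriptor count bounded.
    for directory in directories:
        print("worker %d serving %s" % (os.getpid(), directory))

def shard(items, n):
    # Round-robin partition of the dataset directories across n workers.
    return [items[i::n] for i in range(n)]

if __name__ == '__main__':
    directories = sorted(d for d in os.listdir('.') if os.path.isdir(d))
    processes = [
        multiprocessing.Process(target=serve_shard, args=(part,))
        for part in shard(directories, WORKERS)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()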

mitar · Feb 26 '18

With only 2 dats the daemon was taking 1 GB of memory on my server. It might be because of open file descriptors, not sure. Anyway, this definitely needs to be improved, but I'm not sure it fits my scope.

soyuka · May 02 '18