Feeding maxing out at 150 pastes per minute when importing

Open src7 opened this issue 5 years ago • 13 comments

Hello,

Why can't we feed more than 150 pastes per minute? It doesn't look like a hardware bottleneck.

src7 • Feb 23 '20 22:02

I am feeding more than 150 pastes a minute.

v1psta • Feb 23 '20 22:02

Thank you. We tried on different servers, and Processed pastes is stuck at about 150 per minute (with one single feeder, using bin/import_dir.py).

src7 • Feb 23 '20 22:02

I will reach out to my team and see whether they had to make any special changes.

v1psta • Feb 23 '20 22:02

Thank you. I am currently experimenting with configs/6382.conf.

src7 • Feb 23 '20 23:02

Hi @src7! Do you have any stuck queues? Are you processing huge files?

It might be a disk issue. We created the Global module to save all items to disk. Can you try launching multiple Global modules? For example:

screen -r Script_AIL

Ctrl+a c (opens a new window in the screen session)

. ./AILENV/bin/activate
cd bin
./Global.py

Terrtia • Feb 25 '20 09:02
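Launching extra Global modules helps only if they share one input queue, so each instance takes a slice of the disk writes. A minimal sketch of that pattern, assuming the shared queue can be modelled with Python's `queue.Queue` and threads stand in for AIL's separate processes; `save_worker` and `run_savers` are hypothetical names, not AIL code:

```python
import os
import queue
import tempfile
import threading

STOP = None  # sentinel telling a worker to exit

def save_worker(items, out_dir):
    """Pop (name, content) pairs off the shared queue and write each one to disk."""
    while True:
        item = items.get()
        if item is STOP:
            break
        name, content = item
        with open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(content)

def run_savers(pastes, out_dir, workers=4):
    """Share one queue between `workers` consumers, like several Global modules."""
    items = queue.Queue()
    threads = [threading.Thread(target=save_worker, args=(items, out_dir))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for paste in pastes:
        items.put(paste)
    for _ in threads:
        items.put(STOP)  # one sentinel per worker, queued after all real items
    for t in threads:
        t.join()

out_dir = tempfile.mkdtemp()
run_savers([(f"paste_{i}.txt", b"example") for i in range(100)], out_dir)
print(len(os.listdir(out_dir)))  # 100
```

Because the queue is FIFO and the sentinels are enqueued last, no worker exits while pastes are still waiting.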

Hi,

The queues are not empty, but they are not stuck either. The files being processed are regular pastes from Pastebin.

The disk is either NVMe or RAM-based (with direct I/O disabled for the latter).

I will try these commands.

The odd thing is that the rate stays exactly the same all the time. Could there be a sleep somewhere? (I haven't found one yet.)

We build the latest version of AIL directly from the master branch on a fresh install of Ubuntu Bionic.

src7 • Feb 25 '20 11:02
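A rate that stays constant across different hardware is what a fixed per-item delay would produce: if every paste costs at least `d` seconds of sleeping, throughput cannot exceed 60 / d pastes per minute. A quick back-of-the-envelope check; the helper names are illustrative only, and this shows that a hidden sleep *could* explain the plateau, not where one is:

```python
def max_per_minute(delay_s):
    """Upper bound on items per minute when each item costs `delay_s` seconds of sleep."""
    return 60.0 / delay_s

def delay_for_rate(items_per_minute):
    """Per-item delay, in seconds, that would pin throughput at `items_per_minute`."""
    return 60.0 / items_per_minute

print(delay_for_rate(150))  # 0.4
print(max_per_minute(0.2))  # 300.0
```

So a plateau of 150 pastes per minute is consistent with roughly 0.4 s of waiting per paste somewhere in the pipeline.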

The imported pastes are not compressed, in case that helps.

src7 • Feb 25 '20 11:02

@src7 Are you using the import_dir.py script by any chance? If so, you may want to reduce the sleep time. See: https://github.com/CIRCL/AIL-framework/blob/5ae22ec2168a55e89279228de7c4cdbbd36baa44/bin/import_dir.py#L114

mokaddem • Feb 26 '20 09:02

Yes, but that is not the problem. In the console I see far more than 150 pastes per minute being imported (with no sleep set).

src7 • Feb 26 '20 10:02

  1. In which console do you see more than 150: Global, Mixer, or import_dir? By default, `args.seconds` is set to 0.2 s.
  2. And where do you see the 150 pastes per minute?

mokaddem • Feb 26 '20 13:02
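With the default `args.seconds` of 0.2 s, the importer's own pacing would allow up to 60 / 0.2 = 300 pastes per minute, so on its own it would not explain a ceiling of 150. A minimal sketch of such a paced import loop, assuming a `send` callback in place of the real ZMQ publisher; `import_directory`, `send`, and the `--directory` flag are hypothetical, and only the `seconds` attribute with its 0.2 default mirrors the thread:

```python
import argparse
import os
import tempfile
import time

def import_directory(directory, send, seconds):
    """Send every file under `directory`, pausing `seconds` after each one.
    With a pause of s seconds, throughput can never exceed 60 / s files per minute."""
    sent = 0
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                send(path, fh.read())
            sent += 1
            if seconds > 0:
                time.sleep(seconds)
    return sent

def parse_args(argv=None):
    # Hypothetical CLI: only `seconds` and its 0.2 default come from the thread.
    parser = argparse.ArgumentParser(description="Paced directory importer (sketch)")
    parser.add_argument("--directory", required=True)
    parser.add_argument("--seconds", type=float, default=0.2,
                        help="pause after each file, in seconds")
    return parser.parse_args(argv)

# Demo: three sample files, no pause, and a send() that just records payloads.
sample_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(sample_dir, f"paste_{i}.txt"), "wb") as fh:
        fh.write(b"example paste")
received = []
args = parse_args(["--directory", sample_dir, "--seconds", "0"])
count = import_directory(args.directory, lambda path, data: received.append(data), args.seconds)
print(count)  # 3
```

Setting the pause to 0, as done here, removes the importer as a bottleneck, which matches the report that import_dir alone runs well above 150 per minute.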

  1. import_dir (I also tested different sleep values, and even removed the sleep line entirely)
  2. In the web dashboard, under Processed pastes

src7 • Feb 26 '20 13:02

After some research, I found that tuning this delay changes everything:

https://github.com/CIRCL/AIL-framework/blob/b4a85c0e9808c1660adc36400e33238121099f4d/bin/Helper.py#L105

I am currently reaching 4000 pastes per minute and working on going higher.

src7 • Feb 26 '20 20:02
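One way a single fixed delay can cap the whole pipeline: if a consumer tries each of its inputs without blocking and sleeps whenever one of them is empty, every pass pays that sleep even while another input is busy. The simulation below models such a loop with an integer millisecond clock. It is a hypothetical model, not a reproduction of `Helper.py`; the function `simulate` and the deque-based sources are assumptions, and processing time is ignored so that only the sleeps advance the clock:

```python
from collections import deque

def simulate(sources, delay_ms, budget_ms, sleep_per_empty_source=True):
    """Count messages handled before `budget_ms` of simulated sleeping has elapsed.

    Each pass tries every source once without blocking. With
    sleep_per_empty_source=True the loop sleeps `delay_ms` for every empty
    source it sees; otherwise it sleeps only when all sources were empty."""
    if not sources:
        return 0
    clock_ms = 0
    handled = 0
    while clock_ms < budget_ms:
        got_any = False
        for src in sources:
            if src:
                src.popleft()
                handled += 1
                got_any = True
            elif sleep_per_empty_source:
                clock_ms += delay_ms  # stands in for time.sleep(delay) on an empty source
        if not got_any and not sleep_per_empty_source:
            clock_ms += delay_ms  # idle only when nothing at all was available
    return handled

# One busy input and one idle input, one simulated minute, 200 ms delay.
capped = simulate([deque(range(10_000)), deque()], 200, 60_000)
uncapped = simulate([deque(range(10_000)), deque()], 200, 60_000,
                    sleep_per_empty_source=False)
print(capped)    # 300
print(uncapped)  # 10000
```

In the first case the idle input alone limits the busy one to 60 / 0.2 = 300 messages per simulated minute; in the second, the only limit is how much input exists, so real throughput would instead be bounded by CPU, disk, and Redis.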

Good catch! I made some modifications to the ZMQ feeders in 998f8cc8e15f81dff5a4d006509d5c58883da629: the feeders now use a ZMQ Poller for more general non-blocking I/O.

You should now reach more than 4000 pastes per minute.

Terrtia • Feb 27 '20 12:02
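The point of a poller is that the consumer blocks until *any* registered socket is readable, instead of sleeping a fixed interval for each empty one. pyzmq exposes this as `zmq.Poller` (`register(socket, zmq.POLLIN)`, then `poll(timeout)` with the timeout in milliseconds). To keep the sketch self-contained it uses the standard-library `selectors` module, with plain socket pairs standing in for two ZMQ feeders; `receive_ready` and the `feeder-a`/`feeder-b` names are illustrative, and this is not the code from the commit mentioned in the comment:

```python
import selectors
import socket

def receive_ready(selector, timeout=None):
    """Block until at least one registered socket is readable (or `timeout`
    seconds pass), then read from every socket that is ready."""
    messages = []
    for key, _mask in selector.select(timeout):
        data = key.fileobj.recv(4096)
        if data:
            messages.append((key.data, data))
    return messages

selector = selectors.DefaultSelector()
writers = {}
for name in ("feeder-a", "feeder-b"):
    writer, reader = socket.socketpair()  # stands in for one ZMQ feeder
    reader.setblocking(False)
    selector.register(reader, selectors.EVENT_READ, data=name)
    writers[name] = writer

writers["feeder-b"].sendall(b"paste-1")
first = receive_ready(selector, timeout=1.0)  # wakes as soon as feeder-b has data
idle = receive_ready(selector, timeout=0)     # nothing pending: returns at once
print(first)  # [('feeder-b', b'paste-1')]
print(idle)   # []

# Clean up: unregister and close every socket, then the selector.
for key in list(selector.get_map().values()):
    selector.unregister(key.fileobj)
    key.fileobj.close()
for writer in writers.values():
    writer.close()
selector.close()
```

An idle feeder-a costs nothing here: the wait ends the moment feeder-b has data, which is why a poller removes the fixed per-socket delay.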