
When can a statistic be negative?

[Open] jaju opened this issue on Aug 06 '16 · 5 comments

[Question] Here's one I see in my process, for the "out" queue:

    "out" {:enqueued 8332, :retried 0, :completed 8333, :in-progress -1}

I'm unsure how to handle this, as I depend on these values to decide how to progress (completion marking, etc.).
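Roughly, the kind of check I depend on looks like this (a simplified sketch; the directory and the drained? helper are placeholders of mine):

    (require '[durable-queue :refer [queues stats]])

    (def q (queues "/tmp/dq" {}))  ;; placeholder directory

    ;; stats returns a map of queue name to its counters; I treat
    ;; the "out" queue as drained once everything enqueued has been
    ;; completed and nothing is in flight.
    (defn drained? []
      (let [{:keys [enqueued completed in-progress]} (get (stats q) "out")]
        (and (= enqueued completed)
             (zero? in-progress))))

With :in-progress at -1, a check like this can fire early or never settle.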

Thanks!

jaju avatar Aug 06 '16 18:08 jaju

That shouldn't be possible; it seems like a bug. Does this happen often?

ztellman avatar Aug 07 '16 00:08 ztellman

Yes, it does happen often.

Here's some detail on how I'm using DQ. All actors work concurrently. Before and after the following block, there's only a single thread of execution, which pre-populates the first queue with tasks:

    [DQ::some-in-q] --multiple-consumers--> [DQ::some-out-q] --single-consumer-to-drain-writing-to-a-file-as-tasks-come
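In (simplified) code, the topology is roughly this; initial-tasks, n-workers, process, and out-file are placeholders:

    (require '[durable-queue :refer [queues put! take! complete!]])

    (def q (queues "/tmp/dq" {}))

    ;; single producer thread pre-populates the in-queue
    (doseq [task initial-tasks]
      (put! q :some-in-q task))

    ;; multiple worker threads: consume from in-q, write results to out-q
    (dotimes [_ n-workers]
      (future
        (loop []
          (let [t (take! q :some-in-q 1000 ::timeout)]
            (when-not (= ::timeout t)
              (put! q :some-out-q (process @t))
              (complete! t)
              (recur))))))

    ;; single drainer: consume from out-q and append to a file
    (future
      (loop []
        (let [t (take! q :some-out-q 1000 ::timeout)]
          (when-not (= ::timeout t)
            (spit out-file (str @t "\n") :append true)
            (complete! t)
            (recur)))))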

Now that you say it's unexpected, I have a question: should "put!"s be protected with a lock? I've been having some trouble, so I introduced locking (around interval-task-seq calls) and saw some improvement. I'm sorry, though, I didn't do a thorough job of investigating and recording. I'm going to try locking around the put!s too and re-run a few times.
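Concretely, what I mean by locking the put!s (a sketch; dq-lock is just a monitor object I'd introduce):

    (def dq-lock (Object.))

    ;; serialize all put!s through a single monitor
    (defn locked-put! [queue-name task]
      (locking dq-lock
        (put! q queue-name task)))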

If you'd like me to try any specific steps, do let me know.

jaju avatar Aug 07 '16 02:08 jaju

A few more observations.

  1. I stopped seeing those negative numbers once I introduced (locking ...) around my operations on DQ.
  2. With many concurrent writes and fsync-put disabled, I started to see garbled content when consuming from the queue being written to. I had always thought of fsync as protection against a program crash, not as something needed for reliable writes to the queues. I introduced an fsync (with the surrounding locking) for every write to this queue, which is concurrently written to by tens of threads; see the sketch after this list. The garbled-content problem now seems to be gone, BUT I'd need to watch it for some more time.
    • The garbled content appeared when I pushed the number of writing threads up to 100. But by introducing fsync and locking, I have also (unintentionally) introduced idling, as threads wait to write before continuing. So I can't tell with confidence whether the locking+fsync combo solves the problem or just makes it harder for the problem to surface. The run time for the dataset I'm working with grew from under 10 minutes to ~30 minutes after this change.
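For reference, the change looks roughly like this (a sketch; per-put fsync enabled via the :fsync-put? option at queue construction, with the lock kept around every put!):

    ;; construct the queues with per-put fsync enabled
    (def q (queues "/tmp/dq" {:fsync-put? true}))

    (def dq-lock (Object.))

    ;; every writer thread funnels its put! through the same lock
    (defn safe-put! [queue-name task]
      (locking dq-lock
        (put! q queue-name task)))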

Note: the data I'm processing is retrieved over the net (HTTP calls), and I store the raw responses in another file. For the garbled content, I cross-checked against the HTTP responses, and the data in its raw form appeared to be correct. That's my reference point.

jaju avatar Aug 10 '16 02:08 jaju

Well, in theory locking shouldn't be necessary on your end. I'll take a look.

ztellman avatar Aug 10 '16 02:08 ztellman

Could be related to https://github.com/Factual/durable-queue/issues/16; I observed the same things.

mpenet avatar Jul 10 '17 13:07 mpenet