gobblin icon indicating copy to clipboard operation
gobblin copied to clipboard

change TaskDone notification for forks to be not racy

Open Kaiserchen opened this issue 9 years ago • 9 comments

Kaiserchen avatar Mar 14 '16 13:03 Kaiserchen

It's ok to remove the timeout on getting a record from the queue which is no longer needed, but I think there should still be a timeout on putting a record into the queue. The benefit of having a timeout for put is that, if there are multiple forks in a task, and one fork is slow (hence its queue is often full), the task can timeout the put for this fork and go to the next fork.

Otherwise the entire task is blocked by the slow fork.

zliu41 avatar Mar 15 '16 00:03 zliu41

@zliu41 the entire task will be blocked by a slow fork at the moment anyhow. Wich is wanted behaviour i guess. IMO the gain of the whole looping thing in Task,processRecord() is very little, it just shifts the place where they see nor record for long time from the position between n to n+1 to n-1 to n. This only has an impact when its the last record. In this case the fast forks will still be done faster then the problematic fork that caused the blocking. I am actually in favour of throwing this while (!allPutsSucceeded) { away, as layed out the benefit is not really there IMO.

Kaiserchen avatar Mar 15 '16 02:03 Kaiserchen

@Kaiserchen that's true. Then can you change the return type of putRecord to void.

zliu41 avatar Mar 15 '16 15:03 zliu41

Changing the queue with statistics to give the wanted behavior is a little more invasive. I also need to touch/ write some unittests then. Will take me a little longer

Kaiserchen avatar Mar 16 '16 13:03 Kaiserchen

@Kaiserchen I agree the while (!allPutsSucceeded) block in Task.processRecord. I believe the main benefit is that for a single record, a single slow Fork will not block other Forks from seeing the record. However, the next record will not be processed, until all Forks have accepted the record.

I'm also ok with removing this logic, it should simplify the code also.

Removing the timeouts from the BoundedBlockingRecordQueue also sounds reasonable.

sahilTakiar avatar Mar 18 '16 18:03 sahilTakiar

Did all the changes that seemed to be required, Junittest seem to deadlock. If someone else want to fix this. Highly appreaciated.

Kaiserchen avatar Apr 13 '16 09:04 Kaiserchen

@shirshanka Can you please review this, since this might be a possible change that may impact streaming?

abti avatar Jan 12 '17 03:01 abti

Hi,

Sorry yall, not sure if i have time to pull it over the finish line. I can maybe chop of a few hours next week. If not Ill let you know. My company is only now migrating towards gobblin found this back then while evaluating.

Btw gobblin streaming? Really? :Sad_face:

Am 12.01.2017 04:16 schrieb "Abhishek Tiwari" [email protected]:

@shirshanka https://github.com/shirshanka Can you please review this, since this might be a possible change that may impact streaming?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/linkedin/gobblin/pull/841#issuecomment-272065072, or mute the thread https://github.com/notifications/unsubscribe-auth/AAO-YFRTsd0_Ui0BhCKJfLGUDjKevWW_ks5rRZsNgaJpZM4HwCNK .

Kaiserchen avatar Jan 12 '17 05:01 Kaiserchen

https://issues.apache.org/jira/browse/GOBBLIN-121

Please update your PR title with following prefix: [GOBBLIN-121]

abti avatar Jul 27 '17 18:07 abti