Orion
Orion copied to clipboard
[RE-5863] Resolve potential race condition in log fields
Description
In current version of logFields, we discovered that there is a potential risk of having racing condition when multiple goroutine try to access logFields under an identical context (ctx). As you may see from the report below, when running parallelism test on the current implementation, a race condition is emerged even with a very little sample rate. Therefore, in this PR, we are turning the old logFields into protected logFields by applying read/write lock. Now when logFields being accessed, it will first check if some other instance is also accessing it, if yes, it will wait until the access is freed up so no simultaneous write will happen. In effects, FromContext doesn't seems to have it's performance decreased, while AddToLogContext increased about 10% of it's process time, and it's all being said with race condition is no longer a concern.
Review Guideline
FromContext Applied with read lock as it's a public method, now we are return a copy of the map instead of giving access directly to logFields
fromContext Same implementation as before, but now it's only for private access
AddToLogContext Now write access is locked when writing new data to logFields
How to Test
Parallelism Test Previously
Reader 2 read from logfields map[k1:v1 k2:v2]
Reader 3 read from logfields map[k1:v1 k2:v2]
Reader 1 read from logfields map[k1:v1 k2:v2]
==================
WARNING: DATA RACE
Write at 0x00c00009f380 by goroutine 14:
runtime.mapassign_faststr()
/Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite.func1()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:61 +0xa4
Previous write at 0x00c00009f380 by goroutine 16:
runtime.mapassign_faststr()
/Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite.func1()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:61 +0xa4
Goroutine 14 (running) created at:
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:59 +0x204
testing.tRunner()
/Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
Goroutine 16 (running) created at:
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:59 +0x204
testing.tRunner()
/Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
==================
Writer 3 wrote key3:val8081
Writer 2 wrote key2:val1847
Writer 1 wrote key1:val7887
lf map[key1:val7887 key2:val1847 key3:val8081 test-key:test-value]
--- FAIL: TestParallelWrite (0.25s)
testing.go:1092: race detected during execution of test
==================
WARNING: DATA RACE
Write at 0x00c000196090 by goroutine 23:
runtime.mapassign_faststr()
/Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func2()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:93 +0xa4
Previous write at 0x00c000196090 by goroutine 22:
runtime.mapassign_faststr()
/Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func2()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:93 +0xa4
Goroutine 23 (running) created at:
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:91 +0x2c5
testing.tRunner()
/Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
Goroutine 22 (running) created at:
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:91 +0x2c5
testing.tRunner()
/Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
==================
Writer 3 wrote key3:val1318
Reader 3 read from logfields map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
==================
WARNING: DATA RACE
Read at 0x00c00019c098 by goroutine 20:
reflect.typedmemmove()
/Users/chiahewen/.goenv/versions/1.16.8/src/runtime/mbarrier.go:177 +0x0
reflect.copyVal()
/Users/chiahewen/.goenv/versions/1.16.8/src/reflect/value.go:1310 +0x7d
reflect.(*MapIter).Value()
/Users/chiahewen/.goenv/versions/1.16.8/src/reflect/value.go:1264 +0x186
internal/fmtsort.Sort()
/Users/chiahewen/.goenv/versions/1.16.8/src/internal/fmtsort/sort.go:65 +0x2d9
fmt.(*pp).printValue()
/Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:769 +0x14cf
fmt.(*pp).printArg()
/Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:712 +0x284
fmt.(*pp).doPrintf()
/Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:1026 +0x330
fmt.Fprintf()
/Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:204 +0x84
fmt.Printf()
/Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:213 +0x111
github.com/carousell/Orion/utils/log/loggers/test_test.readWorker()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:23 +0x67
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func1()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:85 +0x99
Previous write at 0x00c00019c098 by goroutine 22:
github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x132
github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func2()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:93 +0xa4
Goroutine 20 (running) created at:
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:83 +0x20e
testing.tRunner()
/Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
Goroutine 22 (running) created at:
github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
/Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:91 +0x2c5
testing.tRunner()
/Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
==================
Reader 2 read from logfields map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
Reader 1 read from logfields map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
Writer 1 wrote key1:val4059
Writer 2 wrote key2:val2081
map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
--- FAIL: TestParallelReadAndWrite (0.25s)
testing.go:1092: race detected during execution of test
FAIL
exit status 1
FAIL github.com/carousell/Orion/utils/log/loggers/test 1.136s
Now
Reader 9 read from logfields map[k1:v1 k2:v2]
Reader 2 read from logfields map[k1:v1 k2:v2]
...
Writer 47 wrote key47:val4078
Writer 48 wrote key48:val6159
lf map[key1:val1847 key10:val8511 key11:val3300 key12:val694 key13:val1211 key14:val8162 key15:val3237 key16:val495 key17:val9947 key18:val4728 key19:val3274 key2:val7887 key20:val5089 key21:val1445 key22:val9106 key23:val1528 key24:val5466 key25:val6258 key26:val8047 key27:val8287 key28:val2790 key29:val2888 key3:val1318 key30:val3015 key31:val5541 key32:val408 key33:val7387 key34:val6831 key35:val5429 key36:val1737 key37:val631 key38:val1485 key39:val5194 key4:val8081 key40:val5356 key41:val6413 key42:val3090 key43:val5026 key44:val4147 key45:val563 key46:val4324 key47:val4078 key48:val6159 key49:val2433 key5:val2081 key50:val1353 key6:val4059 key7:val2540 key8:val4425 key9:val456 test-key:test-value]
Reader 29 read from logfields map[test-key:test-value]
Reader 18 read from logfields map[test-key:test-value]
...
map[key1:val1957 key10:val9355 key11:val2451 key12:val4538 key13:val8266 key14:val156 key15:val2605 key16:val5561 key17:val7202 key18:val9828 key19:val4783 key2:val3721 key20:val4376 key21:val5746 key22:val1563 key23:val9002 key24:val9718 key25:val5447 key26:val5094 key27:val1577 key28:val7463 key29:val7996 key3:val3000 key30:val6420 key31:val8623 key32:val953 key33:val1137 key34:val3133 key35:val9241 key36:val59 key37:val3033 key38:val8643 key39:val3891 key4:val7189 key40:val2002 key41:val8878 key42:val9336 key43:val2546 key44:val9107 key45:val7940 key46:val6503 key47:val552 key48:val9843 key49:val2205 key5:val2199 key50:val1598 key6:val8705 key7:val9703 key8:val2888 key9:val8510 test-key:test-value]
PASS
ok github.com/carousell/Orion/utils/log/loggers/test 1.200s
Benchmark stat Previously
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkFromContext
BenchmarkFromContext-8 41110058 25.81 ns/op
BenchmarkFromAddToLogContext
BenchmarkFromAddToLogContext-8 2947244 409.4 ns/op
Now
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkFromContext
BenchmarkFromContext-8 39115351 25.86 ns/op
BenchmarkFromAddToLogContext
BenchmarkFromAddToLogContext-8 2701092 442.9 ns/op
Checklist
- [ ] Test on local
- [ ] Test on stage
- [ ] run
go mod tidy
andgo mod vendor
- [ ] change module's version back to master
Screenshots (Optional)
Related PRs (Optional)
Impacted Endpoints (Optional)
Codecov Report
Merging #185 (b9a446c) into master (dafc9bf) will decrease coverage by
0.51%
. The diff coverage isn/a
.
@@ Coverage Diff @@
## master #185 +/- ##
==========================================
- Coverage 58.49% 57.97% -0.52%
==========================================
Files 13 13
Lines 559 564 +5
==========================================
Hits 327 327
- Misses 225 230 +5
Partials 7 7
Impacted Files | Coverage Δ | |
---|---|---|
utils/kafka/config.go | 0.00% <0.00%> (ø) |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
shouldn't you just change logfields to sync.Map instead ?
something along the lines of - https://github.com/go-coldbrew/log/pull/5/files
That is in fact a more concise way to dealing with it, thanks for the info and will look into it.