Evaluation of process inspector
We need to quantitatively evaluate the process inspector, as was done for the Ethernet inspector (see the FOSDEM presentation slide).
I tried to reproduce ZOOKEEPER-2212 with several configurations.
All experiments were run on my local Lenovo PC (Xeon E3-1220 v3 * 4, 8 GB RAM).
- Earthquake: a7defa0
- Kernel: 4.2.0-30-generic #36-Ubuntu
| EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
|---|---|---|---|---|---|
| None | 4 | 5,000 | 0% | 156 | Data is from FOSDEM slide. |
| Ether | 4 | 1,000 | 21.8% | 573 | Ditto. With latest EQ + 1 CPU, reproducibility grew to about 50%. |
| None | 1 | 1,000 | 0% | N/A | |
| None + SCHED_BATCH | 1 | 1,000 | 0% | N/A | |
| Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values) | 1 | 5,000 | 0.7% | 634 | 0.08% experiments failed due to timeout |
| Proc(`mild{UseBatch:true}`) | 4 | 5,000 | 0.32% | 548 | No experiment failed due to timeout |
| Proc(`mild{UseBatch:false}`) | 1 | 5,000 | 0.26% | 914 | 90% experiments failed due to timeout |

- `mild{UseBatch:true}` provides better reproducibility than `mild{UseBatch:false}`, but is not as good as the Ethernet inspector.
- `mild{UseBatch:false}` provides better pattern growth, but is not useful for ZOOKEEPER-2212 because too many experiments time out.
- Proc(`extreme`) is likely to cause starvation on a single CPU, so I did not experiment with it.
- Proc(`dirichlet`) hits the bug mentioned in the README.

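To put these percentages in perspective, here is a rough back-of-the-envelope sketch of how many runs are needed to hit the bug at least once. It assumes every run is independent and reproduces the bug with the same probability, which may not hold in practice; `runs_needed` is a hypothetical helper name, not part of Earthquake.

```python
import math


def runs_needed(reproducibility, confidence=0.95):
    """Smallest n such that 1 - (1 - p)**n >= confidence.

    Assumes independent runs with a fixed per-run hit probability p.
    """
    if not 0.0 < reproducibility < 1.0:
        raise ValueError("reproducibility must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - reproducibility))


if __name__ == "__main__":
    # Reproducibility values taken from the ZOOKEEPER-2212 table above.
    for label, p in [("Ether, 4 CPUs", 0.218),
                     ("Proc(mild{UseBatch:true}), 1 CPU", 0.007),
                     ("Proc(mild{UseBatch:true}), 4 CPUs", 0.0032),
                     ("Proc(mild{UseBatch:false}), 1 CPU", 0.0026)]:
        print(f"{label}: {runs_needed(p)} runs for a 95% chance of one hit")
```

Under these assumptions, the gap between the Ethernet inspector and the process inspector is more than an order of magnitude in the number of runs required.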
Also tested ZOOKEEPER-2137 with the latest ZooKeeper (just 50 times on 4 CPUs):
| EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
|---|---|---|---|---|---|
| None | 4 | 50 | 2% | N/A | - |
| Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values) | 4 | 50 | 16% | N/A | - |
| Proc(`mild{UseBatch:true}`) | 1 | 50 | 2% | N/A | - |
This reproducibility is useful enough (on 4 CPUs).
The process inspector works well with ZOOKEEPER-2137, although not with 2212.
I guess this is because ZOOKEEPER-2137 runs longer (> 1 min) than 2212,
i.e., sched_setattr() gets many more chances to take effect.
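To illustrate that guess, here is a minimal sketch of a "SCHED_BATCH + random nice values" perturbation loop. It is not the actual Earthquake implementation: the function names, the re-randomization interval, the full -20..19 nice range, and the mapping of `UseBatch:false` to SCHED_OTHER are all assumptions. Python has no `sched_setattr()` wrapper, so the default applier uses `os.sched_setscheduler()` plus `os.setpriority()` as a stand-in (Linux only, and it usually needs root or CAP_SYS_NICE).

```python
import os
import random
import time


def apply_with_os(pid, policy, nice):
    """Linux-only applier; stand-in for a single sched_setattr() call."""
    sched = os.SCHED_BATCH if policy == "batch" else os.SCHED_OTHER
    os.sched_setscheduler(pid, sched, os.sched_param(0))
    os.setpriority(os.PRIO_PROCESS, pid, nice)


def perturb_once(pids, rng, use_batch, apply):
    """Give every PID the chosen policy and a fresh random nice value."""
    policy = "batch" if use_batch else "other"  # "other" is an assumption
    assigned = {}
    for pid in pids:
        nice = rng.randint(-20, 19)  # full Linux nice range (assumption)
        apply(pid, policy, nice)
        assigned[pid] = (policy, nice)
    return assigned


def perturb_for(pids, duration_s, interval_s, rng, use_batch, apply,
                sleep=time.sleep):
    """Re-randomize every interval_s seconds; return the number of rounds."""
    rounds = 0
    elapsed = 0.0
    while elapsed < duration_s:
        perturb_once(pids, rng, use_batch, apply)
        rounds += 1
        sleep(interval_s)
        elapsed += interval_s
    return rounds
```

With a hypothetical 1-second interval, a 60-second test gets 60 rounds of re-randomized priorities while a 10-second test gets only 10, which is the intuition behind the guess above. Passing `apply_with_os` as `apply` touches real processes; a recording function can be passed instead for a dry run.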
I'll keep this issue open for discussion.
PTAL @mitake
Evaluated some YARN (apache/hadoop@4e4b3a8465a8433e78e015cb1ce7e0dc1ebeb523) tests using osrg/earthquake@13aa33b371fc714608061f4671a83dd18d7b25fe (`mild{UseBatch:true}`), on AWS t2.large (2 CPUs assigned).
Each test was executed 100 times with and without Earthquake.
Note that this version of Earthquake does not contain an optimization (#146).
| Test | Reproducibility(without EQ) | Reproducibility(with EQ) |
|---|---|---|
| YARN-4548(RM/TestCapacityScheduler) | 11% | 82% |
| YARN-4556(RM/TestFifoScheduler) | 2% | 44% |
| YARN-4168(NM/TestLogAggregationService) | 1% | 8% |
| YARN-1978(NM/TestLogAggregationService) | 0% | 4% |
| YARN-4543(NM/TestNodeStatusUpdater) | 0% | 1% |
I found that it is sometimes better to apply Namazu (formerly named Earthquake) to a stress process rather than to the Hadoop mvn process.
Test case: YARN-5043 (RM/TestAMRestart) (apache/hadoop@06413da72efed9a50e49efaf7110c220c88a7f4a) using osrg/namazu@8e4f26836c4affa15a6bb5ade57f21bd9417354e (`mild{UseBatch:true}`), on AWS t2.large (2 CPUs assigned). Done 100 times.
Stress: `stress --cpu 2`
| Running stress? | Namazu applied for | Reproducibility |
|---|---|---|
| N | None | 16% |
| Y | None | 12% |
| N | mvn | 7% |
| Y | stress | 30% |
TODO:
- reevaluate other YARN tests with `stress`
- scientific and reliable analysis
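
As one possible starting point for the analysis TODO, here is a sketch that puts a 95% Wilson score interval around each reproducibility figure. It assumes each row of the stress table corresponds to 100 independent runs (so 16% means 16 hits out of 100); `wilson_interval` is a hypothetical helper, not part of Namazu.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hit counts derived from the percentages in the stress table above,
    # assuming 100 runs per row.
    rows = [("no stress, no Namazu", 16),
            ("stress, no Namazu", 12),
            ("no stress, Namazu on mvn", 7),
            ("stress, Namazu on stress", 30)]
    for label, hits in rows:
        lo, hi = wilson_interval(hits, 100)
        print(f"{label}: {hits}/100, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With only 100 runs per configuration the intervals are wide, so more runs or a proper two-sample test would be needed before drawing firm conclusions.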
I'd like to report my experiment on etcd issue 5022: https://github.com/coreos/etcd/issues/5022
| w/ or w/o Namazu process inspector | Reproducibility |
|---|---|
| w/o | 0% |
| w/ | 2.7% |
In each of the above experiments, the test was run 1000 times.
Parameters of explorer policy:
explorePolicy = "random"
[explorePolicyParam]
procPolicy = "dirichlet"