ldbc_snb_interactive_v1_driver

Operation counts not consistent across benchmarks

Open xwkuang5 opened this issue 5 years ago • 4 comments

Hi,

I am reposting an open issue in the ldbc_snb_implementations repo here.

I am trying to use the Cypher benchmark to evaluate the performance of Neo4j under different configurations. I set operation_count=2500 and ran the interactive-benchmark.sh script multiple times. However, I got three different final operation counts (2473, 2532, 2584) across three runs. Is this the expected result?

Thanks for any help in advance!

Here is my configuration:

endpoint=bolt://localhost:7687
user=neo4j
password=admin
queryDir=queries/
printQueryNames=false
printQueryStrings=false
printQueryResults=false

status=1
thread_count=2
name=LDBC-SNB
results_log=true
time_unit=MILLISECONDS
time_compression_ratio=0.001
peer_identifiers=
workload_statistics=false
spinner_wait_duration=1
help=false
ignore_scheduled_start_times=true

workload=com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbInteractiveWorkload
db=com.ldbc.impls.workloads.ldbc.snb.cypher.interactive.CypherInteractiveDb
operation_count=2500
ldbc.snb.interactive.parameters_dir=../../ldbc_snb_datagen/substitution_parameters/
ldbc.snb.interactive.updates_dir=../../ldbc_snb_datagen/social_network/
ldbc.snb.interactive.short_read_dissipation=0.2
ldbc.snb.interactive.update_interleave=49274

warmup=100

## frequency of read queries (number of update queries per one read query)
ldbc.snb.interactive.LdbcQuery1_freq=26
ldbc.snb.interactive.LdbcQuery2_freq=37
ldbc.snb.interactive.LdbcQuery3_freq=123
ldbc.snb.interactive.LdbcQuery4_freq=36
ldbc.snb.interactive.LdbcQuery5_freq=78
ldbc.snb.interactive.LdbcQuery6_freq=434
ldbc.snb.interactive.LdbcQuery7_freq=38
ldbc.snb.interactive.LdbcQuery8_freq=5
ldbc.snb.interactive.LdbcQuery9_freq=527
ldbc.snb.interactive.LdbcQuery10_freq=40
ldbc.snb.interactive.LdbcQuery11_freq=22
ldbc.snb.interactive.LdbcQuery12_freq=44
ldbc.snb.interactive.LdbcQuery13_freq=19
ldbc.snb.interactive.LdbcQuery14_freq=49

# *** For debugging purposes ***

ldbc.snb.interactive.LdbcQuery1_enable=true
ldbc.snb.interactive.LdbcQuery2_enable=true
ldbc.snb.interactive.LdbcQuery3_enable=true
ldbc.snb.interactive.LdbcQuery4_enable=true
ldbc.snb.interactive.LdbcQuery5_enable=true
ldbc.snb.interactive.LdbcQuery6_enable=true
ldbc.snb.interactive.LdbcQuery7_enable=true
ldbc.snb.interactive.LdbcQuery8_enable=true
ldbc.snb.interactive.LdbcQuery9_enable=true
ldbc.snb.interactive.LdbcQuery10_enable=true
ldbc.snb.interactive.LdbcQuery11_enable=true
ldbc.snb.interactive.LdbcQuery12_enable=true
ldbc.snb.interactive.LdbcQuery13_enable=true
ldbc.snb.interactive.LdbcQuery14_enable=true

ldbc.snb.interactive.LdbcShortQuery1PersonProfile_enable=true
ldbc.snb.interactive.LdbcShortQuery2PersonPosts_enable=true
ldbc.snb.interactive.LdbcShortQuery3PersonFriends_enable=true
ldbc.snb.interactive.LdbcShortQuery4MessageContent_enable=true
ldbc.snb.interactive.LdbcShortQuery5MessageCreator_enable=true
ldbc.snb.interactive.LdbcShortQuery6MessageForum_enable=true
ldbc.snb.interactive.LdbcShortQuery7MessageReplies_enable=true

ldbc.snb.interactive.LdbcUpdate1AddPerson_enable=true
ldbc.snb.interactive.LdbcUpdate2AddPostLike_enable=true
ldbc.snb.interactive.LdbcUpdate3AddCommentLike_enable=true
ldbc.snb.interactive.LdbcUpdate4AddForum_enable=true
ldbc.snb.interactive.LdbcUpdate5AddForumMembership_enable=true
ldbc.snb.interactive.LdbcUpdate6AddPost_enable=true
ldbc.snb.interactive.LdbcUpdate7AddComment_enable=true
ldbc.snb.interactive.LdbcUpdate8AddFriendship_enable=true

xwkuang5, Feb 21 '20 15:02

If I understand short_read_dissipation correctly, it is the delta in the random walk model. A larger short_read_dissipation means a shorter walk; e.g., in the extreme case where short_read_dissipation=1, there should be no short reads after a complex read. Is this why the final number of operations differs across runs?
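To make the question concrete, here is a minimal Python sketch of the random walk as described above. It is not the driver's code. The decay rule is an assumption for illustration: the probability of issuing another short read starts at 1 - dissipation and is multiplied by 1 - dissipation after each step. The function names and the count of 500 complex reads are likewise hypothetical.

```python
import random


def short_reads_after_complex_read(dissipation, rng):
    """Length of one short-read walk under the assumed decaying-probability model."""
    if not 0.0 < dissipation <= 1.0:
        raise ValueError("dissipation must be in (0, 1]")
    # Assumption: continuation probability decays geometrically with each step.
    probability = 1.0 - dissipation
    count = 0
    while rng.random() < probability:
        count += 1
        probability *= 1.0 - dissipation
    return count


def total_operations(complex_reads, dissipation, seed):
    """Complex reads plus all short reads triggered by them, for one seeded run."""
    rng = random.Random(seed)
    short_reads = sum(
        short_reads_after_complex_read(dissipation, rng)
        for _ in range(complex_reads)
    )
    return complex_reads + short_reads


if __name__ == "__main__":
    # Different seeds stand in for different benchmark runs.
    for seed in (1, 2, 3):
        print(seed, total_operations(500, 0.2, seed))
```

In this sketch, the total varies from seed to seed because each walk has a random length. With dissipation=1 no short reads are issued at all, and reusing a seed reproduces the same total. Whether the driver behaves the same way, or exposes a seed, is exactly what is being asked here.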

xwkuang5, Feb 21 '20 15:02

If the above is true, is there a way to set the random seed in the test driver to make sure that the workload of a particular benchmark can be replayed?

xwkuang5, Feb 21 '20 15:02

Hi @xwkuang5

Sorry for the delay in replying.

"I was getting three different final operations counts (2473, 2532, 2584) across 3 different runs. Is this the expected result?"

I will discuss this with the task force when we next talk.

I've just run the Cypher implementation a few times with your configuration and can reproduce the issue. Which scale factor are you using to generate the data?

Best,

Jack

jackwaudby, Apr 13 '20 01:04

Hi Jack, thanks for your reply.

I believe it's SF1 (or SF3).

xwkuang5, Apr 13 '20 02:04