Unable to run on CloudLab nodes r650 or r6525 (with ConnectX-5 network cards)
Issue 1: At https://github.com/HaoyuHuang/NovaLSM/blob/8a661197ce5b993f2baeef608f34192d1ef0adf5/novalsm/nova_server_main.cpp#L235, should the value be multiplied by 1024 twice rather than once?
Issue 2: Can this project run on CloudLab nodes with newer RDMA network cards, such as the r650 or r6525 (which use ConnectX-5 cards)? I tried to run it on a five-node cluster, using the following command on each node:
node0:
stdbuf --output=0 --error=0 ./nova_server_main_debug --ltc_migration_policy=immediate --enable_range_index=false --num_migration_threads=32 --num_sstable_replicas=1 --level=6 --l0_start_compaction_mb=4096 --subrange_no_flush_num_keys=100 --enable_detailed_db_stats=false --major_compaction_type=sc --major_compaction_max_parallism=32 --major_compaction_max_tables_in_a_set=20 --enable_flush_multiple_memtables=true --recover_dbs=false --num_recovery_threads=32 --sampling_ratio=1 --zipfian_dist_ref_counts=/tmp/zipfian --client_access_pattern=zipfian --memtable_type=static_partition --enable_subrange=true --num_log_replicas=1 --log_record_mode=none --scatter_policy=power_of_two --number_of_ltcs=2 --enable_lookup_index=true --l0_stop_write_mb=10240 --num_memtable_partitions=64 --num_memtables=256 --num_rdma_bg_workers=16 --db_path=/db/nova-db-10000-1024 --num_storage_workers=8 --stoc_files_path=/db/stoc_files --max_stoc_file_size_mb=4 --sstable_size_mb=2 --ltc_num_stocs_scatter_data_blocks=1 --all_servers=node0:10210,node1:10210,node2:10210 --server_id=0 --mem_pool_size_gb=32 --use_fixed_value_size=1024 --ltc_config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config --ltc_num_client_workers=8 --num_rdma_fg_workers=8 --num_compaction_workers=32 --block_cache_mb=0 --row_cache_mb=0 --memtable_size_mb=4 --cc_log_buf_size=1024 --rdma_port=20820 --rdma_max_msg_size=262144 --rdma_max_num_sends=32 --rdma_doorbell_batch_size=8 --enable_rdma=true --enable_load_data=false --use_local_disk=false
node1:
stdbuf --output=0 --error=0 ./nova_server_main_debug --ltc_migration_policy=immediate --enable_range_index=false --num_migration_threads=32 --num_sstable_replicas=1 --level=6 --l0_start_compaction_mb=4096 --subrange_no_flush_num_keys=100 --enable_detailed_db_stats=false --major_compaction_type=sc --major_compaction_max_parallism=32 --major_compaction_max_tables_in_a_set=20 --enable_flush_multiple_memtables=true --recover_dbs=false --num_recovery_threads=32 --sampling_ratio=1 --zipfian_dist_ref_counts=/tmp/zipfian --client_access_pattern=zipfian --memtable_type=static_partition --enable_subrange=true --num_log_replicas=1 --log_record_mode=none --scatter_policy=power_of_two --number_of_ltcs=2 --enable_lookup_index=true --l0_stop_write_mb=10240 --num_memtable_partitions=64 --num_memtables=256 --num_rdma_bg_workers=16 --db_path=/db/nova-db-10000-1024 --num_storage_workers=8 --stoc_files_path=/db/stoc_files --max_stoc_file_size_mb=4 --sstable_size_mb=2 --ltc_num_stocs_scatter_data_blocks=1 --all_servers=node0:10210,node1:10210,node2:10210 --server_id=1 --mem_pool_size_gb=32 --use_fixed_value_size=1024 --ltc_config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config --ltc_num_client_workers=8 --num_rdma_fg_workers=8 --num_compaction_workers=32 --block_cache_mb=0 --row_cache_mb=0 --memtable_size_mb=4 --cc_log_buf_size=1024 --rdma_port=20820 --rdma_max_msg_size=262144 --rdma_max_num_sends=32 --rdma_doorbell_batch_size=8 --enable_rdma=true --enable_load_data=false --use_local_disk=false
node2:
stdbuf --output=0 --error=0 ./nova_server_main_debug --ltc_migration_policy=immediate --enable_range_index=false --num_migration_threads=32 --num_sstable_replicas=1 --level=6 --l0_start_compaction_mb=4096 --subrange_no_flush_num_keys=100 --enable_detailed_db_stats=false --major_compaction_type=sc --major_compaction_max_parallism=32 --major_compaction_max_tables_in_a_set=20 --enable_flush_multiple_memtables=true --recover_dbs=false --num_recovery_threads=32 --sampling_ratio=1 --zipfian_dist_ref_counts=/tmp/zipfian --client_access_pattern=zipfian --memtable_type=static_partition --enable_subrange=true --num_log_replicas=1 --log_record_mode=none --scatter_policy=power_of_two --number_of_ltcs=2 --enable_lookup_index=true --l0_stop_write_mb=10240 --num_memtable_partitions=64 --num_memtables=256 --num_rdma_bg_workers=16 --db_path=/db/nova-db-10000-1024 --num_storage_workers=8 --stoc_files_path=/db/stoc_files --max_stoc_file_size_mb=4 --sstable_size_mb=2 --ltc_num_stocs_scatter_data_blocks=1 --all_servers=node0:10210,node1:10210,node2:10210 --server_id=2 --mem_pool_size_gb=32 --use_fixed_value_size=1024 --ltc_config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config --ltc_num_client_workers=8 --num_rdma_fg_workers=8 --num_compaction_workers=32 --block_cache_mb=0 --row_cache_mb=0 --memtable_size_mb=4 --cc_log_buf_size=1024 --rdma_port=20820 --rdma_max_msg_size=262144 --rdma_max_num_sends=32 --rdma_doorbell_batch_size=8 --enable_rdma=true --enable_load_data=false --use_local_disk=false
node3:
java -cp /tmp/YCSB-Nova/jdbc/conf:/tmp/YCSB-Nova/jdbc/target/jdbc-binding-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jta_1.1_spec/1.1.1/geronimo-jta_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/users/ruixuan/.m2/repository/net/sourceforge/serp/serp/1.13.1/serp-1.13.1.jar:/tmp/YCSB-Nova/core/target/core-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-jdbc/2.1.1/openjpa-jdbc-2.1.1.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jms_1.1_spec/1.1.1/geronimo-jms_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-kernel/2.1.1/openjpa-kernel-2.1.1.jar:/users/ruixuan/.m2/repository/net/spy/spymemcached/2.11.4/spymemcached-2.11.4.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/users/ruixuan/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/users/ruixuan/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-lib/2.1.1/openjpa-lib-2.1.1.jar:/users/ruixuan/.m2/repository/commons-pool/commons-pool/1.5.4/commons-pool-1.5.4.jar:/users/ruixuan/.m2/repository/mysql/mysql-connector-java/5.1.44/mysql-connector-java-5.1.44.jar:/users/ruixuan/.m2/repository/com/google/guava/guava/21.0/guava-21.0.jar com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.NovaDBClient -P /users/ruixuan/NovaLSM/workloads/workloadw -P /users/ruixuan/NovaLSM/workloads/db.properties -s -threads 16 -p nova_servers=node0:10210,node1:10210 -p debug=false -p partition=range -p stringkey=false -p insertorder=ordered -p recordcount=10000 -p maxexecutiontime=1200 -p requestdistribution=zipfian -p valuesize=1024 -p config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config -p operationcount=0 -p cardinality=10 -p zipfianconstant=0.99 -p offset=0
node4:
java -cp /tmp/YCSB-Nova/jdbc/conf:/tmp/YCSB-Nova/jdbc/target/jdbc-binding-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jta_1.1_spec/1.1.1/geronimo-jta_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/users/ruixuan/.m2/repository/net/sourceforge/serp/serp/1.13.1/serp-1.13.1.jar:/tmp/YCSB-Nova/core/target/core-0.13.0-SNAPSHOT.jar:/users/ruixuan/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-jdbc/2.1.1/openjpa-jdbc-2.1.1.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/users/ruixuan/.m2/repository/org/apache/geronimo/specs/geronimo-jms_1.1_spec/1.1.1/geronimo-jms_1.1_spec-1.1.1.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-kernel/2.1.1/openjpa-kernel-2.1.1.jar:/users/ruixuan/.m2/repository/net/spy/spymemcached/2.11.4/spymemcached-2.11.4.jar:/users/ruixuan/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/users/ruixuan/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/users/ruixuan/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/users/ruixuan/.m2/repository/org/apache/openjpa/openjpa-lib/2.1.1/openjpa-lib-2.1.1.jar:/users/ruixuan/.m2/repository/commons-pool/commons-pool/1.5.4/commons-pool-1.5.4.jar:/users/ruixuan/.m2/repository/mysql/mysql-connector-java/5.1.44/mysql-connector-java-5.1.44.jar:/users/ruixuan/.m2/repository/com/google/guava/guava/21.0/guava-21.0.jar com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.NovaDBClient -P /users/ruixuan/NovaLSM/workloads/workloadw -P /users/ruixuan/NovaLSM/workloads/db.properties -s -threads 16 -p nova_servers=node0:10210,node1:10210 -p debug=false -p partition=range -p stringkey=false -p insertorder=ordered -p recordcount=10000 -p maxexecutiontime=1200 -p requestdistribution=zipfian -p valuesize=1024 -p config_path=/users/ruixuan/NovaLSM/config/nova-tutorial-config -p operationcount=0 -p cardinality=10 -p zipfianconstant=0.99 -p offset=0
My intent is to use node0 and node1 as LTCs, node2 as a StoC, and node3 and node4 as clients. Have I understood the commands correctly?
When I run the commands above, node0 crashes with a segmentation fault. The crash appears to happen during the first flush or compaction, after roughly 16348 key-value pairs have been inserted successfully. It occurs at https://github.com/HaoyuHuang/NovaLSM/blob/8a661197ce5b993f2baeef608f34192d1ef0adf5/rdma/nova_rdma_rc_broker.cpp#L252, where the value of wcs_[i].status is IBV_WC_RETRY_EXC_ERR. Does this mean RDMA is not configured correctly?
Thank you for your time and assistance.