
Works with MongoDB 3.0, but MongoDB 3.2 and 3.4 get connection timed out


A SOCKS5 proxy server is running at 127.0.0.1:2345.

proxychains config:

random_chain
chain_len = 1
remote_dns_subnet 224
tcp_read_time_out 3000
tcp_connect_time_out 1500
[ProxyList]
socks5 127.0.0.1 2345

When deploying a MongoDB replica set:

1. Start a mongod version 3.0 listening on port 7017

$ mkdir data1
$ proxychains4 /path/to/mongod_3.0 --dbpath ./data1/ --port 7017 --replSet test

2. Start another mongod version 3.0 listening on port 7018

$ mkdir data2
$ proxychains4 /path/to/mongod_3.0 --dbpath ./data2/ --port 7018 --replSet test

3. Connect to the mongod on port 7017 and initiate the replica set

$ /path/to/mongo_3.0 --port 7017
MongoDB shell version: 3.0.5
connecting to: 127.0.0.1:7017/test
> rs.initiate({_id:"test", members:[{_id:0, host: "127.0.0.1:7017"}]})
{ "ok" : 1 }
test:SECONDARY>
test:PRIMARY> rs.add("127.0.0.1:7018")
{ "ok" : 1 }
test:PRIMARY> rs.status()
{
	"set" : "test",
	"date" : ISODate("2017-04-24T06:47:11.922Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "127.0.0.1:7017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 52,
			"optime" : Timestamp(1493016421, 1),
			"optimeDate" : ISODate("2017-04-24T06:47:01Z"),
			"electionTime" : Timestamp(1493016404, 2),
			"electionDate" : ISODate("2017-04-24T06:46:44Z"),
			"configVersion" : 2,
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "127.0.0.1:7018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 10,
			"optime" : Timestamp(1493016421, 1),
			"optimeDate" : ISODate("2017-04-24T06:47:01Z"),
			"lastHeartbeat" : ISODate("2017-04-24T06:47:11.719Z"),
			"lastHeartbeatRecv" : ISODate("2017-04-24T06:47:11.731Z"),
			"pingMs" : 0,
			"configVersion" : 2
		}
	],
	"ok" : 1
}
test:PRIMARY>

The replica set initiated successfully, with one PRIMARY and one SECONDARY.

But when we switch the MongoDB binary to 3.2 or 3.4, rs.add("127.0.0.1:7018") fails:

> rs.initiate({_id:"test", members:[{_id:0, host: "127.0.0.1:7017"}]})
{ "ok" : 1 }
test:SECONDARY>
test:PRIMARY> rs.add("127.0.0.1:7018")
{
	"ok" : 0,
	"errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 127.0.0.1:7017; the following nodes did not respond affirmatively: 127.0.0.1:7018 failed with Couldn't get a connection within the time limit",
	"code" : 74,
	"codeName" : "NodeNotFound"
}
test:PRIMARY>

The MongoDB log shows a connection timeout (both on Linux and OS X), while proxychains reports no error:

2017-04-24T14:53:13.304+0800 I REPL     [conn1] replSetReconfig admin command received from client
[proxychains] Random chain  ...  127.0.0.1:2345  ...  127.0.0.1:7018  ...  OK
2017-04-24T14:53:13.305+0800 I REPL     [conn1] replSetReconfig config object with 2 members parses ok
2017-04-24T14:53:13.306+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 127.0.0.1:7018
[proxychains] Random chain  ...  127.0.0.1:2345  ...  127.0.0.1:7018  ...  OK
2017-04-24T14:53:23.310+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 127.0.0.1:7018; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-04-24T14:53:23.310+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 127.0.0.1:7017; the following nodes did not respond affirmatively: 127.0.0.1:7018 failed with Couldn't get a connection within the time limit
2017-04-24T14:53:23.310+0800 I COMMAND  [conn1] command admin.$cmd appName: "MongoDB Shell" command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "127.0.0.1:7017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "127.0.0.1:7018" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('58fda0d040cafa4d11a4960d') } } } numYields:0 reslen:336 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_command 10006ms
2017-04-24T14:53:33.308+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 127.0.0.1:7018 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:127.0.0.1:7018 db:admin cmd:{ isMaster: 1 }

ma6174 avatar Apr 24 '17 07:04 ma6174

I'm not familiar with MongoDB, but it's a good start that you know which versions work and which don't. To find the exact cause of the issue, you could try git bisect on the MongoDB git repo to find the exact commit that broke it. With that knowledge we can then start looking for a potential bug in proxychains or MongoDB.

rofl0r avatar Apr 24 '17 19:04 rofl0r

I also tested some other software similar to proxychains; kneesocks (with some fixes) works well on Linux with MongoDB 3.2 and 3.4, so there must be some potential bug in proxychains, not in MongoDB.

There are 4846 commits between MongoDB 3.0 and 3.2, with 12937 changed files, so it would be quite difficult to find the exact commit that broke it. Can you find anything proxychains does differently from kneesocks?

ma6174 avatar Apr 25 '17 06:04 ma6174

I don't deny that there may be a bug in proxychains, but your "it works with X so the bug must be in Y" logic is flawed. In any case, it would be very helpful to see which change in MongoDB causes the bug (wherever it is) to manifest. ~5000 commits may sound like a lot, but with git bisect you should only need at most about 12 tries to find the offending commit (i.e. 12 times recompiling MongoDB from source).

rofl0r avatar Apr 29 '17 03:04 rofl0r

After some debugging, I found that after this commit https://github.com/mongodb/mongo/commit/b256b88c78c28e37b033fcb0134e75649fbe3d5c proxychains doesn't work. It looks like proxychains doesn't work well with the ASIO network library?

ma6174 avatar Apr 29 '17 14:04 ma6174

Good find! You can probably fix your issue by manually using another network implementation. However, it'd be nice to know what the ASIO interface does differently that causes the bug to surface.

rofl0r avatar Apr 29 '17 20:04 rofl0r

MongoDB removed the thread-based network interface after switching to ASIO: https://github.com/mongodb/mongo/commit/a998887902dfd9cb6f125233c86361064f80c57e. These two commits cannot simply be reverted, because later changes no longer support the thread-based network interface and the build fails.

ma6174 avatar Apr 30 '17 04:04 ma6174

Oh, I thought that since they appear to have "pluggable" networking implementations (suggested, for example, by their use of the term "default implementation"), it would be possible to select a different implementation from the configuration side, without having to change their code.

rofl0r avatar Apr 30 '17 18:04 rofl0r

Anyway, it might be helpful to compile proxychains in debug mode (CFLAGS=-DDEBUG ./configure) to see what's going on when the connection times out.

rofl0r avatar Apr 30 '17 19:04 rofl0r

Here is the full log of mongod and proxychains, both with debugging enabled:

$ ./proxychains4 -f proxychains.conf ./mongod --dbpath data1 --replSet test --port 1111 -vv
[proxychains] config file found: proxychains.conf
[proxychains] preloading ./libproxychains4.so
DEBUG:init_lib_wrapper called from gcc_init
DEBUG:pid[33275]:at_init
DEBUG:pid[33275]:wait_data
DEBUG:[play] socks5 127.0.0.1:2345
[proxychains] DLL init: proxychains-ng 4.12
DEBUG:loaded symbol 'connect' real addr 0x7f42e945f710  wrapped addr 0x7f42ea2afdd9
DEBUG:loaded symbol 'sendto' real addr 0x7f42e945f9f0  wrapped addr 0x7f42ea2b0804
DEBUG:loaded symbol 'gethostbyname' real addr 0x7f42e9197ea0  wrapped addr 0x7f42ea2b0313
DEBUG:loaded symbol 'getaddrinfo' real addr 0x7f42e915abc0  wrapped addr 0x7f42ea2b037f
DEBUG:loaded symbol 'freeaddrinfo' real addr 0x7f42e915b830  wrapped addr 0x7f42ea2b042c
DEBUG:loaded symbol 'gethostbyaddr' real addr 0x7f42e91978e0  wrapped addr 0x7f42ea2b06f0
DEBUG:loaded symbol 'getnameinfo' real addr 0x7f42e919d4c0  wrapped addr 0x7f42ea2ac1c0
DEBUG:loaded symbol 'close' real addr 0x7f42e945f3e0  wrapped addr 0x7f42ea2afcd7
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten] MongoDB starting : pid=33275 port=1111 dbpath=data1 64-bit host=cs48
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten] db version v3.4.4-12-gc8fdff8
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten] git version: c8fdff8e541fd366a6cf72a5bbba916498625a1a
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten] allocator: tcmalloc
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten] modules: none
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten] build environment:
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten]     distarch: x86_64
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten]     target_arch: x86_64
2017-05-01T15:18:25.792+0800 I CONTROL  [initandlisten] options: { net: { port: 1111 }, replication: { replSet: "test" }, storage: { dbPath: "data1" }, systemLog: { verbosity: 2 } }
2017-05-01T15:18:25.792+0800 D NETWORK  [initandlisten] fd limit hard:65535 soft:65535 max conn: 52428
DEBUG:getaddrinfo: 0.0.0.0 1111
DEBUG:pid[33275]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f42ef8c12c0
DEBUG:pid[33275]:proxy_freeaddrinfo
2017-05-01T15:18:25.828+0800 I STORAGE  [initandlisten]
2017-05-01T15:18:25.828+0800 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2017-05-01T15:18:25.828+0800 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2017-05-01T15:18:25.829+0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=63897M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-05-01T15:18:25.848+0800 D COMMAND  [WTJournalFlusher] BackgroundJob starting: WTJournalFlusher
2017-05-01T15:18:25.848+0800 D STORAGE  [WTJournalFlusher] starting WTJournalFlusher thread
2017-05-01T15:18:25.850+0800 D STORAGE  [initandlisten] WiredTigerKVEngine::createRecordStore uri: table:_mdb_catalog config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-05-01T15:18:25.853+0800 D STORAGE  [initandlisten] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:_mdb_catalog ok range 1 -> 1 current: 1
2017-05-01T15:18:25.858+0800 D STORAGE  [initandlisten] flushing directory data1
2017-05-01T15:18:25.858+0800 I CONTROL  [initandlisten]
2017-05-01T15:18:25.858+0800 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-05-01T15:18:25.858+0800 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2017-05-01T15:18:25.858+0800 I CONTROL  [initandlisten]
2017-05-01T15:18:25.858+0800 I CONTROL  [initandlisten]
2017-05-01T15:18:25.858+0800 I CONTROL  [initandlisten] ** WARNING: You are running on a NUMA machine.
2017-05-01T15:18:25.858+0800 I CONTROL  [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten] **              numactl --interleave=all mongod [other options]
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten]
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten]
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-05-01T15:18:25.859+0800 I CONTROL  [initandlisten]
2017-05-01T15:18:25.859+0800 D STORAGE  [initandlisten] enter repairDatabases (to check pdfile version #)
2017-05-01T15:18:25.859+0800 D -        [initandlisten] reloading view catalog for database local
2017-05-01T15:18:25.859+0800 D STORAGE  [initandlisten] Checking node for SERVER-23299 eligibility
2017-05-01T15:18:25.859+0800 D STORAGE  [initandlisten] Didn't find local.startup_log
2017-05-01T15:18:25.859+0800 D STORAGE  [initandlisten] done repairDatabases
2017-05-01T15:18:25.859+0800 D QUERY    [initandlisten] Running query: query: {} sort: {} projection: {}
2017-05-01T15:18:25.859+0800 D QUERY    [initandlisten] Collection admin.system.roles does not exist. Using EOF plan: query: {} sort: {} projection: {}
2017-05-01T15:18:25.859+0800 I COMMAND  [initandlisten] query admin.system.roles planSummary: EOF ntoreturn:0 ntoskip:0 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 6, w: 1, W: 1 } }, Database: { acquireCount: { r: 2, W: 1 } }, Collection: { acquireCount: { r: 2 } } } 0ms
2017-05-01T15:18:25.859+0800 D STORAGE  [initandlisten] create collection local.startup_log { capped: true, size: 10485760 }
2017-05-01T15:18:25.859+0800 D STORAGE  [initandlisten] stored meta data for local.startup_log @ RecordId(1)
2017-05-01T15:18:25.859+0800 D STORAGE  [initandlisten] WiredTigerKVEngine::createRecordStore uri: table:collection-0--3505979310332637255 config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-05-01T15:18:25.862+0800 D STORAGE  [initandlisten] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:collection-0--3505979310332637255 ok range 1 -> 1 current: 1
2017-05-01T15:18:25.862+0800 D STORAGE  [initandlisten] local.startup_log: clearing plan cache - collection info cache reset
2017-05-01T15:18:25.862+0800 D STORAGE  [initandlisten] WiredTigerKVEngine::createSortedDataInterface ident: index-1--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.startup_log" }),
2017-05-01T15:18:25.862+0800 D STORAGE  [initandlisten] create uri: table:index-1--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.startup_log" }),
2017-05-01T15:18:25.865+0800 D STORAGE  [initandlisten] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:index-1--3505979310332637255 ok range 6 -> 8 current: 6
2017-05-01T15:18:25.865+0800 D STORAGE  [initandlisten] local.startup_log: clearing plan cache - collection info cache reset
2017-05-01T15:18:25.865+0800 D INDEX    [initandlisten] marking index _id_ as ready in snapshot id 5
2017-05-01T15:18:25.866+0800 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'data1/diagnostic.data'
2017-05-01T15:18:25.866+0800 D INDEX    [initandlisten] checking complete
2017-05-01T15:18:25.866+0800 D QUERY    [initandlisten] Collection local.me does not exist. Using EOF plan: query: {} sort: {} projection: {}
2017-05-01T15:18:25.866+0800 D STORAGE  [initandlisten] stored meta data for local.me @ RecordId(3)
2017-05-01T15:18:25.866+0800 D STORAGE  [initandlisten] WiredTigerKVEngine::createRecordStore uri: table:collection-2--3505979310332637255 config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-05-01T15:18:25.869+0800 D STORAGE  [initandlisten] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:collection-2--3505979310332637255 ok range 1 -> 1 current: 1
2017-05-01T15:18:25.869+0800 D STORAGE  [initandlisten] local.me: clearing plan cache - collection info cache reset
2017-05-01T15:18:25.869+0800 D STORAGE  [initandlisten] WiredTigerKVEngine::createSortedDataInterface ident: index-3--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.me" }),
2017-05-01T15:18:25.869+0800 D STORAGE  [initandlisten] create uri: table:index-3--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.me" }),
2017-05-01T15:18:25.871+0800 D STORAGE  [initandlisten] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:index-3--3505979310332637255 ok range 6 -> 8 current: 6
2017-05-01T15:18:25.871+0800 D STORAGE  [initandlisten] local.me: clearing plan cache - collection info cache reset
2017-05-01T15:18:25.871+0800 D INDEX    [initandlisten] marking index _id_ as ready in snapshot id 6
2017-05-01T15:18:25.872+0800 D QUERY    [initandlisten] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2017-05-01T15:18:25.872+0800 I REPL     [initandlisten] Did not find local voted for document at startup.
2017-05-01T15:18:25.872+0800 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument: Did not find replica set configuration document in local.system.replset
2017-05-01T15:18:25.872+0800 D ASIO     [NetworkInterfaceASIO-Replication-0] The NetworkInterfaceASIO worker thread is spinning up
2017-05-01T15:18:25.872+0800 D EXECUTOR [replExecDBWorker-0] starting thread in pool replExecDBWorker-Pool
2017-05-01T15:18:25.872+0800 D EXECUTOR [replExecDBWorker-1] starting thread in pool replExecDBWorker-Pool
2017-05-01T15:18:25.873+0800 D COMMAND  [TTLMonitor] BackgroundJob starting: TTLMonitor
2017-05-01T15:18:25.873+0800 D COMMAND  [ClientCursorMonitor] BackgroundJob starting: ClientCursorMonitor
2017-05-01T15:18:25.873+0800 I NETWORK  [thread1] waiting for connections on port 1111
2017-05-01T15:18:25.873+0800 D COMMAND  [PeriodicTaskRunner] BackgroundJob starting: PeriodicTaskRunner
2017-05-01T15:18:25.873+0800 D EXECUTOR [replExecDBWorker-2] starting thread in pool replExecDBWorker-Pool
DEBUG:pid[33275]:pc_getnameinfo
2017-05-01T15:18:44.023+0800 I NETWORK  [thread1] connection accepted from 127.0.0.1:39155 #1 (1 connection now open)
DEBUG:pid[33275]:pc_getnameinfo
DEBUG:pid[33275]:pc_getnameinfo
2017-05-01T15:18:44.023+0800 D COMMAND  [conn1] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2017-05-01T15:18:44.023+0800 D COMMAND  [conn1] command: replSetGetStatus
2017-05-01T15:18:44.023+0800 I COMMAND  [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } numYields:0 reslen:186 locks:{} protocol:op_query 0ms
2017-05-01T15:18:44.227+0800 D COMMAND  [conn1] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2017-05-01T15:18:44.227+0800 D COMMAND  [conn1] command: replSetGetStatus
2017-05-01T15:18:44.227+0800 I COMMAND  [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } numYields:0 reslen:186 locks:{} protocol:op_query 0ms
2017-05-01T15:18:49.816+0800 D COMMAND  [conn1] run command admin.$cmd { replSetInitiate: { _id: "test", members: [ { _id: 0.0, host: "10.200.20.56:1111" } ] } }
2017-05-01T15:18:49.816+0800 D COMMAND  [conn1] command: replSetInitiate
2017-05-01T15:18:49.816+0800 I REPL     [conn1] replSetInitiate admin command received from client
DEBUG:pid[33275]:pc_getnameinfo
DEBUG:pid[33275]:pc_getnameinfo
DEBUG:pid[33275]:pc_getnameinfo
2017-05-01T15:18:49.817+0800 D NETWORK  [conn1] getBoundAddrs(): [ 127.0.0.1] [ 10.200.20.56] [ 172.17.0.1]
DEBUG:getaddrinfo: 10.200.20.56 1111
DEBUG:pid[33275]:proxy_getaddrinfo
DEBUG:pid[33275]:pc_getnameinfo
2017-05-01T15:18:49.817+0800 D NETWORK  [conn1] getAddrsForHost("10.200.20.56:1111"): [ 10.200.20.56]
DEBUG:freeaddrinfo 0x7f42ef606f00
DEBUG:pid[33275]:proxy_freeaddrinfo
2017-05-01T15:18:49.817+0800 I REPL     [conn1] replSetInitiate config object with 1 members parses ok
2017-05-01T15:18:49.817+0800 I REPL     [conn1] ******
2017-05-01T15:18:49.817+0800 I REPL     [conn1] creating replication oplog of size: 21838MB...
2017-05-01T15:18:49.817+0800 D STORAGE  [conn1] stored meta data for local.oplog.rs @ RecordId(4)
2017-05-01T15:18:49.817+0800 D STORAGE  [conn1] WiredTigerKVEngine::createRecordStore uri: table:collection-4--3505979310332637255 config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,type=file,memory_page_max=10m,key_format=q,value_format=u,app_metadata=(formatVersion=1,oplogKeyExtractionVersion=1)
2017-05-01T15:18:49.820+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:collection-4--3505979310332637255 ok range 1 -> 1 current: 1
2017-05-01T15:18:49.820+0800 I STORAGE  [conn1] Starting WiredTigerRecordStoreThread local.oplog.rs
2017-05-01T15:18:49.821+0800 I STORAGE  [conn1] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2017-05-01T15:18:49.821+0800 I STORAGE  [conn1] Scanning the oplog to determine where to place markers for truncation
2017-05-01T15:18:49.821+0800 D COMMAND  [WT RecordStoreThread: local.oplog.rs] BackgroundJob starting: WT RecordStoreThread: local.oplog.rs
2017-05-01T15:18:49.821+0800 D STORAGE  [conn1] local.oplog.rs: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.821+0800 D STORAGE  [conn1] WiredTigerKVEngine::flushAllFiles
2017-05-01T15:18:49.821+0800 D STORAGE  [conn1] WiredTigerSizeStorer::storeInto table:_mdb_catalog -> { numRecords: 4, dataSize: 915 }
2017-05-01T15:18:49.821+0800 D STORAGE  [conn1] WiredTigerSizeStorer::storeInto table:collection-0--3505979310332637255 -> { numRecords: 1, dataSize: 1499 }
2017-05-01T15:18:49.821+0800 D STORAGE  [conn1] WiredTigerSizeStorer::storeInto table:collection-2--3505979310332637255 -> { numRecords: 1, dataSize: 37 }
2017-05-01T15:18:49.821+0800 D STORAGE  [conn1] WiredTigerSizeStorer::storeInto table:collection-4--3505979310332637255 -> { numRecords: 0, dataSize: 0 }
2017-05-01T15:18:49.830+0800 I REPL     [conn1] ******
2017-05-01T15:18:49.830+0800 D STORAGE  [conn1] stored meta data for local.system.replset @ RecordId(5)
2017-05-01T15:18:49.830+0800 D STORAGE  [conn1] WiredTigerKVEngine::createRecordStore uri: table:collection-5--3505979310332637255 config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-05-01T15:18:49.833+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:collection-5--3505979310332637255 ok range 1 -> 1 current: 1
2017-05-01T15:18:49.833+0800 D STORAGE  [conn1] local.system.replset: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.833+0800 D STORAGE  [conn1] WiredTigerKVEngine::createSortedDataInterface ident: index-6--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.system.replset" }),
2017-05-01T15:18:49.833+0800 D STORAGE  [conn1] create uri: table:index-6--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.system.replset" }),
2017-05-01T15:18:49.836+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:index-6--3505979310332637255 ok range 6 -> 8 current: 6
2017-05-01T15:18:49.836+0800 D STORAGE  [conn1] local.system.replset: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.836+0800 D INDEX    [conn1] marking index _id_ as ready in snapshot id 44
2017-05-01T15:18:49.836+0800 D QUERY    [conn1] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2017-05-01T15:18:49.837+0800 D STORAGE  [conn1] stored meta data for local.replset.minvalid @ RecordId(6)
2017-05-01T15:18:49.837+0800 D STORAGE  [conn1] WiredTigerKVEngine::createRecordStore uri: table:collection-7--3505979310332637255 config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-05-01T15:18:49.840+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:collection-7--3505979310332637255 ok range 1 -> 1 current: 1
2017-05-01T15:18:49.840+0800 D STORAGE  [conn1] local.replset.minvalid: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.840+0800 D STORAGE  [conn1] WiredTigerKVEngine::createSortedDataInterface ident: index-8--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.replset.minvalid" }),
2017-05-01T15:18:49.840+0800 D STORAGE  [conn1] create uri: table:index-8--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.replset.minvalid" }),
2017-05-01T15:18:49.843+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:index-8--3505979310332637255 ok range 6 -> 8 current: 6
2017-05-01T15:18:49.843+0800 D STORAGE  [conn1] local.replset.minvalid: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.843+0800 D INDEX    [conn1] marking index _id_ as ready in snapshot id 45
2017-05-01T15:18:49.843+0800 D QUERY    [conn1] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2017-05-01T15:18:49.843+0800 D -        [conn1] reloading view catalog for database admin
2017-05-01T15:18:49.843+0800 D STORAGE  [conn1] stored meta data for admin.system.version @ RecordId(7)
2017-05-01T15:18:49.843+0800 D STORAGE  [conn1] WiredTigerKVEngine::createRecordStore uri: table:collection-9--3505979310332637255 config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-05-01T15:18:49.846+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:collection-9--3505979310332637255 ok range 1 -> 1 current: 1
2017-05-01T15:18:49.846+0800 D STORAGE  [conn1] admin.system.version: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.847+0800 D STORAGE  [conn1] WiredTigerKVEngine::createSortedDataInterface ident: index-10--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "admin.system.version" }),
2017-05-01T15:18:49.847+0800 D STORAGE  [conn1] create uri: table:index-10--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=6,infoObj={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "admin.system.version" }),
2017-05-01T15:18:49.849+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:index-10--3505979310332637255 ok range 6 -> 8 current: 6
2017-05-01T15:18:49.849+0800 D STORAGE  [conn1] admin.system.version: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.849+0800 D INDEX    [conn1] marking index _id_ as ready in snapshot id 47
2017-05-01T15:18:49.849+0800 D STORAGE  [conn1] WiredTigerKVEngine::createSortedDataInterface ident: index-11--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "version" : 1 }, "name" : "incompatible_with_version_32", "ns" : "admin.system.version" }),
2017-05-01T15:18:49.849+0800 D STORAGE  [conn1] create uri: table:index-11--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "version" : 1 }, "name" : "incompatible_with_version_32", "ns" : "admin.system.version" }),
2017-05-01T15:18:49.852+0800 D STORAGE  [conn1] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:index-11--3505979310332637255 ok range 6 -> 8 current: 8
2017-05-01T15:18:49.852+0800 D STORAGE  [conn1] admin.system.version: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.852+0800 I INDEX    [conn1] build index on: admin.system.version properties: { v: 2, key: { version: 1 }, name: "incompatible_with_version_32", ns: "admin.system.version" }
2017-05-01T15:18:49.852+0800 I INDEX    [conn1] 	 building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2017-05-01T15:18:49.852+0800 D INDEX    [conn1] 	 bulk commit starting for index: incompatible_with_version_32
2017-05-01T15:18:49.853+0800 D INDEX    [conn1] 	 done building bottom layer, going to commit
2017-05-01T15:18:49.853+0800 I INDEX    [conn1] build index done.  scanned 0 total records. 0 secs
2017-05-01T15:18:49.853+0800 D INDEX    [conn1] marking index incompatible_with_version_32 as ready in snapshot id 50
2017-05-01T15:18:49.854+0800 I COMMAND  [conn1] setting featureCompatibilityVersion to 3.4
2017-05-01T15:18:49.854+0800 D REPL     [conn1] Updated term in topology coordinator to 0 due to new config
2017-05-01T15:18:49.854+0800 I REPL     [conn1] New replica set config in use: { _id: "test", version: 1, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5906e159c7e0c4c709f69f96') } }
2017-05-01T15:18:49.854+0800 I REPL     [conn1] This node is 10.200.20.56:1111 in the config
2017-05-01T15:18:49.854+0800 I REPL     [conn1] transition to STARTUP2
2017-05-01T15:18:49.854+0800 I REPL     [conn1] Starting replication storage threads
2017-05-01T15:18:49.854+0800 D ASIO     [NetworkInterfaceASIO-RS-0] The NetworkInterfaceASIO worker thread is spinning up
2017-05-01T15:18:49.854+0800 D EXECUTOR [replication-0] starting thread in pool replication
2017-05-01T15:18:49.854+0800 D EXECUTOR [initial sync-0] starting thread in pool initial sync-Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 0] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 1] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 2] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 3] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 4] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 5] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 11] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 7] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 12] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 I REPL     [conn1] Starting replication fetcher thread
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 14] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 6] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 8] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 13] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.854+0800 D EXECUTOR [repl writer worker 9] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 10] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 15] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 I REPL     [conn1] Starting replication applier thread
2017-05-01T15:18:49.855+0800 I REPL     [conn1] Starting replication reporter thread
2017-05-01T15:18:49.855+0800 I REPL     [rsSync] transition to RECOVERING
2017-05-01T15:18:49.855+0800 I COMMAND  [conn1] command local.replset.minvalid command: replSetInitiate { v: 2, key: { version: 1 }, ns: "admin.system.version", name: "incompatible_with_version_32" } numYields:0 reslen:37 locks:{ Global: { acquireCount: { r: 14, w: 8, W: 2 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 56 } }, Database: { acquireCount: { r: 2, w: 2, W: 6 } }, Collection: { acquireCount: { r: 2, w: 2 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 2 } } } protocol:op_query 39ms
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 0] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D REPL     [rsBackgroundSync] bgsync fetch queue set to: { ts: Timestamp 1493623129000|1, t: -1 } 8120695218824595400
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 1] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 2] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 3] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 5] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 6] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 4] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 7] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 8] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 15] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 10] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 11] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 12] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 13] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 14] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.855+0800 D EXECUTOR [repl writer worker 9] starting thread in pool repl writer worker Pool
2017-05-01T15:18:49.856+0800 I REPL     [rsSync] transition to SECONDARY
2017-05-01T15:18:49.856+0800 I REPL     [rsSync] conducting a dry run election to see if we could be elected
2017-05-01T15:18:49.856+0800 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2017-05-01T15:18:49.856+0800 D REPL     [ReplicationExecutor] Updating term from 0 to 1
2017-05-01T15:18:49.856+0800 D STORAGE  [replExecDBWorker-0] stored meta data for local.replset.election @ RecordId(8)
2017-05-01T15:18:49.856+0800 D STORAGE  [replExecDBWorker-0] WiredTigerKVEngine::createRecordStore uri: table:collection-12--3505979310332637255 config: type=file,memory_page_max=10m,split_pct=90,leaf_value_max=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-05-01T15:18:49.856+0800 D COMMAND  [conn1] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2017-05-01T15:18:49.856+0800 D COMMAND  [conn1] command: replSetGetStatus
2017-05-01T15:18:49.856+0800 I COMMAND  [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } numYields:0 reslen:530 locks:{} protocol:op_query 0ms
2017-05-01T15:18:49.859+0800 D STORAGE  [replExecDBWorker-0] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:collection-12--3505979310332637255 ok range 1 -> 1 current: 1
2017-05-01T15:18:49.859+0800 D STORAGE  [replExecDBWorker-0] local.replset.election: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.859+0800 D STORAGE  [replExecDBWorker-0] WiredTigerKVEngine::createSortedDataInterface ident: index-13--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.replset.election" }),
2017-05-01T15:18:49.859+0800 D STORAGE  [replExecDBWorker-0] create uri: table:index-13--3505979310332637255 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "local.replset.election" }),
2017-05-01T15:18:49.862+0800 D STORAGE  [replExecDBWorker-0] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:index-13--3505979310332637255 ok range 6 -> 8 current: 8
2017-05-01T15:18:49.862+0800 D STORAGE  [replExecDBWorker-0] local.replset.election: clearing plan cache - collection info cache reset
2017-05-01T15:18:49.862+0800 D INDEX    [replExecDBWorker-0] marking index _id_ as ready in snapshot id 63
2017-05-01T15:18:49.862+0800 D QUERY    [replExecDBWorker-0] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2017-05-01T15:18:49.862+0800 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 1
2017-05-01T15:18:49.862+0800 I REPL     [ReplicationExecutor] transition to PRIMARY
2017-05-01T15:18:49.862+0800 I REPL     [ReplicationExecutor] Could not access any nodes within timeout when checking for additional ops to apply before finishing transition to primary. Will move forward with becoming primary anyway.
2017-05-01T15:18:51.856+0800 D QUERY    [rsSync] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2017-05-01T15:18:51.857+0800 D REPL     [rsSync] Ignoring older committed snapshot from before I became primary, optime: { ts: Timestamp 1493623129000|1, t: -1 }, firstOpTimeOfMyTerm: { ts: Timestamp 2147483647000|0, t: 2147483647 }
2017-05-01T15:18:51.857+0800 D REPL     [rsSync] Removing temporary collections from admin
2017-05-01T15:18:51.857+0800 D REPL     [rsSync] Ignoring older committed snapshot from before I became primary, optime: { ts: Timestamp 1493623129000|1, t: -1 }, firstOpTimeOfMyTerm: { ts: Timestamp 1493623131000|1, t: 1 }
2017-05-01T15:18:51.857+0800 I REPL     [rsSync] transition to primary complete; database writes are now permitted
2017-05-01T15:18:51.871+0800 D REPL     [WTJournalFlusher] Updating _lastCommittedOpTime to { ts: Timestamp 1493623131000|1, t: 1 }
2017-05-01T15:18:57.098+0800 D COMMAND  [conn1] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2017-05-01T15:18:57.098+0800 D COMMAND  [conn1] command: replSetGetStatus
2017-05-01T15:18:57.098+0800 I COMMAND  [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } numYields:0 reslen:572 locks:{} protocol:op_query 0ms
2017-05-01T15:18:57.328+0800 D COMMAND  [conn1] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2017-05-01T15:18:57.328+0800 D COMMAND  [conn1] command: replSetGetStatus
2017-05-01T15:18:57.328+0800 I COMMAND  [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } numYields:0 reslen:572 locks:{} protocol:op_query 0ms
2017-05-01T15:19:00.873+0800 D COMMAND  [conn1] run command local.$cmd { count: "system.replset", query: {}, fields: {} }
2017-05-01T15:19:00.873+0800 I COMMAND  [conn1] command local.system.replset command: count { count: "system.replset", query: {}, fields: {} } planSummary: COUNT keysExamined:0 docsExamined:0 numYields:0 reslen:44 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 0ms
2017-05-01T15:19:00.873+0800 D QUERY    [conn1] Running query: query: {} sort: {} projection: {} ntoreturn=1
2017-05-01T15:19:00.873+0800 D QUERY    [conn1] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {} ntoreturn=1, planSummary: COLLSCAN
2017-05-01T15:19:00.873+0800 I COMMAND  [conn1] query local.system.replset planSummary: COLLSCAN ntoskip:0 keysExamined:0 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:473 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 0ms
2017-05-01T15:19:00.874+0800 D COMMAND  [conn1] run command admin.$cmd { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5906e159c7e0c4c709f69f96') } } }
2017-05-01T15:19:00.874+0800 D COMMAND  [conn1] command: replSetReconfig
2017-05-01T15:19:00.874+0800 I REPL     [conn1] replSetReconfig admin command received from client
DEBUG:pid[33275]:pc_getnameinfo
DEBUG:pid[33275]:pc_getnameinfo
DEBUG:pid[33275]:pc_getnameinfo
2017-05-01T15:19:00.875+0800 D NETWORK  [conn1] getBoundAddrs(): [ 127.0.0.1] [ 10.200.20.56] [ 172.17.0.1]
DEBUG:getaddrinfo: 10.200.20.56 1111
DEBUG:pid[33275]:proxy_getaddrinfo
DEBUG:pid[33275]:pc_getnameinfo
2017-05-01T15:19:00.875+0800 D NETWORK  [conn1] getAddrsForHost("10.200.20.56:1111"): [ 10.200.20.56]
DEBUG:freeaddrinfo 0x7f42f2d86900
DEBUG:pid[33275]:proxy_freeaddrinfo
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[33275]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f42f2d86900
DEBUG:pid[33275]:proxy_freeaddrinfo
DEBUG:pid[33275]:pc_getnameinfo
DEBUG:pid[33275]:pc_getnameinfo
DEBUG:pid[33275]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[33275]:connect_proxy_chain
DEBUG:[play] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[33275]:timed_connect
DEBUG:
connect ret=-1
DEBUG:
poll ret=1
DEBUG:
value=0
DEBUG:pid[33275]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[33275]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-01T15:19:00.876+0800 D NETWORK  [conn1] connected to server 10.200.20.56:1112 (10.200.20.56)
2017-05-01T15:19:00.877+0800 I REPL     [conn1] replSetReconfig config object with 2 members parses ok
2017-05-01T15:19:00.877+0800 D ASIO     [conn1] startCommand: RemoteCommand 1 -- target:10.200.20.56:1112 db:admin expDate:2017-05-01T15:19:10.877+0800 cmd:{ replSetHeartbeat: "test", pv: 1, v: 2, from: "10.200.20.56:1111", fromId: 0, checkEmpty: false }
2017-05-01T15:19:00.877+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.200.20.56:1112
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[33275]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f42f2e5a140
DEBUG:pid[33275]:proxy_freeaddrinfo
DEBUG:pid[33275]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[33275]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[33275]:timed_connect
DEBUG:
connect ret=-1
DEBUG:
poll ret=1
DEBUG:
value=0
DEBUG:pid[33275]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[33275]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-01T15:19:00.878+0800 D ASIO     [NetworkInterfaceASIO-Replication-0] Starting asynchronous command 2 on host 10.200.20.56:1112
2017-05-01T15:19:01.857+0800 D REPL     [NoopWriter] Writing noop to oplog as there has been no writes to this replica set in over 10s
2017-05-01T15:19:01.857+0800 D REPL     [NoopWriter] Set last known op time to { ts: Timestamp 1493623141000|1, t: 1 }
2017-05-01T15:19:01.880+0800 D REPL     [WTJournalFlusher] Updating _lastCommittedOpTime to { ts: Timestamp 1493623141000|1, t: 1 }
2017-05-01T15:19:10.877+0800 D ASIO     [NetworkInterfaceASIO-Replication-0] Failed to get connection from pool for request 1: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-01T15:19:10.877+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 10.200.20.56:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-01T15:19:10.877+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.200.20.56:1111; the following nodes did not respond affirmatively: 10.200.20.56:1112 failed with Couldn't get a connection within the time limit
2017-05-01T15:19:10.877+0800 I COMMAND  [conn1] command admin.$cmd command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5906e159c7e0c4c709f69f96') } } } numYields:0 reslen:357 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_query 10003ms
2017-05-01T15:19:10.879+0800 D COMMAND  [conn1] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2017-05-01T15:19:10.879+0800 D COMMAND  [conn1] command: replSetGetStatus
2017-05-01T15:19:10.879+0800 I COMMAND  [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } numYields:0 reslen:572 locks:{} protocol:op_query 0ms
2017-05-01T15:19:11.857+0800 D REPL     [NoopWriter] Writing noop to oplog as there has been no writes to this replica set in over 10s
2017-05-01T15:19:11.857+0800 D REPL     [NoopWriter] Set last known op time to { ts: Timestamp 1493623151000|1, t: 1 }
2017-05-01T15:19:11.889+0800 D REPL     [WTJournalFlusher] Updating _lastCommittedOpTime to { ts: Timestamp 1493623151000|1, t: 1 }
2017-05-01T15:19:20.877+0800 D ASIO     [NetworkInterfaceASIO-Replication-0] Operation timing out; original request was: RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-01T15:19:20.877+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.200.20.56:1112 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-01T15:19:20.877+0800 D ASIO     [NetworkInterfaceASIO-Replication-0] Request 2 finished with response: NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-01T15:19:20.877+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.200.20.56:1112 due to failed operation on a connection
2017-05-01T15:19:21.858+0800 D REPL     [NoopWriter] Writing noop to oplog as there has been no writes to this replica set in over 10s
2017-05-01T15:19:21.858+0800 D REPL     [NoopWriter] Set last known op time to { ts: Timestamp 1493623161000|1, t: 1 }
2017-05-01T15:19:21.898+0800 D REPL     [WTJournalFlusher] Updating _lastCommittedOpTime to { ts: Timestamp 1493623161000|1, t: 1 }
2017-05-01T15:19:25.873+0800 D -        [PeriodicTaskRunner] cleaning up unused lock buckets of the global lock manager
2017-05-01T15:19:25.902+0800 D STORAGE  [WTJournalFlusher] WiredTigerSizeStorer::storeInto table:_mdb_catalog -> { numRecords: 8, dataSize: 2587 }
2017-05-01T15:19:25.902+0800 D STORAGE  [WTJournalFlusher] WiredTigerSizeStorer::storeInto table:collection-12--3505979310332637255 -> { numRecords: 1, dataSize: 60 }
2017-05-01T15:19:25.902+0800 D STORAGE  [WTJournalFlusher] WiredTigerSizeStorer::storeInto table:collection-4--3505979310332637255 -> { numRecords: 5, dataSize: 468 }
2017-05-01T15:19:25.902+0800 D STORAGE  [WTJournalFlusher] WiredTigerSizeStorer::storeInto table:collection-5--3505979310332637255 -> { numRecords: 1, dataSize: 453 }
2017-05-01T15:19:25.902+0800 D STORAGE  [WTJournalFlusher] WiredTigerSizeStorer::storeInto table:collection-7--3505979310332637255 -> { numRecords: 1, dataSize: 45 }
2017-05-01T15:19:25.902+0800 D STORAGE  [WTJournalFlusher] WiredTigerSizeStorer::storeInto table:collection-9--3505979310332637255 -> { numRecords: 1, dataSize: 59 }
2017-05-01T15:19:31.858+0800 D REPL     [NoopWriter] Writing noop to oplog as there has been no writes to this replica set in over 10s
2017-05-01T15:19:31.858+0800 D REPL     [NoopWriter] Set last known op time to { ts: Timestamp 1493623171000|1, t: 1 }
2017-05-01T15:19:31.908+0800 D REPL     [WTJournalFlusher] Updating _lastCommittedOpTime to { ts: Timestamp 1493623171000|1, t: 1 }

ma6174 avatar May 01 '17 07:05 ma6174

Looks like the problem is that the ASIO network implementation uses sockets in non-blocking mode. The relevant code is in timed_connect() in src/core.c. I can't tell off the top of my head why it doesn't work as it should. Do you feel like you could debug it? You could run the entire command chain in gdb (gdb --args proxychains4 mongo --foo) and set a breakpoint on timed_connect.
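
For orientation, the usual shape of such a timed connect is sketched below. This is a minimal, self-contained sketch of the non-blocking connect/poll pattern, not the actual src/core.c code; the 127.0.0.1:2345 target in main() is just the SOCKS proxy address from this setup, used as a demo value. Note that the sketch restores the caller's original flags at the end, which may matter for a caller like ASIO that created the socket non-blocking and expects it to stay that way.

/* Minimal sketch of a timed, non-blocking connect; NOT the actual src/core.c code.
   Returns 0 on success, -1 on error or timeout. */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static int timed_connect_sketch(int sock, const struct sockaddr *addr,
                                socklen_t len, int timeout_ms)
{
	int flags = fcntl(sock, F_GETFL, 0);           /* remember the caller's flags */
	fcntl(sock, F_SETFL, flags | O_NONBLOCK);      /* switch to non-blocking */

	int ret = connect(sock, addr, len);
	if (ret == -1 && errno == EINPROGRESS) {
		struct pollfd pfd = { .fd = sock, .events = POLLOUT };
		if (poll(&pfd, 1, timeout_ms) == 1) {
			int err = 0;
			socklen_t elen = sizeof err;
			getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &elen);
			ret = err ? -1 : 0;            /* 0: the connect completed */
		}
	}
	fcntl(sock, F_SETFL, flags);                   /* restore the caller's flags */
	return ret;
}

int main(void)
{
	struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(2345) };
	inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);
	int s = socket(AF_INET, SOCK_STREAM, 0);
	printf("timed connect returned %d\n",
	       timed_connect_sketch(s, (struct sockaddr *)&sa, sizeof sa, 1500));
	close(s);
	return 0;
}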

rofl0r avatar May 01 '17 17:05 rofl0r

I used gdb to set breakpoints at timed_connect() in proxychains and connect_proxy in kneesocks, and cast addr from struct sockaddr * to struct sockaddr_in * to inspect it.

addr in proxychains:

*addr = {sin_family = 2, sin_port = 10505, sin_addr = {s_addr = 16777343}, sin_zero = "\000\000\000\000\000\000\000"}

addr in kneesocks:

*addr = {sin_family = 2, sin_port = 22276, sin_addr = {s_addr = 940886026}, sin_zero = "\000\000\000\000\000\000\000"}

It seems there is no problem with addr.
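
One note on reading those gdb dumps: sin_port is stored in network byte order and s_addr is the raw 32-bit address, so the numbers have to be decoded before comparing. A small hypothetical helper (dump_sockaddr_in is not part of either project) for a little-endian x86_64 host like the one in the log; the kneesocks value can be decoded the same way:

/* Hypothetical helper to decode the raw sockaddr_in fields shown by gdb. */
#include <arpa/inet.h>
#include <stdio.h>

static void dump_sockaddr_in(const struct sockaddr_in *sa)
{
	char ip[INET_ADDRSTRLEN];
	inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof ip);
	printf("%s:%d (raw sin_port=%d, raw s_addr=%u)\n",
	       ip, ntohs(sa->sin_port), sa->sin_port, (unsigned)sa->sin_addr.s_addr);
}

int main(void)
{
	/* the addr seen in proxychains' timed_connect() above */
	struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = 10505,
	                         .sin_addr = { .s_addr = 16777343 } };
	dump_sockaddr_in(&a);   /* on a little-endian host: 127.0.0.1:2345, i.e. the SOCKS proxy */
	return 0;
}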

There is a small difference in the socket fd flags:

In proxychains, before connect fd_flags=2, and after connect fd_flags=2050:

fcntl(sock, F_SETFL, O_NONBLOCK);

but in kneesocks, before connect fd_flags=2050, and after connect fd_flags=2050:

fcntl(sockfd, F_SETFL, fd_flags & ~O_NONBLOCK);

I'm not familiar with C network programming, so I'm not sure whether the fd flags affect a non-blocking socket?
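
As a reference point for those flag values: on Linux/x86_64, O_RDWR is 2 and O_NONBLOCK is 04000 octal (2048 decimal), so fd_flags=2050 means O_NONBLOCK is set on the socket, while fd_flags=2 means it is not. A tiny standalone check (just an illustration, not code from either project):

#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);

	int flags = fcntl(s, F_GETFL, 0);          /* fresh socket: 2 (O_RDWR), blocking */
	printf("flags=%d non-blocking=%s\n", flags, (flags & O_NONBLOCK) ? "yes" : "no");

	fcntl(s, F_SETFL, flags | O_NONBLOCK);     /* set O_NONBLOCK, keep the other flags */
	flags = fcntl(s, F_GETFL, 0);              /* now 2050 (O_RDWR|O_NONBLOCK) */
	printf("flags=%d non-blocking=%s\n", flags, (flags & O_NONBLOCK) ? "yes" : "no");

	close(s);
	return 0;
}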

ma6174 avatar May 03 '17 15:05 ma6174

Hmm. Can you check whether commenting out line 184 in core.c makes it work?

- fcntl(sock, F_SETFL, !O_NONBLOCK);
+ //fcntl(sock, F_SETFL, !O_NONBLOCK);

rofl0r avatar May 03 '17 21:05 rofl0r

If I comment out line 184 in core.c, connect still fails (connect ret=-1):

2017-05-04T06:21:06.701+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.200.20.56:1112
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[36566]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f724a028140
DEBUG:pid[36566]:proxy_freeaddrinfo
DEBUG:pid[36566]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[36566]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[36566]:timed_connect
DEBUG:
connect ret=-1
DEBUG:
poll ret=1
DEBUG:
value=0
DEBUG:pid[36566]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[36566]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-04T06:21:16.701+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 10.200.20.56:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-04T06:21:16.701+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.200.20.56:1111; the following nodes did not respond affirmatively: 10.200.20.56:1112 failed with Couldn't get a connection within the time limit
2017-05-04T06:21:16.701+0800 I COMMAND  [conn1] command admin.$cmd command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590a54e6187854f7222cc683') } } } numYields:0 reslen:357 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_query 10003ms

If I comment out line 157 in core.c, connect ret=0 (connect succeeds), but I still get Operation timed out:

-fcntl(sock, F_SETFL, O_NONBLOCK);
+//fcntl(sock, F_SETFL, O_NONBLOCK);
2017-05-04T06:22:40.483+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.200.20.56:1112
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[37317]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f84cf251000
DEBUG:pid[37317]:proxy_freeaddrinfo
DEBUG:pid[37317]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[37317]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[37317]:timed_connect
DEBUG:
connect ret=0
DEBUG:pid[37317]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[37317]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-04T06:22:50.483+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 10.200.20.56:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-04T06:22:50.483+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.200.20.56:1111; the following nodes did not respond affirmatively: 10.200.20.56:1112 failed with Couldn't get a connection within the time limit
2017-05-04T06:22:50.483+0800 I COMMAND  [conn1] command admin.$cmd command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590a54e6187854f7222cc683') } } } numYields:0 reslen:357 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_query 10003ms
2017-05-04T06:23:00.483+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.200.20.56:1112 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-04T06:23:00.483+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.200.20.56:1112 due to failed operation on a connection

ma6174 avatar May 03 '17 22:05 ma6174

please try the patch i pushed to branch nonblock

rofl0r avatar May 04 '17 00:05 rofl0r

With the patch, connect ret=0 and the connect succeeds, but I still get Operation timed out:

2017-05-04T09:27:58.321+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.200.20.56:1112
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[15371]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f85f0141000
DEBUG:pid[15371]:proxy_freeaddrinfo
DEBUG:pid[15371]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[15371]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[15371]:timed_connect
DEBUG:
connect ret=0
DEBUG:pid[15371]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[15371]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-04T09:28:08.321+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 10.200.20.56:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-04T09:28:08.321+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.200.20.56:1111; the following nodes did not respond affirmatively: 10.200.20.56:1112 failed with Couldn't get a connection within the time limit
2017-05-04T09:28:08.321+0800 I COMMAND  [conn1] command admin.$cmd command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590a54e6187854f7222cc683') } } } numYields:0 reslen:357 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_query 10003ms
2017-05-04T09:28:18.321+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.200.20.56:1112 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-04T09:28:18.321+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.200.20.56:1112 due to failed operation on a connection

ma6174 avatar May 04 '17 01:05 ma6174

oh, i made a mistake. the line fcntl(sock, F_SETFL, flags & (~O_NONBLOCK)); should actually be fcntl(sock, F_SETFL, flags | O_NONBLOCK);
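
In other words, a rough sketch of where the flag handling is heading (this combines the fix above with the flag-restore that appears later in the thread; the authoritative version lives on the nonblock branches): put the socket into non-blocking mode for the connect only if it is not already, and restore the original flags afterwards.

#include <fcntl.h>
#include <sys/socket.h>

/* sketch only -- not the literal patch */
static int connect_preserving_flags(int sock, const struct sockaddr *addr,
                                    socklen_t len)
{
    int flags = fcntl(sock, F_GETFL, 0);
    if (!(flags & O_NONBLOCK))                /* only touch blocking sockets */
        fcntl(sock, F_SETFL, flags | O_NONBLOCK);

    int ret = connect(sock, addr, len);       /* -1/EINPROGRESS is now expected */
    /* ... wait for completion (poll + SO_ERROR) as in the sketch above ... */

    if (!(flags & O_NONBLOCK))
        fcntl(sock, F_SETFL, flags);          /* restore the original mode */
    return ret;
}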

rofl0r avatar May 04 '17 01:05 rofl0r

in case that doesn't work either, i force-pushed a new commit to nonblock branch, hopefully that fixes it!

rofl0r avatar May 04 '17 02:05 rofl0r

Now connect ret=-1, so the connect failed:

2017-05-04T10:30:45.192+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.200.20.56:1112
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[36723]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f570429e140
DEBUG:pid[36723]:proxy_freeaddrinfo
DEBUG:pid[36723]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[36723]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[36723]:timed_connect
DEBUG:
connect ret=-1
DEBUG:
poll ret=1
DEBUG:
value=0
DEBUG:pid[36723]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[36723]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-04T10:30:55.192+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 10.200.20.56:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-04T10:30:55.192+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.200.20.56:1111; the following nodes did not respond affirmatively: 10.200.20.56:1112 failed with Couldn't get a connection within the time limit
2017-05-04T10:30:55.192+0800 I COMMAND  [conn1] command admin.$cmd command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590a92512faabb955d2acc34') } } } numYields:0 reslen:357 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_query 10003ms
2017-05-04T10:31:05.192+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.200.20.56:1112 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-04T10:31:05.192+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.200.20.56:1112 due to failed operation on a connection

ma6174 avatar May 04 '17 02:05 ma6174

did you test the new patch, or the one-line change from my previous comment ?

rofl0r avatar May 04 '17 02:05 rofl0r

Yes, I tried both the one-line change and the new patch; neither works, the connect failed.

diff --git a/src/core.c b/src/core.c
index 19cd3da..510400d 100644
--- a/src/core.c
+++ b/src/core.c
@@ -155,7 +155,8 @@ static int timed_connect(int sock, const struct sockaddr *addr, socklen_t len) {
        pfd[0].fd = sock;
        pfd[0].events = POLLOUT;
        int flags = fcntl(sock, F_GETFL, 0);
-       fcntl(sock, F_SETFL, flags & (~O_NONBLOCK));
+       //fcntl(sock, F_SETFL, flags & (~O_NONBLOCK));
+       fcntl(sock, F_SETFL, flags | O_NONBLOCK);
        ret = true_connect(sock, addr, len);
        PDEBUG("\nconnect ret=%d\n", ret);

ma6174 avatar May 04 '17 03:05 ma6174

it's weird that your previous log has this:

DEBUG:
poll ret=1

since the new commit doesn't actually call poll anymore with a non-blocking socket. are you sure you fetched my latest change from nonblock branch ? maybe there was a problem because i overwrote the earlier commit by force-pushing a new one ?

rofl0r avatar May 04 '17 03:05 rofl0r

I'm sure I'm on the latest nonblock branch:

here is the git info:

$ git branch
  master
* nonblock
$ git log | head
commit cbddf15b19191257de302687df558adfae10aeb5
Author: rofl0r <[email protected]>
Date:   Thu May 4 01:31:56 2017 +0100

    properly handle non-blocking mode of socket

    addressing #171

commit bb3df1e4409190da2627d8a3b5bdd5e02d246f28
Author: rofl0r <[email protected]>

here is some gdb debug info:

Breakpoint 1, timed_connect (sock=31, addr=0x7ffff7fdb3e0, len=16) at src/core.c:153
153             PFUNC();
(gdb) display *addr
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) display flags
2: flags = 0
(gdb) display ret
3: ret = 2
(gdb) l
148
149     static int timed_connect(int sock, const struct sockaddr *addr, socklen_t len) {
150             int ret, value;
151             socklen_t value_len;
152             struct pollfd pfd[1];
153             PFUNC();
154
155             pfd[0].fd = sock;
156             pfd[0].events = POLLOUT;
157             int flags = fcntl(sock, F_GETFL, 0);
(gdb) n
2017-05-04T11:28:53.492+0800 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after dur: 0, af
ter extra_info: 62006, after globalLock: 62006, after locks: 62006, after network: 62006, after opLatencies: 62006, after opcounters: 62006, after opcountersRepl: 62006, after re
pl: 62006, after storageEngine: 62006, after tcmalloc: 62006, after wiredTiger: 62006, at end: 62006 }
DEBUG:pid[14538]:timed_connect
155             pfd[0].fd = sock;
3: ret = 2
2: flags = 0
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
156             pfd[0].events = POLLOUT;
3: ret = 2
2: flags = 0
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
157             int flags = fcntl(sock, F_GETFL, 0);
3: ret = 2
2: flags = 0
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
160             if(!(flags & O_NONBLOCK))
3: ret = 2
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
161                     fcntl(sock, F_SETFL, flags | O_NONBLOCK);
3: ret = 2
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
162             ret = true_connect(sock, addr, len);
3: ret = 2
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
163             PDEBUG("\nconnect ret=%d\n", ret);
3: ret = -1
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
DEBUG:
connect ret=-1
167             if(flags & O_NONBLOCK)
3: ret = -1
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
2017-05-04T11:29:20.462+0800 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after dur: 0, af
ter extra_info: 2018, after globalLock: 2018, after locks: 2018, after network: 2018, after opLatencies: 2018, after opcounters: 2018, after opcountersRepl: 2018, after repl: 201
8, after storageEngine: 2018, after tcmalloc: 2018, after wiredTiger: 2018, at end: 2018 }
170             if(ret == -1 && errno == EINPROGRESS) {
3: ret = -1
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
171                     ret = poll_retry(pfd, 1, tcp_connect_time_out);
3: ret = -1
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
172                     PDEBUG("\npoll ret=%d\n", ret);
3: ret = 1
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) n
2017-05-04T11:29:31.813+0800 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after dur: 0, af
ter extra_info: 4562, after globalLock: 4562, after locks: 4562, after network: 4562, after opLatencies: 4562, after opcounters: 4562, after opcountersRepl: 4562, after repl: 456
2, after storageEngine: 4562, after tcmalloc: 4562, after wiredTiger: 4562, at end: 4562 }
DEBUG:
poll ret=1
173                     if(ret == 1) {
3: ret = 1
2: flags = 2
1: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb)

ma6174 avatar May 04 '17 03:05 ma6174

thanks! that means that the socket at this point was blocking, so i looked into why and found that we actually create a new socket for the connection and then dup2() it over the original one. i pushed a patch to branch nonblock2 (note the 2).
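
For anyone following along in core.c, a rough paraphrase of that sequence (ns is the variable name that also shows up in the gdb session further down; everything else is simplified): the flags the application set on its own fd never apply to the socket that timed_connect() actually operates on.

#include <sys/socket.h>
#include <unistd.h>

/* simplified paraphrase of what connect_proxy_chain() does with the
 * application's fd ("sock"); proxy_addr stands in for the configured proxy,
 * SOCKS handshake and error handling omitted */
static void proxied_connect_sketch(int sock, const struct sockaddr *proxy_addr,
                                   socklen_t len)
{
    int ns = socket(AF_INET, SOCK_STREAM, 0);  /* fresh socket: blocking by
                                                  default, regardless of how
                                                  the app configured its fd */
    connect(ns, proxy_addr, len);              /* timed_connect() runs on ns,
                                                  which is why flags==2 (no
                                                  O_NONBLOCK) in the gdb
                                                  session above              */
    /* ... SOCKS5 handshake / tunnel_to() happen on ns ... */
    dup2(ns, sock);                            /* the proxied socket replaces
                                                  the application's fd        */
    close(ns);
}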

rofl0r avatar May 04 '17 03:05 rofl0r

I tested branch nonblock2, and also a merge of nonblock and nonblock2; in both cases the connect failed:

2017-05-04T12:02:05.508+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.200.20.56:1112
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[26222]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7f3eb0c5a000
DEBUG:pid[26222]:proxy_freeaddrinfo
DEBUG:pid[26222]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[26222]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[26222]:timed_connect
DEBUG:
connect ret=-1
DEBUG:
poll ret=1
DEBUG:
value=0
DEBUG:pid[26222]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[26222]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-04T12:02:15.508+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 10.200.20.56:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-04T12:02:15.508+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.200.20.56:1111; the following nodes did not respond affirmatively: 10.200.20.56:1112 failed with Couldn't get a connection within the time limit
2017-05-04T12:02:15.509+0800 I COMMAND  [conn1] command admin.$cmd command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590a92512faabb955d2acc34') } } } numYields:0 reslen:357 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_query 10003ms
2017-05-04T12:02:25.508+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.200.20.56:1112 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-04T12:02:25.508+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.200.20.56:1112 due to failed operation on a connection

ma6174 avatar May 04 '17 04:05 ma6174

a curious issue... can you merge nonblock2 and my first patch from nonblock? that one seemed to work for the connect step. https://github.com/rofl0r/proxychains-ng/commit/6d90d82a7b0998db5d29bf563ca6cefc72080dbb (you can still get to the commit via its sha1 hash, even though i overwrote it by force-pushing)

rofl0r avatar May 04 '17 04:05 rofl0r

The connect succeeds, but I still get Operation timed out:

$ git log | head -24
commit 6f093fc039e2bd8adbd4310cb0dd4af47b7af494
Merge: 6d90d82 ea8f4ff
Author: test <[email protected]>
Date:   Thu May 4 12:13:36 2017 +0800

    Merge branch 'nonblock2' into merged2

commit ea8f4ff5dc693fd7f824796a8ac4816b0194178b
Author: rofl0r <[email protected]>
Date:   Thu May 4 04:53:35 2017 +0100

    restore original socket flags after connecting

    the socket may have been in non-blocking mode originally.
    addressing #171

commit 6d90d82a7b0998db5d29bf563ca6cefc72080dbb
Author: rofl0r <[email protected]>
Date:   Thu May 4 01:31:56 2017 +0100

    properly restore non-blocking mode of socket

    addressing #171

2017-05-04T12:14:20.152+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.200.20.56:1112
DEBUG:getaddrinfo: 10.200.20.56 1112
DEBUG:pid[31589]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x7fd9d01d9000
DEBUG:pid[31589]:proxy_freeaddrinfo
DEBUG:pid[31589]:connect
DEBUG:target: 10.200.20.56
DEBUG:port: 1112
DEBUG:pid[31589]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[31589]:timed_connect
DEBUG:
connect ret=0
DEBUG:pid[31589]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[31589]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-04T12:14:30.152+0800 W REPL     [ReplicationExecutor] Failed to complete heartbeat request to 10.200.20.56:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-04T12:14:30.152+0800 E REPL     [conn1] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.200.20.56:1111; the following nodes did not respond affirmatively: 10.200.20.56:1112 failed with Couldn't get a connection within the time limit
2017-05-04T12:14:30.152+0800 I COMMAND  [conn1] command admin.$cmd command: replSetReconfig { replSetReconfig: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "10.200.20.56:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "10.200.20.56:1112" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590a92512faabb955d2acc34') } } } numYields:0 reslen:357 locks:{ Global: { acquireCount: { r: 1, W: 1 } } } protocol:op_query 10003ms
2017-05-04T12:14:40.152+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.200.20.56:1112 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 2 -- target:10.200.20.56:1112 db:admin cmd:{ isMaster: 1 }
2017-05-04T12:14:40.152+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.200.20.56:1112 due to failed operation on a connection
Breakpoint 2, connect_proxy_chain (sock=30, target_ip=..., target_port=22532, pd=0x7ffff7d955a0 <proxychains_pd>, proxy_count=1, ct=RANDOM_TYPE, max_chain=1) at src/core.c:560
560                             unsigned int proxy_count, chain_type ct, unsigned int max_chain) {
(gdb) watch flags
Hardware watchpoint 3: flags
(gdb) display flags
1: flags = 32767
(gdb) c
Continuing.
DEBUG:pid[33382]:connect_proxy_chain
DEBUG:[play] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345
Breakpoint 1, timed_connect (sock=31, addr=0x7ffff7fdb3e0, len=16) at src/core.c:153
153             PFUNC();
(gdb) display *addr
2: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) watch flags
Hardware watchpoint 4: flags
(gdb) c
Continuing.
DEBUG:pid[33382]:timed_connect
Hardware watchpoint 4: flags

Old value = 0
New value = 2
timed_connect (sock=31, addr=0x7ffff7fdb3e0, len=16) at src/core.c:158
158             fcntl(sock, F_SETFL, flags & (~O_NONBLOCK));
2: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) display ret
3: ret = 2
(gdb) watch ret
Hardware watchpoint 5: ret
(gdb) c
Continuing.
Hardware watchpoint 5: ret

Old value = 2
New value = 0
timed_connect (sock=31, addr=0x7ffff7fdb3e0, len=16) at src/core.c:160
160             PDEBUG("\nconnect ret=%d\n", ret);
3: ret = 0
2: *addr = {sa_family = 2, sa_data = "\t)\177\000\000\001\000\000\000\000\000\000\000"}
(gdb) c
Continuing.
DEBUG:
connect ret=0

Watchpoint 4 deleted because the program has left the block in
which its expression is valid.

Watchpoint 5 deleted because the program has left the block in
which its expression is valid.
0x00007ffff7b8c5fb in start_chain (fd=0x7ffff7fdb488, pd=0x7ffff7d955a0 <proxychains_pd>, begin_mark=0x7ffff7b91098 "Random chain") at src/core.c:448
448             if(timed_connect(*fd, (struct sockaddr *) (v6?(void*)&addr6:(void*)&addr), v6?sizeof(addr6):sizeof(addr))) {
(gdb) c
Continuing.
DEBUG:pid[33382]:chain_step
 ...  10.200.20.56:1112 DEBUG:pid[33382]:tunnel_to
DEBUG:host dns <NULL>
2017-05-04T12:22:35.553+0800 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after dur: 0, af
ter extra_info: 11405, after globalLock: 11405, after locks: 11405, after network: 11405, after opLatencies: 11405, after opcounters: 11405, after opcountersRepl: 11405, after re
pl: 11405, after storageEngine: 11405, after tcmalloc: 11405, after wiredTiger: 11405, at end: 30173 }
 ...  OK
Hardware watchpoint 3: flags

Old value = 32767
New value = 2
connect_proxy_chain (sock=30, target_ip=..., target_port=22532, pd=0x7ffff7d955a0 <proxychains_pd>, proxy_count=1, ct=RANDOM_TYPE, max_chain=1) at src/core.c:705
705             dup2(ns, sock);
1: flags = 2
(gdb) c
Continuing.

Watchpoint 3 deleted because the program has left the block in
which its expression is valid.
0x00007ffff7b8f313 in connect (sock=30, addr=0x55555d5d00c8, len=16) at src/libproxychains.c:392
392             ret = connect_proxy_chain(sock,

ma6174 avatar May 04 '17 04:05 ma6174

I built a docker image; you can pull it and debug locally:

$ docker run -it ma6174/mongo_proxychains bash
root@61fa69e7a5a7:/# cd
root@61fa69e7a5a7:~# ls
data1  kneesocks      libkneesocks.so	  mongod_2.log	  proxychains.conf  ss_config.json  ss_server.log	   start_ss.sh
data2  kneesocks_cmd  libproxychains4.so  proxychains-ng  proxychains4	    ss_local.log    start_mongod_node2.sh
root@61fa69e7a5a7:~# bash start_ss.sh
root@61fa69e7a5a7:~# nohup: redirecting stderr to stdout
nohup: redirecting stderr to stdout

root@61fa69e7a5a7:~# bash start_mongod_node2.sh
root@61fa69e7a5a7:~# nohup: redirecting stderr to stdout

root@61fa69e7a5a7:~# ./proxychains4 -f proxychains.conf mongod --dbpath data1 --replSet test --port 1111
[proxychains] config file found: proxychains.conf
[proxychains] preloading ./libproxychains4.so
DEBUG:init_lib_wrapper called from gcc_init
DEBUG:pid[84]:at_init
DEBUG:pid[84]:wait_data
DEBUG:[play] socks5 127.0.0.1:2345
[proxychains] DLL init: proxychains-ng 4.12-git-1-gbb3df1e
DEBUG:loaded symbol 'connect' real addr 0x7f1219ee8cc0  wrapped addr 0x7f121b07d7ea
DEBUG:loaded symbol 'sendto' real addr 0x7f1219ee8fa0  wrapped addr 0x7f121b07e12f
DEBUG:loaded symbol 'gethostbyname' real addr 0x7f1219c28610  wrapped addr 0x7f121b07dc53
DEBUG:loaded symbol 'getaddrinfo' real addr 0x7f1219c04760  wrapped addr 0x7f121b07dcbf
DEBUG:loaded symbol 'freeaddrinfo' real addr 0x7f1219c04720  wrapped addr 0x7f121b07dd6c
DEBUG:loaded symbol 'gethostbyaddr' real addr 0x7f1219c28050  wrapped addr 0x7f121b07e01b
DEBUG:loaded symbol 'getnameinfo' real addr 0x7f1219c30380  wrapped addr 0x7f121b07a120
DEBUG:loaded symbol 'close' real addr 0x7f1219ee8ad0  wrapped addr 0x7f121b07d6e8
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] MongoDB starting : pid=84 port=1111 dbpath=data1 64-bit host=61fa69e7a5a7
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] db version v3.4.4
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] git version: 888390515874a9debd1b6c5d36559ca86b44babd
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1t  3 May 2016
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] modules: none
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] build environment:
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten]     distmod: debian81
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-05-04T06:19:06.736+0000 I CONTROL  [initandlisten] options: { net: { port: 1111 }, replication: { replSet: "test" }, storage: { dbPath: "data1" } }
DEBUG:getaddrinfo: 0.0.0.0 1111
DEBUG:pid[84]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x5646bdc9ddc0
DEBUG:pid[84]:proxy_freeaddrinfo
2017-05-04T06:19:06.745+0000 I -        [initandlisten] Detected data files in data1 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-05-04T06:19:06.745+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=487M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-05-04T06:19:06.962+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2017-05-04T06:19:06.962+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 16 records totaling to 1526 bytes
2017-05-04T06:19:06.962+0000 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
2017-05-04T06:19:06.981+0000 I CONTROL  [initandlisten]
2017-05-04T06:19:06.981+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-05-04T06:19:06.981+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2017-05-04T06:19:06.981+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2017-05-04T06:19:06.981+0000 I CONTROL  [initandlisten]
2017-05-04T06:19:06.988+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'data1/diagnostic.data'
2017-05-04T06:19:06.991+0000 I NETWORK  [thread1] waiting for connections on port 1111
DEBUG:pid[84]:pc_getnameinfo
DEBUG:pid[84]:pc_getnameinfo
DEBUG:getaddrinfo: 127.0.0.1 1111
DEBUG:pid[84]:proxy_getaddrinfo
DEBUG:pid[84]:pc_getnameinfo
DEBUG:freeaddrinfo 0x5646c145d680
DEBUG:pid[84]:proxy_freeaddrinfo
DEBUG:getaddrinfo: 127.0.0.1 1112
DEBUG:pid[84]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x5646c145d680
DEBUG:pid[84]:proxy_freeaddrinfo
DEBUG:pid[84]:pc_getnameinfo
DEBUG:pid[84]:pc_getnameinfo
DEBUG:pid[84]:connect
DEBUG:target: 127.0.0.1
DEBUG:port: 1112
DEBUG:pid[84]:connect_proxy_chain
DEBUG:[play] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[84]:timed_connect
DEBUG:
connect ret=-1
DEBUG:
poll ret=1
DEBUG:
value=0
DEBUG:pid[84]:chain_step
 ...  127.0.0.1:1112 DEBUG:pid[84]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
2017-05-04T06:19:06.998+0000 I REPL     [replExecDBWorker-1] New replica set config in use: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "127.0.0.1:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "127.0.0.1:1112", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590abeb31026666da1a599f2') } }
2017-05-04T06:19:06.998+0000 I REPL     [replExecDBWorker-1] This node is 127.0.0.1:1111 in the config
2017-05-04T06:19:06.998+0000 I REPL     [replExecDBWorker-1] transition to STARTUP2
2017-05-04T06:19:06.998+0000 I REPL     [replExecDBWorker-1] Starting replication storage threads
2017-05-04T06:19:06.999+0000 I REPL     [replExecDBWorker-1] Starting replication fetcher thread
2017-05-04T06:19:06.999+0000 I REPL     [replExecDBWorker-1] Starting replication applier thread
2017-05-04T06:19:06.999+0000 I REPL     [replExecDBWorker-1] Starting replication reporter thread
2017-05-04T06:19:07.002+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 127.0.0.1:1112
2017-05-04T06:19:07.002+0000 I REPL     [rsSync] transition to RECOVERING
DEBUG:getaddrinfo: 127.0.0.1 1112
DEBUG:pid[84]:proxy_getaddrinfo
DEBUG:freeaddrinfo 0x5646c14ba640
DEBUG:pid[84]:proxy_freeaddrinfo
DEBUG:pid[84]:connect
DEBUG:target: 127.0.0.1
2017-05-04T06:19:07.005+0000 I REPL     [rsSync] transition to SECONDARY
DEBUG:port: 1112
DEBUG:pid[84]:connect_proxy_chain
DEBUG:[busy] socks5 127.0.0.1:2345
[proxychains] Random chain  ...  127.0.0.1:2345 DEBUG:pid[84]:timed_connect
DEBUG:
connect ret=-1
DEBUG:
poll ret=1
DEBUG:
value=0
DEBUG:pid[84]:chain_step
 ...  127.0.0.1:1112 DEBUG:pid[84]:tunnel_to
DEBUG:host dns <NULL>
 ...  OK
DEBUG:pid[84]:pc_getnameinfo
2017-05-04T06:19:08.340+0000 I NETWORK  [thread1] connection accepted from 127.0.0.1:45704 #1 (1 connection now open)
DEBUG:pid[84]:pc_getnameinfo
DEBUG:pid[84]:pc_getnameinfo
2017-05-04T06:19:08.341+0000 I NETWORK  [conn1] received client metadata from 127.0.0.1:45704 conn1: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 8 (jessie)"", architecture: "x86_64", version: "Kernel 4.9.21-moby" } }
DEBUG:pid[84]:pc_getnameinfo
2017-05-04T06:19:08.386+0000 I NETWORK  [thread1] connection accepted from 127.0.0.1:45706 #2 (2 connections now open)
DEBUG:pid[84]:pc_getnameinfo
DEBUG:pid[84]:pc_getnameinfo
2017-05-04T06:19:08.386+0000 I NETWORK  [conn2] received client metadata from 127.0.0.1:45706 conn2: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 8 (jessie)"", architecture: "x86_64", version: "Kernel 4.9.21-moby" } }
2017-05-04T06:19:17.003+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to 127.0.0.1:1112; ExceededTimeLimit: Couldn't get a connection within the time limit
2017-05-04T06:19:18.497+0000 I REPL     [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2017-05-04T06:19:22.012+0000 I REPL     [rsBackgroundSync] waiting for 2 pings from other members before syncing
^C
root@61fa69e7a5a7:~#
root@61fa69e7a5a7:~# ./kneesocks_cmd mongod --dbpath data1 --replSet test --port 1111
[kneesocks] running
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] MongoDB starting : pid=162 port=1111 dbpath=data1 64-bit host=61fa69e7a5a7
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] db version v3.4.4
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] git version: 888390515874a9debd1b6c5d36559ca86b44babd
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1t  3 May 2016
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] modules: none
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] build environment:
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten]     distmod: debian81
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-05-04T06:20:08.305+0000 I CONTROL  [initandlisten] options: { net: { port: 1111 }, replication: { replSet: "test" }, storage: { dbPath: "data1" } }
2017-05-04T06:20:08.305+0000 W -        [initandlisten] Detected unclean shutdown - data1/mongod.lock is not empty.
[kneesocks] getaddrinfo: node=0.0.0.0, service=1111
2017-05-04T06:20:08.310+0000 I -        [initandlisten] Detected data files in data1 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-05-04T06:20:08.310+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-05-04T06:20:08.310+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=487M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-05-04T06:20:08.499+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2017-05-04T06:20:08.499+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 16 records totaling to 1526 bytes
2017-05-04T06:20:08.499+0000 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
2017-05-04T06:20:08.513+0000 I CONTROL  [initandlisten]
2017-05-04T06:20:08.513+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-05-04T06:20:08.513+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2017-05-04T06:20:08.513+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2017-05-04T06:20:08.513+0000 I CONTROL  [initandlisten]
2017-05-04T06:20:08.518+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'data1/diagnostic.data'
[kneesocks] getaddrinfo: node=127.0.0.1, service=1111
2017-05-04T06:20:08.520+0000 I NETWORK  [thread1] waiting for connections on port 1111
[kneesocks] getaddrinfo: node=127.0.0.1, service=1112
[kneesocks] connect: type=stream, family=2, address=127.0.0.1, port=1112
2017-05-04T06:20:08.521+0000 I REPL     [replExecDBWorker-0] New replica set config in use: { _id: "test", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "127.0.0.1:1111", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "127.0.0.1:1112", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 2000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('590abeb31026666da1a599f2') } }
2017-05-04T06:20:08.521+0000 I REPL     [replExecDBWorker-0] This node is 127.0.0.1:1111 in the config
2017-05-04T06:20:08.521+0000 I REPL     [replExecDBWorker-0] transition to STARTUP2
2017-05-04T06:20:08.521+0000 I REPL     [replExecDBWorker-0] Starting replication storage threads
2017-05-04T06:20:08.522+0000 I REPL     [replExecDBWorker-0] Starting replication fetcher thread
2017-05-04T06:20:08.522+0000 I REPL     [replExecDBWorker-0] Starting replication applier thread
2017-05-04T06:20:08.522+0000 I REPL     [replExecDBWorker-0] Starting replication reporter thread
2017-05-04T06:20:08.523+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 127.0.0.1:1112
2017-05-04T06:20:08.523+0000 I REPL     [rsSync] transition to RECOVERING
[kneesocks] getaddrinfo: node=127.0.0.1, service=1112
[kneesocks] connect: type=stream, family=2, address=127.0.0.1, port=1112
2017-05-04T06:20:08.525+0000 I REPL     [rsSync] transition to SECONDARY
2017-05-04T06:20:08.526+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to 127.0.0.1:1112, took 3ms (1 connections now open to 127.0.0.1:1112)
2017-05-04T06:20:08.527+0000 I REPL     [ReplicationExecutor] Member 127.0.0.1:1112 is now in state SECONDARY
2017-05-04T06:20:13.416+0000 I NETWORK  [thread1] connection accepted from 127.0.0.1:45930 #1 (1 connection now open)
2017-05-04T06:20:13.417+0000 I NETWORK  [conn1] received client metadata from 127.0.0.1:45930 conn1: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 8 (jessie)"", architecture: "x86_64", version: "Kernel 4.9.21-moby" } }
2017-05-04T06:20:14.528+0000 I REPL     [rsBackgroundSync] sync source candidate: 127.0.0.1:1112
2017-05-04T06:20:14.528+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to 127.0.0.1:1112
[kneesocks] getaddrinfo: node=127.0.0.1, service=1112
[kneesocks] connect: type=stream, family=2, address=127.0.0.1, port=1112
2017-05-04T06:20:14.530+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to 127.0.0.1:1112, took 2ms (1 connections now open to 127.0.0.1:1112)
2017-05-04T06:20:14.532+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to 127.0.0.1:1112
[kneesocks] getaddrinfo: node=127.0.0.1, service=1112
[kneesocks] connect: type=stream, family=2, address=127.0.0.1, port=1112
2017-05-04T06:20:14.535+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to 127.0.0.1:1112, took 3ms (2 connections now open to 127.0.0.1:1112)
2017-05-04T06:20:14.537+0000 I REPL     [replication-0] Choosing new sync source because our current sync source, 127.0.0.1:1112, has an OpTime ({ ts: Timestamp 1493878750000|1, t: 9 }) which is not ahead of ours ({ ts: Timestamp 1493878750000|1, t: 9 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)
2017-05-04T06:20:14.537+0000 I REPL     [replication-0] Canceling oplog query due to OplogQueryMetadata. We have to choose a new sync source. Current source: 127.0.0.1:1112, OpTime { ts: Timestamp 1493878750000|1, t: 9 }, its sync source index:-1
2017-05-04T06:20:14.537+0000 W REPL     [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source 127.0.0.1:1112 (config version: 2; last applied optime: { ts: Timestamp 1493878750000|1, t: 9 }; sync source index: -1; primary index: -1) is no longer valid
2017-05-04T06:20:14.538+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2017-05-04T06:20:14.543+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to 127.0.0.1:1112: InvalidSyncSource: Sync source was cleared. Was 127.0.0.1:1112
2017-05-04T06:20:19.149+0000 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2017-05-04T06:20:19.149+0000 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2017-05-04T06:20:19.150+0000 I REPL     [ReplicationExecutor] VoteRequester(term 9 dry run) received a yes vote from 127.0.0.1:1112; response message: { term: 9, voteGranted: true, reason: "", ok: 1.0 }
2017-05-04T06:20:19.150+0000 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2017-05-04T06:20:19.157+0000 I REPL     [ReplicationExecutor] VoteRequester(term 10) received a yes vote from 127.0.0.1:1112; response message: { term: 10, voteGranted: true, reason: "", ok: 1.0 }
2017-05-04T06:20:19.157+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 10
2017-05-04T06:20:19.157+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
2017-05-04T06:20:19.158+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 127.0.0.1:1112
[kneesocks] getaddrinfo: node=127.0.0.1, service=1112
[kneesocks] connect: type=stream, family=2, address=127.0.0.1, port=1112
2017-05-04T06:20:19.158+0000 I REPL     [ReplicationExecutor] My optime is most up-to-date, skipping catch-up and completing transition to primary.
2017-05-04T06:20:19.159+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to 127.0.0.1:1112, took 1ms (2 connections now open to 127.0.0.1:1112)
2017-05-04T06:20:20.539+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
2017-05-04T06:20:24.389+0000 I NETWORK  [thread1] connection accepted from 127.0.0.1:45974 #2 (2 connections now open)
2017-05-04T06:20:24.390+0000 I NETWORK  [conn2] received client metadata from 127.0.0.1:45974 conn2: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.4" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 8 (jessie)"", architecture: "x86_64", version: "Kernel 4.9.21-moby" } }
2017-05-04T06:20:24.393+0000 I NETWORK  [thread1] connection accepted from 127.0.0.1:45976 #3 (3 connections now open)
2017-05-04T06:20:24.394+0000 I NETWORK  [conn3] received client metadata from 127.0.0.1:45976 conn3: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.4" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 8 (jessie)"", architecture: "x86_64", version: "Kernel 4.9.21-moby" } }
^C2017-05-04T06:20:31.520+0000 I CONTROL  [signalProcessingThread] got signal 2 (Interrupt), will terminate after current cmd ends
2017-05-04T06:20:31.520+0000 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
2017-05-04T06:20:31.520+0000 I NETWORK  [signalProcessingThread] closing listening socket: 6
2017-05-04T06:20:31.521+0000 I NETWORK  [signalProcessingThread] closing listening socket: 7
2017-05-04T06:20:31.521+0000 I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-1111.sock
2017-05-04T06:20:31.521+0000 I NETWORK  [signalProcessingThread] shutdown: going to flush diaglog...
2017-05-04T06:20:31.521+0000 I REPL     [signalProcessingThread] shutting down replication subsystems
2017-05-04T06:20:31.521+0000 I REPL     [signalProcessingThread] Stopping replication reporter thread
2017-05-04T06:20:31.521+0000 I REPL     [signalProcessingThread] Stopping replication fetcher thread
2017-05-04T06:20:31.521+0000 I REPL     [signalProcessingThread] Stopping replication applier thread
2017-05-04T06:20:31.549+0000 I REPL     [signalProcessingThread] Stopping replication storage threads
2017-05-04T06:20:31.549+0000 I FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
2017-05-04T06:20:31.551+0000 I STORAGE  [signalProcessingThread] WiredTigerKVEngine shutting down
2017-05-04T06:20:31.580+0000 I STORAGE  [signalProcessingThread] shutdown: removing fs lock...
2017-05-04T06:20:31.580+0000 I CONTROL  [signalProcessingThread] now exiting
2017-05-04T06:20:31.580+0000 I CONTROL  [signalProcessingThread] shutting down with code:0
2017-05-04T06:20:31.580+0000 I CONTROL  [initandlisten] shutting down with code:0
root@61fa69e7a5a7:~#

ma6174 avatar May 04 '17 06:05 ma6174

i don't have docker (it requires a go compiler), but i'll try to create a test program using nonblocking sockets. if we can't solve it this way, i can look into getting a go compiler bootstrapped... but i'd rather avoid it (i'm using my own custom linux distribution, sabotage linux).

rofl0r avatar May 04 '17 20:05 rofl0r

i've been able to reproduce the issue now, however it's not straightforward to fix. the problem is that proxychains creates its own socket to do the connection, then dup2()s it over the original fd, which causes the epoll loop not to get woken up. a possible solution is not to use a new socket, however that will probably cause issues when the app requests an AF_INET (i.e. ipv4) socket, and one or more of the proxies in the chain are ipv6 (AF_INET6 sockets) or vice versa. we could of course differentiate between blocking and nonblocking sockets and treat them differently, but i'm looking for a clean and general solution.
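
A minimal, self-contained illustration of that effect (a sketch under assumptions: 127.0.0.1:2345 stands in for the proxy, and this simplifies what mongod's ASIO reactor does rather than reproducing it): epoll interest is attached to the open file description, not the fd number, so when a different socket is dup2()'d over the registered fd, the registration disappears with the old description and the waiting event loop never learns that the proxied connection is ready.

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int ep = epoll_create1(0);
    int app_fd = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(app_fd, F_SETFL, fcntl(app_fd, F_GETFL, 0) | O_NONBLOCK);

    /* the application registers its fd and waits for EPOLLOUT, i.e. for its
       non-blocking connect to complete */
    struct epoll_event ev = { .events = EPOLLOUT };
    ev.data.fd = app_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, app_fd, &ev);

    /* roughly what the interposed connect() does instead: connect a *new*
       socket to the proxy (handshake and error handling omitted), then
       dup2() it over the application's fd */
    int ns = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in proxy = { .sin_family = AF_INET, .sin_port = htons(2345) };
    inet_pton(AF_INET, "127.0.0.1", &proxy.sin_addr);
    connect(ns, (struct sockaddr *)&proxy, sizeof proxy);
    dup2(ns, app_fd);  /* app_fd now refers to a different open file
                          description; the epoll registration belonged to the
                          old one and is dropped when it is closed */
    close(ns);

    struct epoll_event out;
    int n = epoll_wait(ep, &out, 1, 2000);
    printf("epoll_wait reported %d events\n", n);  /* prints 0: the event loop
                                                      is never woken up */
    return 0;
}

That would match the pattern in the logs above: proxychains prints OK once the tunnel is up, but mongod keeps waiting until it reports ExceededTimeLimit.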

rofl0r avatar May 05 '17 05:05 rofl0r

I wonder if any further progress has ever been made on this? Thanks!

adrian-gierakowski avatar Jan 15 '24 15:01 adrian-gierakowski