silentarmy
sa-solver not relaunched after a crash ("Pipe closed by peer")
After about 1 hour of mining, "Pipe closed by peer" messages appear periodically and the speed begins to decrease. In the end, one of the 2 cards shows 0 sol.
I confirm. silentarmy is supposed to (but does not) restart a sa-solver process after it crashes. I will fix this.
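For illustration only, the restart behavior described above could look roughly like the following sketch. This is not silentarmy's actual code or the eventual fix; `supervise`, `max_restarts` and `delay` are hypothetical names, and the sketch ignores the pipe that silentarmy uses to feed work to the solver.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.0):
    """Run cmd and relaunch it each time it exits abnormally.

    Hypothetical helper: returns the list of exit codes, one per launch.
    """
    exit_codes = []
    for launch in range(1, max_restarts + 2):
        # subprocess.call blocks until the child terminates.
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            break  # clean exit, nothing to restart
        print(f"launch {launch} exited with code {code}", file=sys.stderr)
        if launch <= max_restarts:
            time.sleep(delay)  # short pause before the next launch
    return exit_codes
```

Under these assumptions, something like `supervise(["./sa-solver", "--nonce", "1000000", "--use", "0"])` would relaunch the solver up to three times after a crash instead of leaving the GPU idle.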
But your second issue is: why are the sa-solver processes crashing? Please try "sa-solver --nonce 1000000" and let it run long enough to see if it crashes (preferably on a GPU that is not being used by "silentarmy").
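When leaving such a test unattended, it can help to record how the process ended. A minimal sketch, assuming a POSIX system where Python's `subprocess` reports death by signal as a negative return code; `describe_exit` and `run_and_report` are hypothetical helper names, not part of silentarmy:

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, a child killed by signal N is reported as -N.
        return "killed by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "exited normally"
    return "exited with status %d" % returncode


def run_and_report(cmd):
    """Run cmd to completion and describe how it ended."""
    return describe_exit(subprocess.call(cmd))
```

For example, `run_and_report(["./sa-solver", "--nonce", "1000000"])` would be expected to return "killed by SIGSEGV" if the solver segfaults.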
On the first run I got "Segmentation fault (core dumped)" after ~50 nonces. The second run was the same, but after ~200 nonces... Right now it has been running for about 15 minutes and I only get these messages sometimes:
I already had this problem (Segmentation fault (core dumped)) when I ran your first (test) solver 0.1 and when I tried zogminer... but I have no such problem when I use the ZCash miner from eXtremal (based on silentarmy 0.1\0.2)
Any suggestions?
ps: But, generally, your miner has one of the best speeds. Great work! And... it should be mentioned that the ZCash miner from eXtremal works with his own pool without vardiff!
upd: sa-solver --nonce 1000000 is still working right now. I started 2 tests in 2 terminal windows:
./sa-solver --nonce 1000000 --use 0
./sa-solver --nonce 1000000 --use 1
Speed decreases...
and now the speed is ~71-72 sol, and after that... "pipe closed by peer" or "os.write(pipe, data) raised exception". And... one of the cards = 0.00 sol
Ubuntu syslog at the time of the crash:
Nov 4 22:51:33 rig01 org.gnome.evolution.dataserver.Sources5[1252]: ** (evolution-source-registry:1631): WARNING **: secret_service_search_sync: must specify at least one attribute to match
Nov 4 22:52:32 rig01 kernel: [ 3701.810911] show_signal_msg: 33 callbacks suppressed
Nov 4 22:52:32 rig01 kernel: [ 3701.810916] sa-solver[1988]: segfault at 7fffe406f44f ip 0000000000403edb sp 00007fffcf8f82b0 error 4 in sa-solver[400000+12000]
Nov 4 22:53:28 rig01 kernel: [ 3757.355197] amdgpu 0000:01:00.0: GPU fault detected: 147 0x09328402
Nov 4 22:53:28 rig01 kernel: [ 3757.355203] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00903D26
Nov 4 22:53:28 rig01 kernel: [ 3757.355206] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B084002
Nov 4 22:53:28 rig01 kernel: [ 3757.355209] VM fault (0x02, vmid 5) at page 9452838, write from 'TC7' (0x54433700) (132)
Nov 4 22:53:36 rig01 kernel: [ 3765.094161] sa-solver[1989]: segfault at 7fff33768ee9 ip 0000000000403edb sp 00007fff1f017e90 error 4 in sa-solver[400000+12000]
And now I am mining with the ZCash miner from eXtremal (silentarmy v3) and have no problems... :( (1h-30min)
So spontaneous segfaults, and a "GPU fault detected", hmm...
What GPU are you running it on? What driver version ("cat /sys/module/amdgpu/version")?
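One detail worth noting in the syslog above: both segfault lines report the same instruction pointer (0000000000403edb), so the two sa-solver processes crashed at the same instruction. A small sketch for pulling those fields out of such lines, assuming the usual x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode); `parse_segfault` is a hypothetical helper name:

```python
import re

# Matches the kernel's user-space segfault report as it appears in the syslog.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the key fields of a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "pid": int(m.group("pid")),
        # Offset of the faulting instruction inside the mapped binary.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


LINES = [
    "Nov 4 22:52:32 rig01 kernel: [ 3701.810916] sa-solver[1988]: segfault at "
    "7fffe406f44f ip 0000000000403edb sp 00007fffcf8f82b0 error 4 in "
    "sa-solver[400000+12000]",
    "Nov 4 22:53:36 rig01 kernel: [ 3765.094161] sa-solver[1989]: segfault at "
    "7fff33768ee9 ip 0000000000403edb sp 00007fff1f017e90 error 4 in "
    "sa-solver[400000+12000]",
]

for line in LINES:
    print(parse_segfault(line))
```

Under that reading of the error code, both crashes are user-mode reads of an unmapped page at offset 0x3edb into the sa-solver binary, which suggests a single faulty code path rather than random corruption.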
I'm getting it too, though I haven't been at my computer when it happens. It seems to happen at random: sometimes it works for 4 hours, sometimes 2 hours, occasionally 1, and then it crashes. I had a case where it crashed 10 minutes after I left it, so it wasted 6+ hours doing nothing.
MSI GTX 970, BIOS modded, Ubuntu 14.04, CUDA 8 (SA is OCL, right?), driver 361.77
1x860Mb instance only.
The CUDA drivers include the OpenCL (OCL) drivers. The cause might not be SA but the overclock (OC) or the custom BIOS. I would try reverting the mod and the OC, and then check again.