sa-solver not relaunched after a crash ("Pipe closed by peer")

Open aleksfox opened this issue 8 years ago • 7 comments

After about 1 hour of mining, "Pipe closed by peer" messages start to appear periodically and the speed begins to decrease. In the end, one of the 2 cards shows 0 sol.

aleksfox avatar Nov 04 '16 17:11 aleksfox

I confirm. silentarmy is supposed to (but does not) restart a sa-solver process after it crashes. I will fix this.

But your second issue is: why are the sa-solver processes crashing? Please try "sa-solver --nonce 1000000" and let it run long enough to see if it crashes (preferably on a GPU that is not being used by "silentarmy").

mbevand avatar Nov 04 '16 17:11 mbevand
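
The relaunch behaviour described above (restart a solver process after it crashes) can be sketched as a small supervisor loop. This is only an illustration under stated assumptions, not silentarmy's actual code or the fix that was applied: `supervise`, `max_restarts`, and `backoff_s` are hypothetical names, and the child here is a stand-in Python one-liner rather than a real sa-solver.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_s=0.0):
    """Run cmd and relaunch it each time it exits with a non-zero status.

    Hypothetical helper: returns the exit codes observed, one per launch.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        # On POSIX the return code is negative when the child was killed
        # by a signal, e.g. -11 for SIGSEGV.
        code = proc.wait()
        exit_codes.append(code)
        if code == 0:
            break  # clean exit, nothing to relaunch
        if attempt < max_restarts:
            time.sleep(backoff_s)  # pause so a crash loop does not spin
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a crashing solver: a child that always exits with status 1.
    crashing_child = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing_child, max_restarts=2))
```

Capping the number of restarts is a deliberate choice in this sketch: if a solver crashes immediately every time, relaunching it forever would hide the underlying problem instead of surfacing it.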

On the first run I got "Segmentation fault (core dumped)" after ~50 nonces. The second run was the same, but after ~200 nonces... Right now it has been running for about 15 min and I only get these messages sometimes:

[image]

I already had this problem (Segmentation fault (core dumped)) when I ran your first (test) solver 0.1 and when I tried zogminer... but I have no such problem when I use the ZCash miner from eXtremal (based on silentarmy 0.1/0.2).

Any suggestions?

PS: Generally, your miner has one of the best speeds. Great work! And... it should be mentioned that the ZCash miner from eXtremal works with his own pool without vardiff!

Update: sa-solver --nonce 1000000 is still working right now. I started 2 tests in 2 terminal windows:
./sa-solver --nonce 1000000 --use 0
./sa-solver --nonce 1000000 --use 1

aleksfox avatar Nov 04 '16 17:11 aleksfox

The speed decreases...

[image]

And now the speed is ~71-72 sol, and after that... "pipe closed by peer" or "os.write(pipe, data) raised exception". And... one of the cards shows 0.00 sol.

aleksfox avatar Nov 04 '16 18:11 aleksfox

Ubuntu syslog at the time of the crash:

Nov  4 22:51:33 rig01 org.gnome.evolution.dataserver.Sources5[1252]: ** (evolution-source-registry:1631): WARNING **: secret_service_search_sync: must specify at least one attribute to match
Nov  4 22:52:32 rig01 kernel: [ 3701.810911] show_signal_msg: 33 callbacks suppressed
Nov  4 22:52:32 rig01 kernel: [ 3701.810916] sa-solver[1988]: segfault at 7fffe406f44f ip 0000000000403edb sp 00007fffcf8f82b0 error 4 in sa-solver[400000+12000]
Nov  4 22:53:28 rig01 kernel: [ 3757.355197] amdgpu 0000:01:00.0: GPU fault detected: 147 0x09328402
Nov  4 22:53:28 rig01 kernel: [ 3757.355203] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00903D26
Nov  4 22:53:28 rig01 kernel: [ 3757.355206] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B084002
Nov  4 22:53:28 rig01 kernel: [ 3757.355209] VM fault (0x02, vmid 5) at page 9452838, write from 'TC7' (0x54433700) (132)
Nov  4 22:53:36 rig01 kernel: [ 3765.094161] sa-solver[1989]: segfault at 7fff33768ee9 ip 0000000000403edb sp 00007fff1f017e90 error 4 in sa-solver[400000+12000]
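
The two sa-solver segfault lines in this log can be decoded mechanically. The sketch below assumes the usual Linux x86 kernel report format seen above (`segfault at ADDR ip IP sp SP error N in NAME[BASE+SIZE]`) and the x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode); `parse_segfault` and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Matches the kernel's user-space segfault report as it appears in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<name>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the decoded fields of a segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"])
    return {
        "process": m["name"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        "offset_in_mapping": ip - base,   # instruction offset from the mapping base
        "page_present": bool(err & 1),    # False: no page mapped at fault_addr
        "write_access": bool(err & 2),    # False: the faulting access was a read
        "user_mode": bool(err & 4),       # True: the fault happened in user space
    }


# The two segfault lines copied from the syslog excerpt above.
LOG = [
    "Nov  4 22:52:32 rig01 kernel: [ 3701.810916] sa-solver[1988]: segfault at 7fffe406f44f ip 0000000000403edb sp 00007fffcf8f82b0 error 4 in sa-solver[400000+12000]",
    "Nov  4 22:53:36 rig01 kernel: [ 3765.094161] sa-solver[1989]: segfault at 7fff33768ee9 ip 0000000000403edb sp 00007fff1f017e90 error 4 in sa-solver[400000+12000]",
]

for line in LOG:
    info = parse_segfault(line)
    print(info["pid"], hex(info["ip"]), hex(info["offset_in_mapping"]), info["write_access"])
```

Both crashes report the same instruction pointer (0x403edb, offset 0x3edb from the mapping base) and error code 4, meaning a user-mode read of an unmapped address, so the two processes appear to have faulted at the same instruction.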

And now I am mining with the ZCash miner from eXtremal (silentarmy v3) and have no problems... :( (1h-30min)

aleksfox avatar Nov 04 '16 19:11 aleksfox

So spontaneous segfaults, and a "GPU fault detected", hmm...

What GPU are you running it on? What driver version ("cat /sys/module/amdgpu/version")?

mbevand avatar Nov 06 '16 06:11 mbevand

I'm getting it too, though I haven't been around my computer when it happens. It appears to happen randomly: sometimes it works for 4 hours, sometimes 2 hours, occasionally 1, and then it crashes. I had a case where it crashed 10 minutes after I left it, so it wasted 6+ hours doing nothing.

MSI GTX 970, BIOS modded, Ubuntu 14.04, CUDA 8 (SA is OCL, right?), driver 361.77

1x860Mb instance only.

Northbadge avatar Nov 15 '16 07:11 Northbadge

The CUDA drivers include the OpenCL drivers. The cause might not be SA but the overclock or the custom BIOS. I would suggest reverting the BIOS mod and the overclock and then checking again.

Kubuxu avatar Nov 15 '16 17:11 Kubuxu