Workload never starts (FahCore returned: INTERRUPTED (102 = 0x66))

Open smiba opened this issue 5 years ago • 3 comments

Your Environment

  • F@H Software version: 7.6.21 and 7.4.4
  • Operating System: Debian 9
  • Browser: N/A (FAHControl)

Expected Behavior

The work queue progresses and the core gets started


Current Behavior

The core crashes(?) and never starts; the client keeps logging "FahCore returned: INTERRUPTED (102 = 0x66)". The client doesn't drop the work unit and gets stuck waiting for the core to start, retrying every minute.

09:26:11:WU01:FS00:Starting
09:26:11:WU01:FS00:Removing old file 'work/01/logfile_01-20201130-085702.txt'
09:26:11:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 01 -suffix 01 -version 706 -lifeline 14344 -checkpoint 15 -np 8
09:26:11:WU01:FS00:Started FahCore on PID 14414
09:26:11:WU01:FS00:Core PID:14418
09:26:11:WU01:FS00:FahCore 0xa8 started
09:26:12:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
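To check how often a work unit hits this, the log can be scanned for "FahCore returned" entries. The following is only a minimal sketch, assuming the client log has one entry per line in the `HH:MM:SS:WUxx:FSxx:message` layout seen in the excerpt above; the function name `count_core_returns` and the `SAMPLE` list are hypothetical and not part of FAHClient.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt above: HH:MM:SS:WUxx:FSxx:message
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<msg>.*)$"
)
# Matches messages such as "FahCore returned: INTERRUPTED (102 = 0x66)"
RETURNED = re.compile(
    r"^FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)$"
)


def count_core_returns(lines):
    """Count 'FahCore returned' entries per (work unit, slot, status, code)."""
    counts = Counter()
    for line in lines:
        entry = LOG_LINE.match(line.strip())
        if entry is None:
            continue  # not a per-work-unit entry in the assumed layout
        returned = RETURNED.match(entry.group("msg"))
        if returned is not None:
            key = (
                entry.group("wu"),
                entry.group("slot"),
                returned.group("status"),
                int(returned.group("code")),
            )
            counts[key] += 1
    return counts


# A few entries copied from the excerpt above, one per line.
SAMPLE = [
    "09:26:11:WU01:FS00:Starting",
    "09:26:11:WU01:FS00:Started FahCore on PID 14414",
    "09:26:11:WU01:FS00:Core PID:14418",
    "09:26:11:WU01:FS00:FahCore 0xa8 started",
    "09:26:12:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
]

if __name__ == "__main__":
    for key, n in count_core_returns(SAMPLE).items():
        print(key, n)
```

A count that keeps growing for the same work unit and slot would match the retry-every-minute behavior described above. To run it on a real log, pass the open file object instead of `SAMPLE`; the log location depends on the installation.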

It used to work, but I think that may have been with FahCore_a7 or a different work unit/project. I have two Linux machines that are stuck on this project: Project 16926, PRCG 16926 (78, 786, 4) and PRCG 16926 (29, 636, 5).


Steps To Reproduce

  1. Install FAH; either version 7.4.4 or the latest 7.6.21 will do
  2. Have a CPU Folding slot
  3. At some point it will stop working, but I'm unsure if this is due to this specific workload or the use of a newer FahCore

Context

Due to this issue the system is now idle, "wasting" CPU cycles. Also at the moment the system partially heats my room, so I'm colder :)


smiba avatar Nov 30 '20 09:11 smiba

Possible duplicate of #1570, although my issue is about CPU folding.

smiba avatar Nov 30 '20 09:11 smiba

Dropped the workload and received a new workload from project 16926 that is working without issues. Still on FahCore a8, no idea why the old workload had problems. I think it's likely it will come back at some point.

EDIT: As expected, got another workload with the same issue. PRCG 16926 (59, 832, 1)

smiba avatar Nov 30 '20 10:11 smiba

Hiya @smiba

Not sure what to make of it, since your system does have 8 CPUs, so in theory the Project should work fine. Since it only happens on a single Project, I will ask around and see what happens 😄

PantherX avatar Dec 28 '20 01:12 PantherX