Parallel threads, and in particular canceling tasks, create instability
loadPackage "FastLinAlg";
T = ZZ/101[x1,x2,x3,x4,x5,x6,x7];
I = ideal(x5*x6-x4*x7,x1*x6-x2*x7,x5^2-x1*x7,x4*x5-x2*x7,x4^2-x2*x6,x1*x4-x2*x5,
x2*x3^3*x5+3*x2*x3^2*x7+8*x2^2*x5+3*x3*x4*x7-8*x4*x7+x6*x7,x1*x3^3*x5+3*x1*x3^2*x7
+8*x1*x2*x5+3*x3*x5*x7-8*x5*x7+x7^2,x2*x3^3*x4+3*x2*x3^2*x6+8*x2^2*x4+3*x3*x4*x6
-8*x4*x6+x6^2,x2^2*x3^3+3*x2*x3^2*x4+8*x2^3+3*x2*x3*x6-8*x2*x6+x4*x6,x1*x2*x3^3
+3*x2*x3^2*x5+8*x1*x2^2+3*x2*x3*x7-8*x2*x7+x4*x7,x1^2*x3^3+3*x1*x3^2*x5+8*x1^2*x2
+3*x1*x3*x7-8*x1*x7+x5*x7);
M = jacobian I;
allowableThreads = 8;
Now run the next line over and over again until you get crashes or weird behavior (such as errors on creation of new threads, or threads saying they are completed when they aren't).
isRankAtLeast(3, M, Threads=>4, Verbose=>true) --Run this over and over until you get crashes
The backtrace is caused by a call of cancelTask on a task that just runs rank M.
What it's doing is trying to compute rank in two different ways, and cancelling a task if the other one gets done first.
I think the problem is that if you try to cancel a task that has finished, then you are still responsible for getting its result, or it will continue to occupy space on the queue. Try using this function for cancellation:
cancel = task -> (
<< "cancelling task " << task << endl;
cancelTask task;
while true do (
if isCanceled task then (<< "cancelled task terminated " << task << endl; break);
if isReady task then (taskResult task ; << "cancelled task finished " << task << endl; break);
sleep 1;
);
<< "cancelled task " << task << endl;
)
A related problem is that your tasks often return a list of lists of numbers instead of a number, namely {{0, 1, 3}, {2, 3, 0}}.
And here is better code for waiting for one of two tasks to finish:
while not isReady t1 and not isReady t3 do nanosleep 10000000;
result := if isReady t1 then (cancel t3; taskResult t1) else if isReady t3 then (cancel t1; taskResult t3) else error "oops";
But it should be rewritten to apply to a list of tasks, not just 2.
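One possible shape for the list version is sketched below. It is only a sketch: firstResult is a hypothetical name, not a Macaulay2 built-in, and the cancel function is a quiet copy of the one above so that the block stands alone. It assumes every task in the list has already been scheduled, and it relies on the same cancelTask machinery, so it is exposed to whatever instability that has.

```m2
-- Quiet variant of the cancel function from this thread.
cancel = task -> (
    cancelTask task;
    while true do (
        if isCanceled task then break;
        -- a task that finished anyway still has a result to collect
        if isReady task then (taskResult task; break);
        nanosleep 10000000;
        ))

-- Hypothetical helper (not part of Macaulay2): poll a list of scheduled
-- tasks until one is ready, cancel the others, return that one's result.
firstResult = tasks -> (
    while not any(tasks, isReady) do nanosleep 10000000;
    k := position(tasks, isReady);
    scan(#tasks, j -> if j != k then cancel tasks#j);
    taskResult tasks#k)

-- Example use: one quick task and one that never finishes.
allowableThreads = 4
ts = apply({() -> 333, () -> while true do 1}, f -> createTask(f, ()));
scan(ts, schedule);
firstResult ts
```

As in the two-task version, the other tasks are cancelled before the result of the finished one is read.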
@DanGrayson is it possible to have a function that will cancel a task, but not cause an error/backtrace?
Hi Dan,
It's returning what it is supposed to (a list of numbers).
In this particular example, the task result tr is either:
- a list of numbers (corresponding to a submatrix of that rank), or
- null (which means the function failed to find such a submatrix).
In this example, rank (or t3) takes longer than t1. I don't actually remember if it ever finishes, but it certainly takes several minutes. So I don't think there's much chance that it finishes before it is canceled.
By the way, you are right, we should apply that to a list of tasks. That was what we initially implemented. However, it was easier to debug with this two-task logic.
Okay, but then it fails here:
tr3 := taskResult(t3);
return (tr3 >= n1);
because n1 is a number!
@DanGrayson is it possible to have a function that will cancel a task, but not cause an error/backtrace?
Setting backtrace=false (in the running task) will at least hush the backtrace.
It seems that debuggingMode is false, or else the task would have gone into the debugger and offered a prompt. It might be worth verifying that.
It would be worthwhile designing a mechanism for silent interruptions.
Maybe we should have a new top level variable called "silentInterruptions", so that setting it to true would prevent the error display. Then the code in "evaluate.d" could be modified appropriately. Authors of task code could optionally set the variable to true.
Okay, but then it fails here:
tr3 := taskResult(t3);
return (tr3 >= n1);
because n1 is a number!
But
t3 := createTask(rank, M0);
So if it's finished, it should be a number. Right?
The only thing I can think of is that when you run cancel threads multiple times, Macaulay2 loses track of which thread is which?
(By the way, occasionally and eventually, I actually can get M2 to crash (segmentation fault), by rerunning things).
Oh, I see what happened: I didn't understand that your two tasks don't return the same type of answer when they complete! It's probably best to arrange that, so when you get an answer from one of the tasks, you don't need to know which one gave it.
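A minimal sketch of one way to arrange that, assuming the two kinds of result described earlier in the thread (an integer from rank, and a list or null from the submatrix search). The wrapper name asRankTest and the small matrix are made up for illustration and are not part of FastLinAlg.

```m2
-- Hypothetical wrapper: turn a function f into one that answers
-- "is the rank at least n?" with a Boolean, whichever kind of value
-- f itself returns.
asRankTest = (f, n) -> (x -> (
    r := f x;
    if instance(r, ZZ) then r >= n   -- f returned a rank, e.g. f = rank
    else r =!= null))                -- f returned a list (found) or null (not found)

-- Example use with rank on a small stand-in matrix.
M0 = matrix {{1, 2}, {3, 4}};
t3 = createTask(asRankTest(rank, 2), M0);
schedule t3;
while not isReady t3 do nanosleep 10000000;
taskResult t3
```

With both tasks wrapped this way, the code that collects the first available result can return it directly without checking which task produced it.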
So where do things stand now with this example?
@kschwede -- have you ever had the task return null when it should have returned a list?
@mikestillman -- when I run it 8 times, the first task returns null.
I've isolated the problem that I observed. I don't know whether this is what Karl observed until he answers my question. Try this:
allowableThreads = 8
cancel = task -> (
<< "cancelling task " << task << endl;
cancelTask task;
while true do (
if isCanceled task then (<< "cancelled task terminated " << task << endl; break);
if isReady task then (taskResult task ; << "cancelled task finished " << task << endl; break);
nanosleep 10000000;
))
for i to 120 do (
<< endl << "-- " << i << endl;
t := createTask ((() -> 333), ());
u := createTask((() -> while true do 1), ());
schedule t;
schedule u;
while not isReady t do nanosleep 10000000;
cancel u;
result := taskResult t;
if result === null then error "t returned null";
assert (result === 333);
<< "task returned 333" << endl;
)
Typical output:
Macaulay2, version 1.15.0.1
with packages: ConwayPolynomials, Elimination, IntegralClosure, InverseSystems, LLLBases, PrimaryDecomposition, ReesAlgebra, TangentCone, Truncations
i1 : load "~/src/M2/M2/task-bug-3.m2"
-- 0
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 1
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 2
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 3
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 4
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 5
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 6
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 7
cancelling task <<task, running>>
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelled task terminated <<task, canceled>>
task returned 333
-- 8
../../../../task-bug-3.m2:15:35:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelling task <<task, result available, task done>>
cancelled task finished <<task, result delivered, task terminated>>
task returned 333
-- 9
../../../../task-bug-3.m2:14:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:14:30:(3): --back trace--
../../../../task-bug-3.m2:15:29:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:29:(3): --back trace--
cancelling task <<task, result available, task done>>
cancelled task finished <<task, result delivered, task terminated>>
../../../../task-bug-3.m2:21:30:(3):[4]: error: t returned null
../../../../task-bug-3.m2:21:30:(3):[4]: --entering debugger (type help to see debugger commands)
../../../../task-bug-3.m2:21:30-21:30: --source code:
if result === null then error "t returned null";
Here's a new version of the bug:
allowableThreads = 8
cancel = task -> (
<< "cancelling task " << task << endl;
cancelTask task;
while true do (
if isCanceled task then (<< "cancelled task terminated " << task << endl; break);
if isReady task then (taskResult task ; << "cancelled task finished " << task << endl; break);
nanosleep 10000000;
))
anomalies = 0
for i to 20 do (
<< endl << "-- " << i << endl;
t := createTask ((() -> 333), ());
u := createTask((() -> (
<< "starting loop" << endl ;
elapsedTime while true do 1; -- elapsedTime prints even after an interrupt
<< "ending loop" << endl ; -- should never get here
)),
());
schedule t;
schedule u;
while not isReady t do nanosleep 10000000;
nanosleep 10000000;
if isReady u then (
anomalies = anomalies + 1;
<< "------- u finished, why???? value = " << toExternalString (taskResult u) << endl;
-- cancel u; -- cancelling u even if it is ready can crash M2, try it
)
else cancel u;
result := taskResult t;
if result === null then (
anomalies = anomalies + 1;
<< "------- t returned null why????" << endl;
);
<< "t returns " << result << endl;
)
<< "encountered " << anomalies << " anomalies" << endl
Typical output:
+ ./M2 --no-readline -q --print-width 168
Macaulay2, version 1.15.0.1
with packages: ConwayPolynomials, Elimination, IntegralClosure, InverseSystems, LLLBases, PrimaryDecomposition, ReesAlgebra, TangentCone, Truncations
i1 : load "~/src/M2/M2/task-bug-3.m2"
-- 0
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:47:(3):[1]: error: interrupted
-- 0.0181107 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 1
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0215245 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 2
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:47:(3):[1]: error: interrupted
-- 0.0203783 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 3
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
starting loop
starting loop
../../../../task-bug-3.m2:13:30:(3): --back trace--
k <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0209995 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
------- t returned null why????
t returns null
task returned 333
-- 4
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0217624 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 5
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0401997 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 6
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0212575 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 7
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
starting loop
starting loop
../../../../task-bug-3.m2:13:30:(3): --back trace--
k <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0241421 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
------- t returned null why????
t returns null
task returned 333
-- 8
starting loop
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0216492 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
t returns 333
task returned 333
-- 9
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:13:30:(3): --back trace--
../../../../task-bug-3.m2:15:48:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
------- t returned null why????
t returns null
task returned 333
-- 10
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
starting loop
starting loop
../../../../task-bug-3.m2:13:30:(3): --back trace--
k <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0202148 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
------- t returned null why????
t returns null
task returned 333
-- 11
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:47:(3):[1]: error: interrupted
-- 0.020267 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 12
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0384809 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 13
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0238643 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 14
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:13:30:(3): --back trace--
../../../../task-bug-3.m2:15:48:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
------- t returned null why????
t returns null
task returned 333
-- 15
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:13:30:(3): --back trace--
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0218819 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
------- t returned null why????
t returns null
task returned 333
-- 16
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:13:30:(3): --back trace--
../../../../task-bug-3.m2:15:48:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
------- t returned null why????
t returns null
task returned 333
-- 17
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0386004 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 18
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0202727 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 19
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0202554 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
-- 20
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0381751 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
task returned 333
encountered 11 anomalies
i2 : restart
Macaulay2, version 1.15.0.1
with packages: ConwayPolynomials, Elimination, IntegralClosure, InverseSystems, LLLBases, PrimaryDecomposition, ReesAlgebra, TangentCone, Truncations
i1 : load "~/src/M2/M2/task-bug-3.m2"
-- 0
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0180763 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 1
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0229249 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 2
starting loop
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0237698 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
t returns 333
-- 3
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0218536 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 4
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.020255 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 5
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0202537 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 6
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
starting loop
starting loop
../../../../task-bug-3.m2:13:30:(3): --back trace--
k <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0371289 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
------- t returned null why????
t returns null
-- 7
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0219568 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 8
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0449853 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 9
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0204956 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 10
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:13:30:(3): --back trace--
../../../../task-bug-3.m2:15:48:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
------- t returned null why????
t returns null
-- 11
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
starting loop
starting loop
../../../../task-bug-3.m2:13:30:(3): --back trace--
k <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0389954 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
------- t returned null why????
t returns null
-- 12
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0209659 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 13
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0208666 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 14
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0386303 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 15
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:13:30:(3): --back trace--
../../../../task-bug-3.m2:15:48:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
------- t returned null why????
t returns null
-- 16
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
starting loop
starting loop
../../../../task-bug-3.m2:13:30:(3): --back trace--
k <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0265399 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
------- t returned null why????
t returns null
-- 17
../../../../task-bug-3.m2:13:30:(3):[1]: error: interrupted
../../../../task-bug-3.m2:13:30:(3): --back trace--
../../../../task-bug-3.m2:15:48:(3):[1]: error: interrupted
../../../../task-bug-3.m2:15:48:(3): --back trace--
------- u finished, why???? value = null
------- t returned null why????
t returns null
-- 18
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0367605 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 19
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.023231 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
-- 20
starting loop
cancelling task <<task, running>>
../../../../task-bug-3.m2:16:39:(3):[1]: error: interrupted
-- 0.0422598 seconds elapsed
../../../../task-bug-3.m2:15:48:(3): --back trace--
cancelled task terminated <<task, canceled>>
t returns 333
encountered 10 anomalies
i2 :
null means that the first task failed (the getSubMatrixOfRank call). However, with the way it's being called, I don't think it can fail. The option MaxMinors=>infinity should keep that call from ever giving up.
We have the threads giving different things because we were trying to debug what was going on.
I just ran the single-threaded version, which doesn't use threads at all, 1000 times, and it never failed on this example. Heck, even running it 1000 times without that infinity limit also works.
I'll try your code suggestion after the FoTR seminar.
Yes, it's probably irrelevant that your code could theoretically return null, since the code I found doesn't use your code. But I was wondering whether you ever isolated the source of instability as being due to a null value getting returned.
We're grateful to you for locating this bug!
I'll try to fix it eventually, so I've assigned myself.
I'm not sure if this is the same bug, but I've seen this happen a couple times now on builds of the Debian package when generating the example code for cancelTask:
--making example results for cancelTask(Task) in file /build/macaulay2-1.14.0.1+git835.718c3dd+ds/M2/usr-dist/common/share/doc/Macaulay2/Macaulay2Doc/example-output/_cancel__Task_lp__Task_rp.out
ulimit -c unlimited; ulimit -t 700; ulimit -m 850000; ulimit -s 8192; ulimit -n 512; cd /tmp/M2-17966-0/617-rundir/; GC_MAXIMUM_HEAP_SIZE=400M /build/macaulay2-1.14.0.1+git835.718c3dd+ds/M2/usr-dist/x86_64-Linux-Debian-unknown/bin/M2-binary --silent --print-width 77 --stop --int --no-readline -q --no-randomize <"/tmp/M2-17966-0/47_cancel__Task_lp__Task_rp.m2" >>"/build/macaulay2-1.14.0.1+git835.718c3dd+ds/M2/usr-dist/common/share/doc/Macaulay2/Macaulay2Doc/example-output/_cancel__Task_lp__Task_rp.errors" 2>&1
/build/macaulay2-1.14.0.1+git835.718c3dd+ds/M2/usr-dist/common/share/doc/Macaulay2/Macaulay2Doc/example-output/_cancel__Task_lp__Task_rp.errors:0:1: (output file) error: Macaulay2 exited with status code 1
/tmp/M2-17966-0/47_cancel__Task_lp__Task_rp.m2:0:1: (input file)
M2: *** Error 1
root@gloria:/build/macaulay2-1.14.0.1+git835.718c3dd+ds/M2/usr-dist/common/share/doc/Macaulay2/Macaulay2Doc/example-output# cat _cancel__Task_lp__Task_rp.errors
-- -*- M2-comint -*- hash: -679331934
i1 : n = 0
o1 = 0
i2 : t = schedule(() -> while true do n = n+1)
o2 = <<task, created>>
o2 : Task
i3 : sleep 1
o3 = 0
i4 : t
o4 = <<task, running>>
o4 : Task
i5 : n
o5 = 16722
i6 : sleep 1
o6 = 0
i7 : t
o7 = <<task, running>>
o7 : Task
i8 : n
o8 = 42779
i9 : isReady t
o9 = false
i10 : cancelTask t
i11 : sleep 1
i11 : stdio:2:26:(3):[1]: error: interrupted
stdio.d:563:35: error: array index -7 out of bounds 0 .. 4095
Thanks!
By the way, has there been any progress on this? I just compiled a new version off the master branch, and I still had instability. Should it work on the development branch? Also, if I don't set allowableThreads, it never tries executing both tasks simultaneously, so while it doesn't crash, it also provides no speedup.
The code causing crashes in the current example is essentially this (I can provide the full code if it's of any use). I am using the cancel command Dan wrote above.
t1 := createTask(dimViaBezoutNonhomogeneous, (I1, Verbose=>opts.Verbose, DimensionIntersectionAttempts=>attempts));
t2 := createTask(dim, (I1));
schedule t1;
schedule t2;
r1 := isReady(t1);
r2 := isReady(t2);
if opts.Verbose then print ("dimViaBezout: starting threads, one classical dim, one probabilistic dim ");
while (r1==false and r2==false) do ( sleep(1); r1 = isReady(t1); r2 = isReady(t2); );
if (r2) then (
if opts.Verbose then print ("dimViaBezout: classical dim finished first.");
tr = taskResult(t2);
cancel t1;
)
else if (r1) then (
if opts.Verbose then print ("dimViaBezout: probabilistic dim finished first.");
tr = taskResult(t1);
cancel t2;
);
return tr;
While it does work a few times, after several executions I tend to get this error.
/usr/share/Macaulay2/Core/option.m2:16:8:(1):[1]: error: interrupted
/usr/share/Macaulay2/Core/option.m2:16:8:(1): --back trace--
internal error: beginning of line marker not within buffer
Aborted
Process M2 exited abnormally with code 134
Finally, I'm still not sure how to suppress backtrace messages.
Creating tasks by doing things like:
t2 := createTask(myI -> (backtrace=false; return dim myI), (I1));
does not help as far as I can tell.
I haven't done any work on this in a long time. Mahrud was going to start looking into improving operations of threads for us, and gave a presentation about it.
It would be nice if someone would figure this out. I've just unassigned myself from the issue.
@mahrud I'd be happy to help on this any way I can.