hnn-core
hnn-core copied to clipboard
CIs for MSMPI
closes #589
It looks like we can inherit the github action from mpi4py :)
Codecov Report
Merging #590 (ef536c5) into master (2e53827) will decrease coverage by
0.08%. The diff coverage isn/a.
:exclamation: Current head ef536c5 differs from pull request most recent head f41bba4. Consider uploading reports for the commit f41bba4 to get more accurate results
@@ Coverage Diff @@
## master #590 +/- ##
==========================================
- Coverage 92.00% 91.93% -0.08%
==========================================
Files 22 22
Lines 4253 4253
==========================================
- Hits 3913 3910 -3
- Misses 340 343 +3
| Impacted Files | Coverage Δ | |
|---|---|---|
| hnn_core/parallel_backends.py | 82.22% <0.00%> (-0.84%) |
:arrow_down: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
It's pretty hard for me to debug any further without a Windows machine. @rythorpe feel free to take over. But if you feel it's too involved, we can move on and work on it some time after the release as well.
Ryan reminded me of this, I then ran it in my local fork and it works. So I guess it might be a cache problem? That's the only difference I can think of now. @jasmainak could you try to update the cache key to check?
I'll try to get around to testing this out later this week.
I've tried debugging this for a few hours on Windows 10 without much luck. I'm pretty sure this is some sort of unicode encoding/decoding error on the child process's side, however, it is extremely tricky to trace given that each mpi_child process is actually a child process within a child process (i.e., Popen starts a master subprocess that performs a system call to mpiexec, which in turn instantiates multiple parallel MPI processes). It's hard to know how and where a raw string pathname is passed to the system where backslashes get interpreted as escaped characters, and I suspect that one particular pathname (mpi_child.py) gets re-interpreted on multiple occasions throughout it's trajectory before finally getting called by the python interpreter.
On the bright side, I can confirm that MSMPI get's installed with NEURON on Windows and passes our parallel backend installation test (with removal of the backslashes of course).
Any thoughts @jasmainak?
Does this work or help: https://github.com/jonescompneurolab/hnn-core/pull/506 since there is one less nesting level to worry about?
The way I would debug this is by peppering the mpi_child.py code with dumb print (e.g., print('hi1'), print('hi2') etc) statements ... then you can figure out where exactly the failure occurs ...
If this is too much of a bother, I would say that for 0.3 release, we should prioritize:
- The GUI ... did you figure out the issue with the titles?
- Release Windows without MPI support for now since it is more for cluster set ups which will likely be running Linux
Does this work or help: #506 since there is one less nesting level to worry about?
That's a good idea. I'll try it out.
The way I would debug this is by peppering the
mpi_child.pycode with dumbprint('hi1'),print('hi2')etc) statements ... then you can figure out where exactly the failure occurs ...
That's pretty much what I've done, except that the error occurs before mpi_child.py gets called.
If this is too much of a bother, I would say that for 0.3 release, we should prioritize:
- The GUI ... did you figure out the issue with the titles?
You mean when running it on Windows? If so, it was just a weird dependency problem that I resolved by creating a new conda environment.
- Release Windows without MPI support for now since it is more for cluster set ups which will likely be running Linux
This doesn't make much sense to me since the vast majority of Windows users are new users or workshop participants who will need to run single-trial simulations in a reasonable amount of time (e.g., when running the tutorials synchronously).