pll_id conflicts when submitting many jobs simultaneously (LSF arrays)
Preliminaries
Before submitting an issue, please check (with x in brackets) that you:
- [x] Are using the newest release (see here for latest release version number).
- [x] Have checked that the examples in the help work.
- [x] Have read the help (HTML version) and the gallery of examples.
- [x] Have checked that there is not already an existing issue for what you are reporting.
Expected behavior and actual behavior
Context
I am using LSF as a job scheduler to submit an array of jobs to our cluster. Each job is assigned 4 cores, and I am using `parallel sim` to divide the simulation among the 4 cores.
Desired behavior
I would like `parallel` to assign a unique `pll_id` to each job.
Actual behavior
LSF may send many jobs simultaneously, i.e., within the same second. This means that `parallel` assigns the same `pll_id` to each job, causing a conflict and errors for all but one of the jobs that arrive within the same second.
Failed solution attempted
Because each job has a unique seed, I tried the `randtype("current")` option, but this was not effective.
Solution (workaround)
What solved the problem was to create a `while` loop such that, if `cap parallel sim` returns an error, we wait a random number of seconds (1-16, although this is arbitrary) and try again. This was successful, although, somewhat to my surprise, some jobs needed to go through the loop 10 or more times. I have appended a sketch of the code.
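For readers who want a starting point, the retry loop described above might look roughly like the following. This is a minimal sketch, not the exact code used: `my_sim_program`, the `expr()` list, `reps(1000)`, and the `max_tries` cap are all hypothetical placeholders, and it assumes the clusters for `parallel` have already been set up earlier in the do-file.

```stata
* Sketch only: retry -parallel sim- with a random pause after each failure.
* my_sim_program, expr(b=r(b)), reps(1000), and max_tries are hypothetical.
local max_tries = 50
local tries = 0
local rc = 1

while `rc' != 0 & `tries' < `max_tries' {
    * capture stops an error from aborting the do-file; _rc holds the return code
    capture noisily parallel sim, expr(b=r(b)) reps(1000): my_sim_program
    local rc = _rc

    if `rc' != 0 {
        local ++tries
        * Draw a whole number of seconds between 1 and 16 (arbitrary range)
        local wait_sec = runiformint(1, 16)
        display as text "parallel sim failed (rc = `rc'); attempt `tries', waiting `wait_sec' seconds"
        * sleep takes its argument in milliseconds
        local wait_ms = 1000 * `wait_sec'
        sleep `wait_ms'
    }
}

* Give up with the last return code if every attempt failed
if `rc' != 0 {
    display as error "parallel sim still failing after `max_tries' attempts"
    exit `rc'
}
```

The random wait only spreads the jobs out if each job's random-number state differs, which holds here because each job has a unique seed. Drawing the wait time also advances the random-number generator in the calling session, which may matter if later results are expected to be reproducible from the seed alone.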
Steps to reproduce the problem
This would be pretty tough because I think it depends on the specifics of our cluster, our job scheduler, etc.
System information
Some relevant information
- Stata version and flavor: Stata 18.5 MP4
- OS type and version (e.g. Windows 10): RHEL9
- Parallel version: version 1.20.1 07jun2021
Output from `creturn list`:
System values
-------------
------------------------------------------------------
c(current_date) = "25 Oct 2024"
c(current_time) = "09:09:16"
c(rmsg_time) = 0 (seconds, from set rmsg)
------------------------------------------------------
c(stata_version) = 18.5
c(version) = 17 (version)
c(userversion) = 17 (version)
c(dyndoc_version) = 2 (dyndoc)
------------------------------------------------------
c(born_date) = "16 Jul 2024"
c(edition) = "BE"
c(edition_real) = "MP"
c(bit) = 64
c(SE) = 1
c(MP) = 1
c(processors) = 4 (Stata/MP, set processors)
c(processors_lic) = 4
c(processors_mach) = 32
c(processors_max) = 4
c(kmp_blocktime) = 200 (set kmp_blocktime)
c(mode) = "batch"
c(console) = "console"
------------------------------------------------------
c(os) = "Unix"
c(osdtl) = ""
c(hostname) = "c009n01"
c(machine_type) = "PC (64-bit x86-64)"
c(byteorder) = "lohi"
c(username) = "rpguiter"
------------------------------------------------------
Directories and paths
---------------------
------------------------------------------------------
c(sysdir_stata) = "/usr/local/apps/s.." (sysdir)
c(sysdir_base) = "/usr/local/apps/s.." (sysdir)
c(sysdir_site) = "/usr/local/apps/s.." (sysdir)
c(sysdir_plus) = "code/ado/plus/" (sysdir)
c(sysdir_personal) = "code/ado/personal/" (sysdir)
c(sysdir_oldplace) = "code/ado/" (sysdir)
c(tmpdir) = "/share/rpguiter/r.."
------------------------------------------------------
c(adopath) = "BASE;SITE;.;PERSO.." (adopath)
c(pwd) = "/rs1/researchers/.." (cd)
c(dirsep) = "/"
------------------------------------------------------
System limits
-------------
------------------------------------------------------
c(max_N_theory) = 1099511627775
c(max_k_theory) = 5000 (set maxvar)
c(max_width_theory) = 1048576 (set maxvar)
------------------------------------------------------
c(max_matdim) = 65534
------------------------------------------------------
c(max_it_cvars) = 64
c(max_it_fvars) = 8
------------------------------------------------------
c(max_macrolen) = 15480200
c(macrolen) = 645200 (set maxvar)
c(charlen) = 67783
c(max_cmdlen) = 15480216
c(cmdlen) = 645216 (set maxvar)
c(namelenbyte) = 128
c(namelenchar) = 32
c(eqlen) = 1337
------------------------------------------------------
Numerical and string limits
---------------------------
------------------------------------------------------
c(mindouble) = -8.9884656743e+307
c(maxdouble) = 8.9884656743e+307
c(epsdouble) = 2.22044604925e-16
c(smallestdouble) = 2.2250738585e-308
------------------------------------------------------
c(minfloat) = -1.70141173319e+38
c(maxfloat) = 1.70141173319e+38
c(epsfloat) = 1.19209289551e-07
------------------------------------------------------
c(minlong) = -2147483647
c(maxlong) = 2147483620
------------------------------------------------------
c(minint) = -32767
c(maxint) = 32740
------------------------------------------------------
c(minbyte) = -127
c(maxbyte) = 100
------------------------------------------------------
c(maxstrvarlen) = 2045
c(maxstrlvarlen) = 2000000000
c(maxvlabellen) = 32000
------------------------------------------------------
Current dataset
---------------
------------------------------------------------------
c(frame) = "default"
c(N) = 0
c(k) = 0
c(width) = 0
c(changed) = 0
c(filename) = ""
c(filedate) = ""
------------------------------------------------------
Memory settings
---------------
------------------------------------------------------
c(memory) = 33554432
c(maxvar) = 5000 (set maxvar)
c(niceness) = 5 (set niceness)
c(min_memory) = 0 (set min_memory)
c(max_memory) = . (set max_memory)
c(segmentsize) = 33554432 (set segmentsize)
c(adosize) = 1000 (set adosize)
c(max_preservemem) = 1073741824 (set max_preservemem)
------------------------------------------------------
Output settings
---------------
------------------------------------------------------
c(more) = "off" (set more)
c(rmsg) = "off" (set rmsg)
c(dp) = "period" (set dp)
c(linesize) = 110 (set linesize)
c(pagesize) = 23 (set pagesize)
c(logtype) = "smcl" (set logtype)
c(logmsg) = "on" (set logmsg)
c(noisily) = 1
------------------------------------------------------
c(iterlog) = "on" (set iterlog)
------------------------------------------------------
c(level) = 95 (set level)
c(clevel) = 95 (set clevel)
------------------------------------------------------
c(showbaselevels) = "" (set showbaselevels)
c(showemptycells) = "" (set showemptycells)
c(showomitted) = "" (set showomitted)
c(fvlabel) = "on" (set fvlabel)
c(fvwrap) = 1 (set fvwrap)
c(fvwrapon) = "word" (set fvwrapon)
c(lstretch) = "" (set lstretch)
------------------------------------------------------
c(cformat) = "" (set cformat)
c(sformat) = "" (set sformat)
c(pformat) = "" (set pformat)
------------------------------------------------------
c(coeftabresults) = "on" (set coeftabresults)
c(dots) = "on" (set dots)
------------------------------------------------------
c(collect_label) = "default" (set collect_label)
c(collect_style) = "default" (set collect_style)
c(table_style) = "table" (set table_style)
c(etable_style) = "etable" (set etable_style)
c(dtable_style) = "dtable" (set dtable_style)
c(collect_warn) = "on" (set collect_warn)
------------------------------------------------------
Interface settings
------------------
------------------------------------------------------
c(linegap) = . (set linegap)
c(scrollbufsize) = . (set scrollbufsize)
c(maxdb) = 50 (set maxdb)
------------------------------------------------------
Graphics settings
-----------------
------------------------------------------------------
c(graphics) = "off" (set graphics)
c(scheme) = "s1color" (set scheme)
c(printcolor) = "asis" (set printcolor)
c(min_graphsize) = 1 (region_options)
c(max_graphsize) = 100 (region_options)
------------------------------------------------------
Network settings
----------------
------------------------------------------------------
c(httpproxy) = "off" (set httpproxy)
c(httpproxyhost) = "" (set httpproxyhost)
c(httpproxyport) = 80 (set httpproxyport)
------------------------------------------------------
c(httpproxyauth) = "off" (set httpproxyauth)
c(httpproxyuser) = "" (set httpproxyuser)
c(httpproxypw) = "" (set httpproxypw)
------------------------------------------------------
Trace (program debugging) settings
----------------------------------
------------------------------------------------------
c(trace) = "off" (set trace)
c(tracedepth) = 1 (set tracedepth)
c(tracesep) = "on" (set tracesep)
c(traceindent) = "on" (set traceindent)
c(traceexpand) = "on" (set traceexpand)
c(tracenumber) = "off" (set tracenumber)
c(tracehilite) = "" (set tracehilite)
------------------------------------------------------
Mata settings
-------------
------------------------------------------------------
c(matastrict) = "off" (set matastrict)
c(matalnum) = "off" (set matalnum)
c(mataoptimize) = "on" (set mataoptimize)
c(matafavor) = "space" (set matafavor)
c(matacache) = 2000 (set matacache)
c(matalibs) = "" (set matalibs)
c(matamofirst) = "off" (set matamofirst)
c(matasolvetol) = . (set matasolvetol)
------------------------------------------------------
Java settings
-------------
------------------------------------------------------
c(java_heapmax) = "4096m" (set java_heapmax)
c(java_home) = "/usr/local/apps/s.." (set java_home)
------------------------------------------------------
LAPACK settings
---------------
------------------------------------------------------
c(lapack_mkl) = "on" (set lapack_mkl)
c(lapack_mkl_cnr) = "default" (set lapack_mkl_cnr)
------------------------------------------------------
putdocx settings
----------------
------------------------------------------------------
c(docx_hardbreak) = "off" (set docx_hardbreak)
c(docx_paramode) = "off" (set docx_paramode)
c(docx_maxtable) = 500 (set docx_maxtable)
------------------------------------------------------
putpdf settings
---------------
------------------------------------------------------
c(pdf_maxtable) = 500 (set pdf_maxtable)
------------------------------------------------------
Python settings
---------------
------------------------------------------------------
c(python_exec) = "" (set python_exec)
c(python_userpath) = "" (set python_userpath)
------------------------------------------------------
RNG settings
------------
------------------------------------------------------
c(rng) = "default" (set rng)
c(rng_current) = "mt64"
c(rngstate) = "XAA00000000000000.." (set rngstate)
c(rngseed_mt64s) = 123456789
c(rngstream) = 1 (set rngstream)
------------------------------------------------------
sort settings
-------------
------------------------------------------------------
c(sortmethod) = "default" (set sortmethod)
c(sort_current) = "fsort"
c(sortrngstate) = "1001XZA112210f4b1.." (set sortrngstate)
------------------------------------------------------
Unicode settings
----------------
------------------------------------------------------
c(locale_ui) = "" (set locale_ui)
c(locale_functions) = "en_US" (set locale_functions)
c(locale_icudflt) = "en_US" (unicode locale)
------------------------------------------------------
Other settings
--------------
------------------------------------------------------
c(type) = "float" (set type)
c(maxiter) = 300 (set maxiter)
c(searchdefault) = "all" (set searchdefault)
c(varabbrev) = "off" (set varabbrev)
c(emptycells) = "keep" (set emptycells)
c(fvtrack) = "term" (set fvtrack)
c(fvbase) = "on" (set fvbase)
c(odbcmgr) = "iodbc" (set odbcmgr)
c(odbcdriver) = "unicode" (set odbcdriver)
c(fredkey) = "" (set fredkey)
c(collect_double) = "on" (set collect_double)
c(dtascomplevel) = 1 (set dtascomplevel)
c(reshape_favor) = "memory" (set reshape_favor)
------------------------------------------------------