pll_id conflicts when submitting many jobs simultaneously (LSF arrays)

Open rpguiteras opened this issue 4 months ago • 2 comments

Preliminaries

Before submitting an issue, please check (with x in brackets) that you:

  • [x] Are using the newest release (see here for latest release version number).
  • [x] Have checked that the examples in the help work.
  • [x] Have read the help (HTML version) and the gallery of examples.
  • [x] Have checked that there is not already an existing issue for what you are reporting.

Expected behavior and actual behavior

Context

I am using LSF as a job scheduler to submit an array of jobs to our cluster. Each job is assigned 4 cores, and I am using parallel sim to divide the simulation among the 4 cores.

Desired behavior

I would like parallel to assign a unique pll_id to each job.

Actual behavior

LSF may send many jobs simultaneously, i.e., within the same second. This means that parallel assigns the same pll_id to each job, causing a conflict and errors for all but one of the jobs that arrive within the same second.

Failed solution attempted

Because each job has a unique seed, I tried the randtype("current") option but this was not effective.

Solution (workaround)

What solved the problem was wrapping the call in a while loop: if cap parallel sim returns an error, the job waits a random number of seconds (1-16, an arbitrary choice) and tries again. This worked, although, somewhat to my surprise, some jobs needed to go through the loop 10 or more times. I have appended a sketch of the code.
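For readers who want to see the shape of such a retry loop, here is a minimal illustrative sketch. It is not the code from this report. It assumes that parallel has already been initialized with 4 child processes, that `mysim` is a hypothetical rclass program visible to the child processes (for example, installed as an ado-file), and that the `expr()` contents, `reps()` value, and the cap of 50 attempts are placeholders.

```stata
* Illustrative retry loop around parallel sim (all names are placeholders).
* Assumes parallel is already set up, e.g. via: parallel initialize 4
* Assumes mysim is a hypothetical rclass program that returns r(mean).

local rc       = 1      // nonzero so the loop body runs at least once
local tries    = 0
local maxtries = 50     // arbitrary cap so a job cannot loop forever

while `rc' != 0 & `tries' < `maxtries' {
    * capture traps the error; noisily keeps the output in the log
    capture noisily parallel sim, expr(mean = r(mean)) reps(1000): mysim
    local rc = _rc

    if `rc' != 0 {
        local ++tries
        * wait a random whole number of seconds between 1 and 16
        local wait_s  = runiformint(1, 16)
        local wait_ms = 1000 * `wait_s'
        display as text "parallel sim failed (rc = `rc'); attempt `tries', waiting `wait_s' s"
        sleep `wait_ms'
    }
}

if `rc' != 0 {
    display as error "parallel sim still failing after `maxtries' attempts"
    exit `rc'
}
```

Drawing the wait with `runiformint()` advances the job's random-number stream, so if exact reproducibility of the simulation draws matters, the seed may need to be reset before each attempt. Jobs that share a seed would also draw identical waits and collide again, so this approach relies on each job having its own seed, as is the case here.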

Steps to reproduce the problem

This would be pretty tough because I think it depends on the specifics of our cluster, our job scheduler, etc.

System information

Some relevant information

  • Stata version and flavor: Stata 18.5 MP4
  • OS type and version (e.g. Windows 10): RHEL9
  • Parallel version: version 1.20.1 07jun2021

Output from creturn list:

System values
-------------

    ------------------------------------------------------
        c(current_date) = "25 Oct 2024"
        c(current_time) = "09:09:16"
           c(rmsg_time) = 0                          (seconds, from set rmsg)
    ------------------------------------------------------
       c(stata_version) = 18.5
             c(version) = 17                         (version)
         c(userversion) = 17                         (version)
      c(dyndoc_version) = 2                          (dyndoc)
    ------------------------------------------------------
           c(born_date) = "16 Jul 2024"
             c(edition) = "BE"
        c(edition_real) = "MP"
                 c(bit) = 64
                  c(SE) = 1
                  c(MP) = 1
          c(processors) = 4                          (Stata/MP, set processors)
      c(processors_lic) = 4
     c(processors_mach) = 32
      c(processors_max) = 4
       c(kmp_blocktime) = 200                        (set kmp_blocktime)
                c(mode) = "batch"
             c(console) = "console"
    ------------------------------------------------------
                  c(os) = "Unix"
               c(osdtl) = ""
            c(hostname) = "c009n01"
        c(machine_type) = "PC (64-bit x86-64)"
           c(byteorder) = "lohi"
            c(username) = "rpguiter"
    ------------------------------------------------------

Directories and paths
---------------------

    ------------------------------------------------------
        c(sysdir_stata) = "/usr/local/apps/s.."      (sysdir)
         c(sysdir_base) = "/usr/local/apps/s.."      (sysdir)
         c(sysdir_site) = "/usr/local/apps/s.."      (sysdir)
         c(sysdir_plus) = "code/ado/plus/"           (sysdir)
     c(sysdir_personal) = "code/ado/personal/"       (sysdir)
     c(sysdir_oldplace) = "code/ado/"                (sysdir)
              c(tmpdir) = "/share/rpguiter/r.."
    ------------------------------------------------------
             c(adopath) = "BASE;SITE;.;PERSO.."      (adopath)
                 c(pwd) = "/rs1/researchers/.."      (cd)
              c(dirsep) = "/"
    ------------------------------------------------------

System limits
-------------

    ------------------------------------------------------
        c(max_N_theory) = 1099511627775
        c(max_k_theory) = 5000                       (set maxvar)
    c(max_width_theory) = 1048576                    (set maxvar)
    ------------------------------------------------------
          c(max_matdim) = 65534
    ------------------------------------------------------
        c(max_it_cvars) = 64
        c(max_it_fvars) = 8
    ------------------------------------------------------
        c(max_macrolen) = 15480200
            c(macrolen) = 645200                     (set maxvar)
             c(charlen) = 67783
          c(max_cmdlen) = 15480216
              c(cmdlen) = 645216                     (set maxvar)
         c(namelenbyte) = 128
         c(namelenchar) = 32
               c(eqlen) = 1337
    ------------------------------------------------------

Numerical and string limits
---------------------------

    ------------------------------------------------------
           c(mindouble) = -8.9884656743e+307
           c(maxdouble) = 8.9884656743e+307
           c(epsdouble) = 2.22044604925e-16
      c(smallestdouble) = 2.2250738585e-308
    ------------------------------------------------------
            c(minfloat) = -1.70141173319e+38
            c(maxfloat) = 1.70141173319e+38
            c(epsfloat) = 1.19209289551e-07
    ------------------------------------------------------
             c(minlong) = -2147483647
             c(maxlong) = 2147483620
    ------------------------------------------------------
              c(minint) = -32767
              c(maxint) = 32740
    ------------------------------------------------------
             c(minbyte) = -127
             c(maxbyte) = 100
    ------------------------------------------------------
        c(maxstrvarlen) = 2045
       c(maxstrlvarlen) = 2000000000
        c(maxvlabellen) = 32000
    ------------------------------------------------------

Current dataset
---------------

    ------------------------------------------------------
               c(frame) = "default"
                   c(N) = 0
                   c(k) = 0
               c(width) = 0
             c(changed) = 0
            c(filename) = ""
            c(filedate) = ""
    ------------------------------------------------------

Memory settings
---------------

    ------------------------------------------------------
              c(memory) = 33554432
              c(maxvar) = 5000                       (set maxvar)
            c(niceness) = 5                          (set niceness)
          c(min_memory) = 0                          (set min_memory)
          c(max_memory) = .                          (set max_memory)
         c(segmentsize) = 33554432                   (set segmentsize)
             c(adosize) = 1000                       (set adosize)
     c(max_preservemem) = 1073741824                 (set max_preservemem)
    ------------------------------------------------------

Output settings
---------------

    ------------------------------------------------------
                c(more) = "off"                      (set more)
                c(rmsg) = "off"                      (set rmsg)
                  c(dp) = "period"                   (set dp)
            c(linesize) = 110                        (set linesize)
            c(pagesize) = 23                         (set pagesize)
             c(logtype) = "smcl"                     (set logtype)
              c(logmsg) = "on"                       (set logmsg)
             c(noisily) = 1
    ------------------------------------------------------
             c(iterlog) = "on"                       (set iterlog)
    ------------------------------------------------------
               c(level) = 95                         (set level)
              c(clevel) = 95                         (set clevel)
    ------------------------------------------------------
      c(showbaselevels) = ""                         (set showbaselevels)
      c(showemptycells) = ""                         (set showemptycells)
         c(showomitted) = ""                         (set showomitted)
             c(fvlabel) = "on"                       (set fvlabel)
              c(fvwrap) = 1                          (set fvwrap)
            c(fvwrapon) = "word"                     (set fvwrapon)
            c(lstretch) = ""                         (set lstretch)
    ------------------------------------------------------
             c(cformat) = ""                         (set cformat)
             c(sformat) = ""                         (set sformat)
             c(pformat) = ""                         (set pformat)
    ------------------------------------------------------
      c(coeftabresults) = "on"                       (set coeftabresults)
                c(dots) = "on"                       (set dots)
    ------------------------------------------------------
       c(collect_label) = "default"                  (set collect_label)
       c(collect_style) = "default"                  (set collect_style)
         c(table_style) = "table"                    (set table_style)
        c(etable_style) = "etable"                   (set etable_style)
        c(dtable_style) = "dtable"                   (set dtable_style)
        c(collect_warn) = "on"                       (set collect_warn)
    ------------------------------------------------------

Interface settings
------------------

    ------------------------------------------------------
             c(linegap) = .                          (set linegap)
       c(scrollbufsize) = .                          (set scrollbufsize)
               c(maxdb) = 50                         (set maxdb)
    ------------------------------------------------------

Graphics settings
-----------------

    ------------------------------------------------------
            c(graphics) = "off"                      (set graphics)
              c(scheme) = "s1color"                  (set scheme)
          c(printcolor) = "asis"                     (set printcolor)
       c(min_graphsize) = 1                          (region_options)
       c(max_graphsize) = 100                        (region_options)
    ------------------------------------------------------

Network settings
----------------

    ------------------------------------------------------
           c(httpproxy) = "off"                      (set httpproxy)
       c(httpproxyhost) = ""                         (set httpproxyhost)
       c(httpproxyport) = 80                         (set httpproxyport)
    ------------------------------------------------------
       c(httpproxyauth) = "off"                      (set httpproxyauth)
       c(httpproxyuser) = ""                         (set httpproxyuser)
         c(httpproxypw) = ""                         (set httpproxypw)
    ------------------------------------------------------

Trace (program debugging) settings
----------------------------------

    ------------------------------------------------------
               c(trace) = "off"                      (set trace)
          c(tracedepth) = 1                          (set tracedepth)
            c(tracesep) = "on"                       (set tracesep)
         c(traceindent) = "on"                       (set traceindent)
         c(traceexpand) = "on"                       (set traceexpand)
         c(tracenumber) = "off"                      (set tracenumber)
         c(tracehilite) = ""                         (set tracehilite)
    ------------------------------------------------------

Mata settings
-------------

    ------------------------------------------------------
          c(matastrict) = "off"                      (set matastrict)
            c(matalnum) = "off"                      (set matalnum)
        c(mataoptimize) = "on"                       (set mataoptimize)
           c(matafavor) = "space"                    (set matafavor)
           c(matacache) = 2000                       (set matacache)
            c(matalibs) = ""                         (set matalibs)
         c(matamofirst) = "off"                      (set matamofirst)
        c(matasolvetol) = .                          (set matasolvetol)
    ------------------------------------------------------

Java settings
-------------

    ------------------------------------------------------
        c(java_heapmax) = "4096m"                    (set java_heapmax)
           c(java_home) = "/usr/local/apps/s.."      (set java_home)
    ------------------------------------------------------

LAPACK settings
---------------

    ------------------------------------------------------
          c(lapack_mkl) = "on"                       (set lapack_mkl)
      c(lapack_mkl_cnr) = "default"                  (set lapack_mkl_cnr)
    ------------------------------------------------------

putdocx settings
----------------

    ------------------------------------------------------
      c(docx_hardbreak) = "off"                      (set docx_hardbreak)
       c(docx_paramode) = "off"                      (set docx_paramode)
       c(docx_maxtable) = 500                        (set docx_maxtable)
    ------------------------------------------------------

putpdf settings
---------------

    ------------------------------------------------------
        c(pdf_maxtable) = 500                        (set pdf_maxtable)
    ------------------------------------------------------

Python settings
---------------

    ------------------------------------------------------
         c(python_exec) = ""                         (set python_exec)
     c(python_userpath) = ""                         (set python_userpath)
    ------------------------------------------------------

RNG settings
------------

    ------------------------------------------------------
                 c(rng) = "default"                  (set rng)
         c(rng_current) = "mt64"
            c(rngstate) = "XAA00000000000000.."      (set rngstate)
       c(rngseed_mt64s) = 123456789
           c(rngstream) = 1                          (set rngstream)
    ------------------------------------------------------

sort settings
-------------

    ------------------------------------------------------
          c(sortmethod) = "default"                  (set sortmethod)
        c(sort_current) = "fsort"
        c(sortrngstate) = "1001XZA112210f4b1.."      (set sortrngstate)
    ------------------------------------------------------

Unicode settings
----------------

    ------------------------------------------------------
           c(locale_ui) = ""                         (set locale_ui)
    c(locale_functions) = "en_US"                    (set locale_functions)
      c(locale_icudflt) = "en_US"                    (unicode locale)
    ------------------------------------------------------

Other settings
--------------

    ------------------------------------------------------
                c(type) = "float"                    (set type)
             c(maxiter) = 300                        (set maxiter)
       c(searchdefault) = "all"                      (set searchdefault)
           c(varabbrev) = "off"                      (set varabbrev)
          c(emptycells) = "keep"                     (set emptycells)
             c(fvtrack) = "term"                     (set fvtrack)
              c(fvbase) = "on"                       (set fvbase)
             c(odbcmgr) = "iodbc"                    (set odbcmgr)
          c(odbcdriver) = "unicode"                  (set odbcdriver)
             c(fredkey) = ""                         (set fredkey)
      c(collect_double) = "on"                       (set collect_double)
       c(dtascomplevel) = 1                          (set dtascomplevel)
       c(reshape_favor) = "memory"                   (set reshape_favor)
    ------------------------------------------------------

rpguiteras • Oct 25 '24 16:10