Plotman just says ?:? for phases.

Open Emperornero opened this issue 3 years ago • 28 comments

Heyo.

I've got an issue here where plotman can't detect phases, and it runs nonstop without staggering or respecting the max job limits on temp directories.

Willing to provide any logs needed....

image

Emperornero avatar May 12 '21 15:05 Emperornero

I've seen this after restarting plotman having changed the location of the log files. I think the running plotters are picked up but plotman can't locate their logs.

aseyhowell avatar May 12 '21 16:05 aseyhowell

I am seeing an error regarding log file locations, but the logs have not been moved from their original generated location.

Perhaps a permissions issue?

image

Emperornero avatar May 12 '21 18:05 Emperornero

I'd guess that chia has access to the log files but plotman doesn't. If I was you my first action would be to look at the permissions for the log directory.

aseyhowell avatar May 12 '21 21:05 aseyhowell

I'm seeing this same issue as @Emperornero (thanks for opening). I'm on macOS 11.3.1, and today's my first day using plotman. The problem persists on both Intel and M1 Macs. Clean installing both chia and plotman doesn't seem to make a difference.

  • The python user in activity monitor is the same as the owner of the log directory
  • The log directory path contains no spaces, which has seemed to cause permissions issues in the past
  • The logs directory is a sibling of chia-blockchain

As @Emperornero mentioned, plotman doesn't seem to see the logfile, even though it's in the list of the process' open files:

...sleeping 20 s: (False, 'stagger (80s/3600s)')
Found plotting process PID 25995, but could not find logfile in its open files:
/Users/mac/chia/logs/2021-05-12T17_32_09.116413-07_00.log
/Users/mac/chia/logs/2021-05-12T17_32_09.116413-07_00.log
/Volumes/temp/plot-k32-2021-05-12-17-32-611588ce9b7004f185b644a38a1a38e77576a42e8cdfb9ac6968811570c1c60c.plot.sort.tmp
/Volumes/temp/plot-k32-2021-05-12-17-32-611588ce9b7004f185b644a38a1a38e77576a42e8cdfb9ac6968811570c1c60c.plot.table1.tmp

...

/Volumes/temp/plot-k32-2021-05-12-17-32-611588ce9b7004f185b644a38a1a38e77576a42e8cdfb9ac6968811570c1c60c.plot.table7.tmp
/Volumes/temp/plot-k32-2021-05-12-17-32-611588ce9b7004f185b644a38a1a38e77576a42e8cdfb9ac6968811570c1c60c.plot.2.tmp
/Volumes/temp/plot-k32-2021-05-12-17-32-611588ce9b7004f185b644a38a1a38e77576a42e8cdfb9ac6968811570c1c60c.plot.p1.t1.sort_bucket_007.tmp

...

/Volumes/temp/plot-k32-2021-05-12-17-32-611588ce9b7004f185b644a38a1a38e77576a42e8cdfb9ac6968811570c1c60c.plot.p1.t2.sort_bucket_127.tmp

Not sure if it's strange for the process to hold two references to the same log entry.

plotman status produces the now-familiar ?:? output:

plot id     k   tmp             dst             wall   phase   tmp     pid   stat      mem   user   sys   io
--------   32   /Volumes/temp   /Volumes/dest    12s     ?:?     0   26020    RUN   418.2G     0s    0s    -

ls -l in my /Users/mac/chia directory gives:

total 0
drwxr-xr-x  31 mac  staff  992 May 12 17:30 chia-blockchain
drwxr-xr-x   3 mac  staff   96 May 12 17:49 logs

tyronep avatar May 13 '21 00:05 tyronep

@tyronep, thanks for all the details. Any chance you could add your plotman.yaml as well for comparison?

altendky avatar May 13 '21 00:05 altendky

@altendky, absolutely:

# Default/example plotman.yaml configuration file

# Options for display and rendering
user_interface:
        # Call out to the `stty` program to determine terminal size, instead of
        # relying on what is reported by the curses library.   In some cases,
        # the curses library fails to update on SIGWINCH signals.  If the
        # `plotman interactive` curses interface does not properly adjust when
        # you resize the terminal window, you can try setting this to True. 
        use_stty_size: True

# Where to plot and log.
directories:
        # One directory in which to store all plot job logs (the STDOUT/
        # STDERR of all plot jobs).  In order to monitor progress, plotman
        # reads these logs on a regular basis, so using a fast drive is
        # recommended.
        log: /users/mac/chia/logs

        # One or more directories to use as tmp dirs for plotting.  The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp:
                - /Volumes/temp

        # Optional: Allows overriding some characteristics of certain tmp
        # directories. This contains a map of tmp directory names to
        # attributes. If a tmp directory and attribute is not listed here,
        # it uses the default attribute setting from the main configuration.
        #
        # Currently support override parameters:
        #     - tmpdir_max_jobs
        # tmp_overrides:
        #         # In this example, /mnt/tmp/00 is larger than the other tmp
        #         # dirs and it can hold more plots than the default.
        #         "/mnt/tmp/00":
        #                 tmpdir_max_jobs: 5

        # Optional: tmp2 directory.  If specified, will be passed to
        # chia plots create as -2.  Only one tmp2 directory is supported.
        # tmp2: /mnt/tmp/a

        # One or more directories; the scheduler will use all of them.
        # These again are presumed to be on independent physical devices,
        # so writes (plot jobs) and reads (archivals) can be scheduled
        # to minimize IO contention.
        dst:
                - /Volumes/dest

        # Archival configuration.  Optional; if you do not wish to run the
        # archiving operation, comment this section out.
        #
        # Currently archival depends on an rsync daemon running on the remote
        # host.
        # The archival also uses ssh to connect to the remote host and check
        # for available directories. Set up ssh keys on the remote host to
        # allow public key login from rsyncd_user.
        # Complete example: https://github.com/ericaltendorf/plotman/wiki/Archiving
        # archive:
        #         rsyncd_module: plots # Define this in remote rsyncd.conf.
        #         rsyncd_path: /plots # This is used via ssh. Should match path
        #                             # defined in the module referenced above.
        #         rsyncd_bwlimit: 102400  # Bandwidth limit in KB/s
        #         rsyncd_host: <snip>
        #         rsyncd_user: <snip>
                # Optional index.  If omitted or set to 0, plotman will archive
                # to the first archive dir with free space.  If specified,
                # plotman will skip forward up to 'index' drives (if they exist).
                # This can be useful to reduce io contention on a drive on the
                # archive host if you have multiple plotters (simultaneous io
                # can still happen at the time a drive fills up.)  E.g., if you
                # have four plotters, you could set this to 0, 1, 2, and 3, on
                # the 4 machines, or 0, 1, 0, 1.
                #   index: 0


# Plotting scheduling parameters
scheduling:
        # Run a job on a particular temp dir only if the number of existing jobs
        # before [tmpdir_stagger_phase_major : tmpdir_stagger_phase_minor]
        # is less than tmpdir_stagger_phase_limit.
        # Phase major corresponds to the plot phase, phase minor corresponds to
        # the table or table pair in sequence, phase limit corresponds to
        # the number of plots allowed before [phase major : phase minor].
        # e.g., with default settings, a new plot will start only when your plot
        # reaches phase [2 : 1] on your temp drive. This setting takes precedence
        # over global_stagger_m
        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 1

        # Don't run more than this many jobs at a time on a single temp dir.
        tmpdir_max_jobs: 3

        # Don't run more than this many jobs at a time in total.
        global_max_jobs: 3

        # Don't run any jobs (across all temp dirs) more often than this, in minutes.
        global_stagger_m: 60

        # How often the daemon wakes to consider starting a new plot job, in seconds.
        polling_time_s: 20


# Plotting parameters.  These are pass-through parameters to chia plots create.
# See documentation at
# https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference#create
plotting:
        k: 32
        e: False             # Use -e plotting option
        n_threads: 4         # Threads per job
        n_buckets: 128       # Number of buckets to split data into
        job_buffer: 2304     # Per job memory
        # If specified, pass through to the -f and -p options.  See CLI reference.
        farmer_pk: <snip>
        pool_pk: <snip>

Other details:

  • The log appears in the log directory, and can be tailed to follow job progress
  • Temp files appear in the temp volume
  • Install: git clone https://github.com/Chia-Network/chia-blockchain.git -b latest (-> install.sh -> source activate -> init, etc.)
  • Version: chia version = 1.1.5 (commit 0b91005c6b448ab6b5aaf1125bdfc5e38fd8f511)
  • Install: pip install --force-reinstall git+https://github.com/ericaltendorf/plotman@development (fix for click dep, but same issue on main earlier today)
  • Version: plotman version = plotman 0.3+dev

tyronep avatar May 13 '21 01:05 tyronep

@tyronep, could you try @release/v0.3.1 instead of @development while I think about this?

altendky avatar May 13 '21 01:05 altendky

Err, when in the world did my chrome search become... case insensitive? Try a capital U for /Users.

altendky avatar May 13 '21 01:05 altendky
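
For anyone landing here later, a minimal sketch of why the lowercase path fails. It assumes the lookup is a plain case-sensitive substring test, like the `logroot in f.path` check quoted further down this thread. The function name and the shortened temp-file name are made up for illustration.

```python
def find_logfile(open_paths, logroot):
    """Return the first open path that contains logroot, or None.

    The `in` test on Python strings is case-sensitive.
    """
    for path in open_paths:
        if logroot in path:
            return path
    return None


# Paths as the OS reports them for the plotting process.
open_paths = [
    "/Users/mac/chia/logs/2021-05-12T17_32_09.116413-07_00.log",
    "/Volumes/temp/plot-k32-example.plot.sort.tmp",  # hypothetical, shortened
]

print(find_logfile(open_paths, "/users/mac/chia/logs"))  # None: lowercase "users" never matches
print(find_logfile(open_paths, "/Users/mac/chia/logs"))  # the .log path
```

The OS reports the path with its on-disk spelling, so the config value has to match that spelling exactly even though the filesystem itself would accept either.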

Thanks for the suggestions, @altendky. Capital U /Users is working on both @development and @release/v0.3.1!

Very strange, as I was originally using /Users/mac/chia/logs, and switched it out of desperation. Maybe I made another change at some point, or hallucinated the whole thing.

Either way, thank you very much for the help!

tyronep avatar May 13 '21 01:05 tyronep

pwd on both Intel and ARM indeed returns /Users.... Not sure what possessed me to make this change. Maybe this is a fix for you too, @Emperornero?

Edit: just checked another machine, and found the Chia directory was capitalized there. Matched this to the config, and it's also running fine. Case sensitivity! Who knew?

tyronep avatar May 13 '21 01:05 tyronep
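
Because the default macOS filesystem is case-insensitive, a path spelled with the wrong case still opens fine, so a plain existence check won't catch this. A rough sketch of a pre-flight check that compares each path component against the actual directory listing. `exact_case` is a hypothetical helper, not something plotman provides.

```python
import os


def exact_case(base, relative):
    """Return True if every component of `relative` exists under `base`
    with exactly this spelling (upper/lower case included)."""
    current = base
    for name in relative.strip("/").split("/"):
        try:
            # os.listdir returns names as stored on disk, so a component
            # typed with the wrong case is missing from the listing even
            # on a case-insensitive filesystem.
            if name not in os.listdir(current):
                return False
        except OSError:
            return False  # missing or unreadable directory
        current = os.path.join(current, name)
    return True
```

On the machine above, `exact_case("/", "users/mac/chia/logs")` would be expected to return False, since the listing of `/` contains `Users`, not `users`.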

Good deal, easy fix. :] We'll see for the OP.

altendky avatar May 13 '21 01:05 altendky

My U was big but oh my god I had an UPPER CASE J. All fixed here too! Thanks!

jameswood avatar May 13 '21 08:05 jameswood

EDIT:

Plotman's issue was that it is unable to do SYSCALLS from WSL 1. Updating to WSL 2 fixes this issue, but it seems an issue with WSL 2 impacts performance greatly.

Not an issue with plotman.

Glad we killed 2 birds with one stone, haha!

Emperornero avatar May 13 '21 09:05 Emperornero
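
Since the behaviour differs between WSL 1 and WSL 2, it can help to confirm which one a distro is running under. A heuristic sketch, assuming the usual kernel release naming (WSL 1 releases typically end in "-Microsoft", WSL 2 releases typically contain "microsoft-standard"). The function name and the sample strings are illustrative, and the naming could change between Windows builds.

```python
import platform


def wsl_version(kernel_release):
    """Guess the WSL version from a kernel release string.

    Returns 1 or 2 for WSL, 0 when the string does not look like WSL.
    """
    lowered = kernel_release.lower()
    if "microsoft" not in lowered:
        return 0
    # WSL 2 runs a real Linux kernel built by Microsoft.
    if "microsoft-standard" in lowered or "wsl2" in lowered:
        return 2
    return 1


print(wsl_version(platform.release()))
```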

So after further investigation I think this may help solve WSL issues for plotman.

image

ls -l on the open files of the PID whose log plotman can't find shows that the log file is open by the process, but for whatever reason plotman is not getting it from the PID's open files list.

Perhaps an option could be added to read logs from the log directory instead of from the PID?

Emperornero avatar May 13 '21 10:05 Emperornero

We need to associate the log with the process. For WSL setups consider https://github.com/ericaltendorf/plotman/issues/108#issuecomment-822151350. Maybe you could add a WSL entry to the wiki? Also, we are working towards native Windows support.

altendky avatar May 13 '21 13:05 altendky

Also, what about that says it is open by the process? I see that it is a symlink with goofy write/execute permissions which would sensibly mark it as not readable.

altendky avatar May 13 '21 13:05 altendky

Definitely not a symlink, just running on WSL 1. Migrating the distro to WSL 2 fixes the issue, no other changes required. However, a new problem arises.

WSL 2 is almost NINE times slower at I/O operations than WSL 1, an issue that Microsoft is supposedly working on. Microsoft actually recommends WSL 1 for files that are accessed across OSes. Since you will be writing to mounted drives in WSL, that makes WSL 1 far superior to WSL 2.

https://github.com/microsoft/WSL/issues/4197 https://docs.microsoft.com/en-us/windows/wsl/compare-versions

WSL 1 is 25% faster than native Windows plotting too (facepalm). WSL 2 is 50% slower than native Windows, and 100% slower than WSL 1.

I would actually recommend people plot on WSL 1 over Windows. It's that much faster. Plots that were taking 7 hours to parallel plot are done in 4 hours on WSL 1 on my 3970x, and I'm doing 24 parallel plots. Hitting almost 120 plots per day.

What a mess.

Emperornero avatar May 13 '21 13:05 Emperornero
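
A quick sanity check on those numbers, as a sketch. It assumes every parallel slot is always busy, which staggering prevents, so it gives a ceiling rather than a prediction. The function name is made up for illustration.

```python
def plots_per_day(parallel_jobs, hours_per_plot):
    """Upper bound on daily output if all slots run back to back."""
    return parallel_jobs * 24 / hours_per_plot


print(plots_per_day(24, 7))  # roughly 82 at 7 hours per plot
print(plots_per_day(24, 4))  # 144.0 at 4 hours per plot
print(7 / 4)                 # 1.75, the speed-up in per-plot wall time
```

The reported "almost 120 plots per day" sits below the 144 ceiling for 4-hour plots, which is what you would expect once start-up stagger is taken into account.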

What does the l as the first entry in the permissions list mean if not symlink?

Did you look at the link I provided re: wsl?

altendky avatar May 13 '21 13:05 altendky

Your provided link is referencing WSL 2.

The l is just generated when running 'ls -l /proc/39/fd'

Here's a picture of the output. As you can see, it does see the log as an open file; plotman is just unable to read it for some reason.

EDIT: I think I see what you're saying. I'm not proficient in Linux, so is the RWX Read/Write/Modify? Read is missing from the permissions for the log?

image

Emperornero avatar May 13 '21 13:05 Emperornero

Yes, I referenced WSL2 and a thing to deal with performance issues in it. The l means symlink just as the -> indicates that as well. But now that you include the actual command I see that you are not listing the log directory.

Try WSL2 as suggested and let's go from there. Please share the full evidence if we are going to diagnose this further: plotman interactive output, ps aux | grep 'plots create', ls -l of the actual logs directory, ls -l from /proc as you did, and the full configuration. All as text, please.

altendky avatar May 13 '21 14:05 altendky
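
To make the permission string concrete, here is a small sketch that decodes mode bits the way ls -l prints them. On Linux, the entries under /proc/<pid>/fd are symlinks whose owner bits mirror how the descriptor was opened, so a log opened for writing only shows up without the r bit. The mode values below are constructed by hand for illustration, not read from a real process.

```python
import stat

# Hypothetical modes for two /proc/<pid>/fd entries.
write_only = stat.S_IFLNK | 0o300  # descriptor opened for writing
read_write = stat.S_IFLNK | 0o700  # descriptor opened for reading and writing

print(stat.filemode(write_only))  # l-wx------
print(stat.filemode(read_write))  # lrwx------
```

For a live process, `os.lstat("/proc/<pid>/fd/<n>").st_mode` gives the value to pass to `stat.filemode`. The missing r here describes the open descriptor, not the permissions of the log file itself, which ls -l on the logs directory shows separately.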

So accept the performance hit just to make plotman work?

Emperornero avatar May 13 '21 14:05 Emperornero

I'll have to try the WSL thing later after these plots are done. I don't see it making a difference, as my drives were already mounted, but I'll try it if there's something different. I'd rather plotman not work than have to wait 2 times as long for plot creation.

EDIT: The WSL 2 fix requires running on Dev channel Windows. I don't see myself doing that. Unfortunately looks like I'll have to struggle as is.

ps aux | grep 'plots create'

~/chia-blockchain$ ps aux | grep 'plots create'
rainmak+    39  140  2.9 5333012 1986976 ?     RNs  03:22 348:45 /home/rainmaker/chia-blockchain/venv/bin/python /home/rainmaker/chia-blockchain/venv/bin/chia plots create -k 32 -r 8 -u 128 -b 4600 -t /mnt/j/ -d /mnt/g/plottemp
rainmak+   372  146  2.9 5331988 1982724 ?     RNs  03:47 329:06 /home/rainmaker/chia-blockchain/venv/bin/python /home/rainmaker/chia-blockchain/venv/bin/chia plots create -k 32 -r 8 -u 128 -b 4600 -t /mnt/j/ -d /mnt/g/plottemp
rainmak+   828  149  2.5 5352724 1740844 ?     RNs  03:52 328:05 /home/rainmaker/chia-blockchain/venv/bin/python /home/rainmaker/chia-blockchain/venv/bin/chia plots create -k 32 -r 8 -u 128 -b 4600 -t /mnt/j/ -d /mnt/g/plottemp
rainmak+  1308  151  2.6 5352720 1744568 ?     RNs  03:57 323:32 /home/rainmaker/chia-blockchain/venv/bin/python /home/rainmaker/chia-blockchain/venv/bin/chia plots create -k 32 -r 8 -u 128 -b 4600 -t /mnt/j/ -d /mnt/g/plottemp
rainmak+  1781  152  2.9 5330956 1979256 ?     RNs  04:02 319:30 /home/rainmaker/chia-blockchain/venv/bin/python /home/rainmaker/chia-blockchain/venv/bin/chia plots create -k 32 -r 8 -u 128 -b 4600 -t /mnt/j/ -d /mnt/g/plottemp
rainmak+  2242  153  2.9 5330956 1985196 ?     RNs  04:07 313:37 /home/rainmaker/chia-blockchain/venv/bin/python /home/rainmaker/chia-blockchain/venv/bin/chia plots create -k 32 -r 8 -u 128 -b 4600 -t /mnt/j/ -d /mnt/g/plottemp
rainmak+  3334  0.0  0.0  16212  1272 tty1     S    07:31   0:00 grep --color=auto plots create
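
The command lines above line up with the plotting section of the config: -r is n_threads, -u is n_buckets and -b is job_buffer. A sketch of that mapping, with a hypothetical function name. How plotman actually assembles the command is not shown in this thread, and the key options (-f, -p) are left out here.

```python
def plots_create_argv(plotting, tmp, dst):
    """Build a `chia plots create` argument list from config-style values."""
    argv = [
        "chia", "plots", "create",
        "-k", str(plotting["k"]),
        "-r", str(plotting["n_threads"]),
        "-u", str(plotting["n_buckets"]),
        "-b", str(plotting["job_buffer"]),
        "-t", tmp,
        "-d", dst,
    ]
    if plotting.get("e"):
        argv.append("-e")  # only passed when e is True in the config
    return argv


plotting = {"k": 32, "e": False, "n_threads": 8, "n_buckets": 128, "job_buffer": 4600}
print(" ".join(plots_create_argv(plotting, "/mnt/j/", "/mnt/g/plottemp")))
```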

plotman interactive

Plotman 07:37:52 (refresh 10s/20)  |  <P>lotting: (active) stagger (7s/300s) <A>rchival: (not configured)
Jobs (7): [1        2        3       4 ]
Prefixes:  tmp=/mnt  dst=/mnt/g/plottemp (remote)
  #    plot id    k   tmp   dst   wall   phase   tmp    pid   stat      mem   user    sys   io
  0   --------   32     j     .    17s     ?:?     0   3337    SLP   802.2M    11s     7s   0s
  1   --------   32     j     .   3:30     ?:?     0   2242    RUN     5.5G   5:03   0:15   0s
  2   --------   32     j     .   3:35     ?:?     0   1781    RUN     5.5G   5:08   0:16   0s
  3   --------   32     j     .   3:40     ?:?     0   1308    RUN     5.5G   5:11   0:17   0s
  4   --------   32     j     .   3:45     ?:?     0    828    RUN     5.5G   5:16   0:17   0s
  5   --------   32     j     .   3:50     ?:?     0    372    RUN     5.5G   5:17   0:17   0s
  6   --------   32     j     .   4:15     ?:?     0     39    RUN     5.5G   5:34   0:19   0s

Total jobs: 7
Jobs in j: 7


tmp   ready   phases
  h      OK
  i      OK
  j      OK   ?:? ?:? [+2] ?:? ?:? ?:?

dst   plots   GBfree   inbnd phases               pri
  .       0      465   ?:? ?:? [+2] ?:? ?:? ?:?    57
Archive dirs free space
<archiving not configured>
Log: 0 (<up>/<down>/<end> to scroll)

ls -l /home/rainmaker/plotmanlogs

~/chia-blockchain$ ls -l /home/rainmaker/plotmanlogs
total 964
-rw-r--r-- 1 rainmaker rainmaker 151670 May 13 07:40 2021-05-13T03_22_33.048764-07_00.log
-rw-r--r-- 1 rainmaker rainmaker 124419 May 13 07:40 2021-05-13T03_47_28.075343-07_00.log
-rw-r--r-- 1 rainmaker rainmaker 114652 May 13 07:40 2021-05-13T03_52_31.336646-07_00.log
-rw-r--r-- 1 rainmaker rainmaker 111292 May 13 07:40 2021-05-13T03_57_35.279322-07_00.log
-rw-r--r-- 1 rainmaker rainmaker 105325 May 13 07:40 2021-05-13T04_02_40.035597-07_00.log
-rw-r--r-- 1 rainmaker rainmaker 102125 May 13 07:40 2021-05-13T04_07_45.836950-07_00.log
-rw-r--r-- 1 rainmaker rainmaker   1121 May 13 07:37 2021-05-13T07_37_35.876434-07_00.log

Full config

# Default/example plotman.yaml configuration file

# Options for display and rendering
user_interface:
        # Call out to the `stty` program to determine terminal size, instead of
        # relying on what is reported by the curses library.   In some cases,
        # the curses library fails to update on SIGWINCH signals.  If the
        # `plotman interactive` curses interface does not properly adjust when
        # you resize the terminal window, you can try setting this to True. 
        use_stty_size: True

# Where to plot and log.
directories:
        # One directory in which to store all plot job logs (the STDOUT/
        # STDERR of all plot jobs).  In order to monitor progress, plotman
        # reads these logs on a regular basis, so using a fast drive is
        # recommended.
        log: /home/rainmaker/plotmanlogs/

        # One or more directories to use as tmp dirs for plotting.  The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp:
                - /mnt/j/
                - /mnt/h/
                - /mnt/i/
                - /mnt/k/

        # Optional: Allows overriding some characteristics of certain tmp
        # directories. This contains a map of tmp directory names to
        # attributes. If a tmp directory and attribute is not listed here,
        # it uses the default attribute setting from the main configuration.
        #
        # Currently support override parameters:
        #     - tmpdir_max_jobs
        #tmp_overrides:
                # In this example, /mnt/tmp/00 is larger than the other tmp
                # dirs and it can hold more plots than the default.
        #       "/mnt/tmp/00":
        #              tmpdir_max_jobs: 5

        # Optional: tmp2 directory.  If specified, will be passed to
        # chia plots create as -2.  Only one tmp2 directory is supported.
        # tmp2: /mnt/tmp/a

        # One or more directories; the scheduler will use all of them.
        # These again are presumed to be on independent physical devices,
        # so writes (plot jobs) and reads (archivals) can be scheduled
        # to minimize IO contention.
        dst:
                - /mnt/g/plottemp

        # Archival configuration.  Optional; if you do not wish to run the
        # archiving operation, comment this section out.
        #
        # Currently archival depends on an rsync daemon running on the remote
        # host.
        # The archival also uses ssh to connect to the remote host and check
        # for available directories. Set up ssh keys on the remote host to
        # allow public key login from rsyncd_user.
        # Complete example: https://github.com/ericaltendorf/plotman/wiki/Archiving
        #archive:
                # rsyncd_module: plots # Define this in remote rsyncd.conf.
                # rsyncd_path: /plots # This is used via ssh. Should match path
                # defined in the module referenced above.
                # rsyncd_bwlimit: 80000  # Bandwidth limit in KB/s
                # rsyncd_host: myfarmer
                # rsyncd_user: chia
                # Optional index.  If omitted or set to 0, plotman will archive
                # to the first archive dir with free space.  If specified,
                # plotman will skip forward up to 'index' drives (if they exist).
                # This can be useful to reduce io contention on a drive on the
                # archive host if you have multiple plotters (simultaneous io
                # can still happen at the time a drive fills up.)  E.g., if you
                # have four plotters, you could set this to 0, 1, 2, and 3, on
                # the 4 machines, or 0, 1, 0, 1.
                #   index: 0


# Plotting scheduling parameters
scheduling:
        # Run a job on a particular temp dir only if the number of existing jobs
        # before [tmpdir_stagger_phase_major : tmpdir_stagger_phase_minor]
        # is less than tmpdir_stagger_phase_limit.
        # Phase major corresponds to the plot phase, phase minor corresponds to
        # the table or table pair in sequence, phase limit corresponds to
        # the number of plots allowed before [phase major : phase minor].
        # e.g., with default settings, a new plot will start only when your plot
        # reaches phase [2 : 1] on your temp drive. This setting takes precedence
        # over global_stagger_m
        tmpdir_stagger_phase_major: 1
        tmpdir_stagger_phase_minor: 2
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 1

        # Don't run more than this many jobs at a time on a single temp dir.
        tmpdir_max_jobs: 6

        # Don't run more than this many jobs at a time in total.
        global_max_jobs: 24

        # Don't run any jobs (across all temp dirs) more often than this, in minutes.
        global_stagger_m: 5

        # How often the daemon wakes to consider starting a new plot job, in seconds.
        polling_time_s: 20


# Plotting parameters.  These are pass-through parameters to chia plots create.
# See documentation at
# https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference#create
plotting:
        k: 32
        e: False             # Use -e plotting option
        n_threads: 8         # Threads per job
        n_buckets: 128       # Number of buckets to split data into
        job_buffer: 4600     # Per job memory
        # If specified, pass through to the -f and -p options.  See CLI reference.
        #   farmer_pk: ...
        #   pool_pk: ...

Emperornero avatar May 13 '21 14:05 Emperornero
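
The scheduling comments in the config describe a per-tmp-dir gate. A simplified sketch of that rule follows, to show why unknown phases matter. It assumes jobs with an unknown phase are left out of both counts, which would be consistent with the runaway job starts and the 7 jobs on a tmp dir capped at 6 reported here. This is not plotman's code, and the function and parameter names are illustrative.

```python
from typing import List, Optional, Tuple

Phase = Optional[Tuple[int, int]]  # None models a job displayed as "?:?"


def permits_new_job(phases: List[Phase], major: int, minor: int,
                    limit: int = 1, max_jobs: int = 3) -> bool:
    """Return True if another job may start on this tmp dir."""
    known = [p for p in phases if p is not None]
    # Jobs that have not yet reached the [major : minor] milestone.
    early = [p for p in known if p < (major, minor)]
    if len(early) >= limit:
        return False
    return len(known) < max_jobs


print(permits_new_job([(1, 3)], major=2, minor=1))                 # False: one job is still before 2:1
print(permits_new_job([(2, 1)], major=2, minor=1))                 # True: the job has reached 2:1
print(permits_new_job([None] * 7, major=1, minor=2, max_jobs=6))   # True: unknown phases are not counted
```

Under that assumption only global_stagger_m still holds jobs back, so a job starts every few minutes regardless of what is already running.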

@Emperornero It might be nothing... but in your plotman.yaml file have you tried log: /home/rainmaker/plotmanlogs without the trailing slash?

jameswood avatar May 14 '21 00:05 jameswood

I've tried both structures.

Emperornero avatar May 14 '21 00:05 Emperornero

With the following code changes it works for my wsl1:

Add to job.py:

def list_fds(procId):
    '''List process currently open FDs and their target '''
    '''Source: https://stackoverflow.com/a/24803353 '''
    if not sys.platform.startswith('linux'):
        raise NotImplementedError('Unsupported platform: %s' % sys.platform)

    ret = []
    base = '/proc/' + str(procId) + '/fd'
    for num in os.listdir(base):
        path = None
        try:
            path = os.readlink(os.path.join(base, num))
        except OSError as err:
            # Last FD is always the "listdir" one (which may be closed)
            if err.errno != errno.ENOENT:
                raise
        ret.append(path)

    return ret

job.py __init__

Change

        for f in self.proc.open_files():
            if logroot in f.path:
                if self.logfile:
                    assert self.logfile == f.path
                else:
                    self.logfile = f.path
                break

to

        for f in list_fds(self.proc.pid):
            if logroot in f:
                if self.logfile:
                    assert self.logfile == f
                else:
                    self.logfile = f
                break

aheller693 avatar May 28 '21 20:05 aheller693

Genius.

This works. Now, to get plotman to stop blinking....

Emperornero avatar May 29 '21 00:05 Emperornero

I've noticed this sometimes crashes referencing the line

if err.errno != errno.ENOENT:
NameError: name 'errno' is not defined

Any idea how to fix? It's so close to being perfect....

EDIT: Add import errno to the top of job.py and manager.py to fix.

Emperornero avatar Jun 02 '21 21:06 Emperornero

Now it's crashing when plots complete with an OSError: Errno 9 Bad file descriptor.

I've no clue how to fix. I know it has something to do with the finished job not being closed by the plotman process before it's closed by the chia plotting process.

Emperornero avatar Jun 03 '21 23:06 Emperornero
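
It isn't clear from the thread where the Errno 9 is raised, so the following is only a sketch of one defensive pattern, not a confirmed fix. It assumes the failure happens while scanning the fd directory of a job that has just finished, and treats "no such file", "no such process" and "bad file descriptor" as meaning the job is gone. The names `GONE` and `find_logfile` are made up, and this has not been tested on WSL 1.

```python
import errno
import os

# errno values treated as "the process or descriptor went away" (assumption).
GONE = {errno.ENOENT, errno.ESRCH, errno.EBADF}


def find_logfile(fd_dir, logroot):
    """Return the first link target in fd_dir containing logroot, or None."""
    try:
        names = os.listdir(fd_dir)
    except OSError as err:
        if err.errno in GONE:
            return None  # the job exited before it could be inspected
        raise
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError as err:
            if err.errno in GONE:
                continue  # this descriptor was closed mid-scan
            raise
        if logroot in target:
            return target
    return None
```

A caller would then treat a None result as "no log found for this process" and skip the job for this polling cycle instead of crashing.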