ophyd icon indicating copy to clipboard operation
ophyd copied to clipboard

Add way to escape 'Another set in progress'.

Open danielballan opened this issue 4 years ago • 32 comments

See #757. This is one possible approach.

TO DO

  • [x] Needs broad input about whether this is the right way to address #757
  • [ ] Needs tests
  • [x] As written this leaves a polling thread dangling. To fix that properly, we'd need set_and_wait to accept a poison pill threading.Event or some other means of signaling that it should stop waiting.

danielballan avatar Sep 01 '20 17:09 danielballan

So far, this PR makes a clean way to dump the thread in the same way we do now. We need to terminate those dangling threads. As is, console session does not exit cleanly, requiring one or more ^C to fully exit the process.

prjemian avatar Sep 01 '20 18:09 prjemian

Good point, @prjemian. I took it further in this next commit. I tested against this pathological caproto IOC which takes forever to set:

#!/usr/bin/env python3
from caproto.server import pvproperty, PVGroup, ioc_arg_parser, run


class Stubborn(PVGroup):
    """
    When a PV is written to, write the new value into a file as a string.
    """

    async def my_write(self, instance, value):
        print(f"Ignored request to write {value}")
        return 0

    A = pvproperty(put=my_write, value=0)
    B = pvproperty(put=my_write, value=0)


if __name__ == '__main__':
    ioc_options, run_options = ioc_arg_parser(
        default_prefix='stubborn:',
        desc='Run an IOC with PVs that ignores write requests')
    ioc = Stubborn(**ioc_options)
    run(ioc.pvdb, **run_options)

Before, console does not exit cleanly:

In [1]: from ophyd import EpicsSignal                                                                                                                                                         

In [2]: s = EpicsSignal("stubborn:A", name='s')                                                                                                                                               

In [3]: s.set(1)                                                                                                                                                                              
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', timestamp=1598986036.873889, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)

In [4]: _.done                                                                                                                                                                                
Out[4]: False

In [5]: s.set(2)                                                                                                                                                                              
<snipped>
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [6]: s.clear_set()                                                                                                                                                                         
/home/dallan/Repos/bnl/ophyd/ophyd/signal.py:318: UserWarning: A previous set() operaiton is being ignored. Only this do this when debugging or recovering from a hardware failure.
  "A previous set() operaiton is being ignored. Only this do this "

In [7]: exit                                                                                                                                                                                  
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 269, in set_thread
    rtol=self.rtolerance)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 263, in set_and_wait
    current_value = signal.get()
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1108, in get
    self._read_pv, timeout, connection_timeout, as_string, form)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1066, in _get_with_timeout
    info = pv.get_with_metadata(as_string=as_string, form=form, timeout=timeout)
  File "/home/dallan/.conda/envs/py37/lib/python3.7/site-packages/epics/pv.py", line 46, in wrapped
    raise RuntimeError('Expected CA context is unset')
RuntimeError: Expected CA context is unset

After, clear_set() causes an AbandonedSet exception to be raised in the in the background thread and waits for the thread to join() before returning, so now the console exits cleanly.

In [1]: from ophyd import EpicsSignal                                                                                                                                                         

In [2]: s = EpicsSignal("stubborn:A", name='s')                                                                                                                                               

In [3]: s.set(1)                                                                                                                                                                              
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', value=1, timestamp=1598986283.0511467, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)

In [4]: _.done                                                                                                                                                                                
Out[4]: False

In [5]: s.set(2)                                                                                                                                                                              
<snipped>
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [6]: s.clear_set()                                                                                                                                                                         
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 272, in set_thread
    rtol=self.rtolerance, poison_pill=poison_pill)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet
ophyd.utils.epics_pvs.AbandonedSet
/home/dallan/Repos/bnl/ophyd/ophyd/signal.py:328: UserWarning: A previous set() operaiton is being ignored. Only this do this when debugging or recovering from a hardware failure.
  "A previous set() operaiton is being ignored. Only this do this "

In [7]: exit  # no errors at exit

danielballan avatar Sep 01 '20 18:09 danielballan

I think this kind of event wait poison pill approach is great for killing these background set threads. We've run into this a few times at SLAC, but in our case this indicated a problem with either the IOC or the device class- can't quite remember the details.

Do you think signals should automatically call this at python exit? It might smooth out the shutdown of some of the ipython sessions.

ZLLentz avatar Sep 02 '20 18:09 ZLLentz

We've run into this a few times at SLAC, but in our case this indicated a problem with either the IOC or the device class-

Yes, I think that will be the case in general. My only reservation about adding this is that it might be used routinely, rather than as intended role: allowing you to salvage the ophyd object instance but prompting you to look into the device or IOC issue. But, the current situation, where if you have bad failed set you have to remake your instance or (as is common in practice) restart your whole process, seem worse.

Do you think signals should automatically call this at python exit? It might smooth out the shutdown of some of the ipython sessions.

Great suggestion. Thanks. Will add now....

danielballan avatar Sep 02 '20 19:09 danielballan

Will add now....

... realizing that the current implementation will be changed from the current technique of putting black tape over the blinking red light. (That is, it sets the object reference to None.)

prjemian avatar Sep 02 '20 19:09 prjemian

Before, unclean exit because _set_thread is dangling:

In [1]: from ophyd import EpicsSignal                                                                                                                                                         

In [2]: s = EpicsSignal("stubborn:A", name='s')                                                                                                                                               

In [3]: s.set(1)                                                                                                                                                                              
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', timestamp=1599074325.10357, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)

In [4]: _.done                                                                                                                                                                                
Out[4]: False

In [5]: exit                                                                                                                                                                                  
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 272, in set_thread
    rtol=self.rtolerance, poison_pill=poison_pill)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 274, in set_and_wait
    current_value = signal.get()
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1117, in get
    self._read_pv, timeout, connection_timeout, as_string, form)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1075, in _get_with_timeout
    info = pv.get_with_metadata(as_string=as_string, form=form, timeout=timeout)
  File "/home/dallan/.conda/envs/py37/lib/python3.7/site-packages/epics/pv.py", line 46, in wrapped
    raise RuntimeError('Expected CA context is unset')
RuntimeError: Expected CA context is unset

After, clear_set called at exit by a weakref.finalizer gives a clear error message:

In [5]: exit         
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 274, in set_thread
    rtol=self.rtolerance, poison_pill=poison_pill)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1599075154.647881, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.

danielballan avatar Sep 02 '20 19:09 danielballan

My only reservation about adding this is that it might be used routinely

This is a very fair concern. In the past we've talked a lot about not leaving traps for our users.

I think this PR does a good job of making the failure at least very verbose with the warnings and exceptions raised, hopefully verbose enough that someone investigates the root cause of the issue rather than patching over it indefinitely.

In the context of "make it work for the user experiment" though, I think it's important to provide tools like this, else we end up doing some sort of kluge that makes the experiment work but is otherwise an accrual of technical debt. I'll take a standard bandaid over roll-your-own bandaid any day of the week.

ZLLentz avatar Sep 02 '20 20:09 ZLLentz

Well said.

danielballan avatar Sep 02 '20 23:09 danielballan

@danielballan : Your comment above shows exactly the same sort of error I was seeing recently. In that comment from USAXS, there is an attached text file with the thousands of lines output from the error dump on exiting the console session.

In a loop (with bps.repeat), I ran a plan that included six different bp.scan runs. Four of the scans involve a motor with the .STUP field as an added signal:

class GuardSlitMotor(EpicsMotor):
    status_update = Component(EpicsSignal, ".STUP")

To clear the problems previously reported, I include at the end of a set of scans a call to the yield from bps.abs_set(MOTOR.status_update, 1). The EPICS motor record receives that, does the status update in the motor record, and then sets .STUP back to zero. Ophyd is left waiting on the status_update signal so the status object for the .set() is waiting in a thread. To kill this thread, I call MOTOR.status_update._set_thread = None on each of these (4) motors. After some repetitions of this set of 6 scans, the time between starts of the set becomes noticeably longer as well as start times of each scan become longer, too. Observe in a plot that the longer this problem went, the slower the scan repeated itself.

I'm concerned that the fix of MOTOR.status_update._set_thread = None is not enough. I assume that what I see shown in the plot (cited in the details) as increasingly longer times to repeat the same plan is due to accumulated resource leakage with these abandoned threads.

prjemian avatar Sep 03 '20 02:09 prjemian

@prjemian unless I'm misreading something here, this PR ensures that the set thread will die using a threading.Event to send it a kill signal, either when requested by the user or upon finalization. So I think this fix as implemented will help with console exit error dump quite a bit.

ZLLentz avatar Sep 03 '20 03:09 ZLLentz

Right. In the earliest draft of this PR I just dropped the reference to the thread (“put black tape over the red warning light”) to prompt discussion on the question “Do we even want to expose this API?” But in the current draft I first ensure the thread will stop, as @ZLLentz describes, and only then drop the reference.

danielballan avatar Sep 03 '20 03:09 danielballan

Ah, I see the additional code now. Should I patch locally and give it a try? Seems I have a stress test demonstrated above.

prjemian avatar Sep 03 '20 22:09 prjemian

That would be great!

danielballan avatar Sep 03 '20 22:09 danielballan

Ok, NeXus conferences are done, back in the Bluesky saddle again. Tested with caproto IOC per above using this test code:

from ophyd import EpicsSignal

s = EpicsSignal("stubborn:A", name='s')
st = s.set(1)
st.done
st = s.set(2)
st.done
exit
  1. Verified same result when using ophyd/master branch (1.5.4) that same result was received:
  2. Using danielballan:escape-another-set-in-progress branch, did not get a clean exit:
(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ git branch
* escape-another-set-in-progress
  master
(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ ipython
Python 3.8.2 (default, Mar 26 2020, 15:53:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ophyd import EpicsSignal 
   ...:  
   ...: s = EpicsSignal("stubborn:A", name='s') 
   ...: st = s.set(1) 
   ...: st.done 
   ...: st = s.set(2) 
   ...: st.done 
   ...: exit 
   ...:                                                                                                                                                      
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-548872e6ae6b> in <module>
      4 st = s.set(1)
      5 st.done
----> 6 st = s.set(2)
      7 st.done
      8 exit

~/Documents/projects/Bluesky/ophyd/ophyd/signal.py in set(self, value, timeout, settle_time)
   1706         '''
   1707         if not self._put_complete:
-> 1708             return super().set(value, timeout=timeout, settle_time=settle_time)
   1709 
   1710         # using put completion:

~/Documents/projects/Bluesky/ophyd/ophyd/signal.py in set(self, value, timeout, settle_time)
    305 
    306         if self._set_thread is not None:
--> 307             raise RuntimeError(
    308                 f"Another set() call is still in progress for {self.name}. "
    309                 "If this is due to some transient failure, verify that the "

RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [2]: st.done 
   ...: exit 
   ...:                                                                                                                                                      
Out[2]: <IPython.core.autocall.ExitAutocall at 0x7f2acb4969a0>

In [3]: exit                                                                                                                                                 
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603738132.896743, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ 

FYI

(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ conda list ophyd
# packages in environment at /home/prjemian/Apps/anaconda/envs/bluesky_2020_9:
#
# Name                    Version                   Build  Channel
ophyd                     1.5.3.post4+g14a4fd4           dev_0    <develop>

prjemian avatar Oct 26 '20 18:10 prjemian

But, adding the call to clear_set() after the failed set() allows clean exit:

~/Documents/projects/Bluesky/ophyd/ophyd/signal.py in set(self, value, timeout, settle_time)
    305 
    306         if self._set_thread is not None:
--> 307             raise RuntimeError(
    308                 f"Another set() call is still in progress for {self.name}. "
    309                 "If this is due to some transient failure, verify that the "

RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [2]: s.clear_set()                                                                                                                                        
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603738591.861509, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py:334: UserWarning: A previous set() operation is being ignored. Only this do this when debugging or recovering from a hardware failure.
  warnings.warn(

In [3]: exit                                                                                                                                                 

prjemian avatar Oct 26 '20 18:10 prjemian

So, this code is noisy (can't suppress the comments) and it goes against the advice provided, BUT it continues to operate past the reported problems:

from ophyd import EpicsSignal
from ophyd.utils.epics_pvs import AbandonedSet

s = EpicsSignal("stubborn:A", name='s')

for cycle in range(3):
    st = s.set(1)
    print(f"cycle {cycle}, status: {st.done}")

    s.clear_set()
    print(f"cycle {cycle}, status: {st.done}")

s.clear_set()

exit()

ipython session:

(bluesky_2020_9) prjemian@poof /tmp $ ipython
Python 3.8.2 (default, Mar 26 2020, 15:53:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ophyd import EpicsSignal 
   ...: from ophyd.utils.epics_pvs import AbandonedSet 
   ...:  
   ...: s = EpicsSignal("stubborn:A", name='s') 
   ...:  
   ...: for cycle in range(3): 
   ...:     st = s.set(1) 
   ...:     print(f"cycle {cycle}, status: {st.done}") 
   ...:  
   ...:     s.clear_set() 
   ...:     print(f"cycle {cycle}, status: {st.done}") 
   ...:  
   ...: s.clear_set() 
   ...:  
   ...: exit() 
   ...:                                                                                                                                                      
cycle 0, status: False
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603747329.535178, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py:334: UserWarning: A previous set() operation is being ignored. Only this do this when debugging or recovering from a hardware failure.
  warnings.warn(
cycle 0, status: True
cycle 1, status: False
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603747329.537657, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
cycle 1, status: False
cycle 2, status: False
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603747329.539203, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
cycle 2, status: False
(bluesky_2020_9) prjemian@poof /tmp $ 

prjemian avatar Oct 26 '20 21:10 prjemian

Now for the stress test.

Code

#!/usr/bin/env python3

from ophyd import EpicsSignal
from ophyd.utils.epics_pvs import AbandonedSet
import time

s = EpicsSignal("sky:m1.STUP", name='s', string=True)

for cycle in range(3):
    print(">"*40)
    st = s.set("ON")
    print(f"cycle {cycle}, status: {st.done}")
    time.sleep(0.05)

    s.clear_set()
    print(f"cycle {cycle}, status: {st.done}")
    print("<"*40)

s.clear_set()

exit()

without s.clear_set()

(bluesky_2020_9) prjemian@poof /tmp $ ./test_motor.py 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 0, status: False
cycle 0, status: False
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Traceback (most recent call last):
  File "./test_motor.py", line 11, in <module>
    st = s.set("ON")
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 1708, in set
    return super().set(value, timeout=timeout, settle_time=settle_time)
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 307, in set
    raise RuntimeError(
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603748190.873643, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.

with s.clear_set()

(bluesky_2020_9) prjemian@poof /tmp $ ./test_motor.py 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 0, status: False
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603748220.059571, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.
/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py:334: UserWarning: A previous set() operation is being ignored. Only this do this when debugging or recovering from a hardware failure.
  warnings.warn(
cycle 0, status: False
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 1, status: False
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='BUSY', timestamp=1603748220.081061, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.
cycle 1, status: False
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 2, status: False
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 16:37:00.132979121
..................................................................
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='BUSY', timestamp=1603748220.081061, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.
cycle 2, status: True
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Conclusion

The stress test passes with this addition. It is definitely noisy (lots of unpreventable console output). But the primary problem is solved.

prjemian avatar Oct 26 '20 21:10 prjemian

Calling set() with an explicit timeout is less noisy:

In [30]: st = s.set("ON", timeout=0.1)                                                                                                                       

set_and_wait(value=ON, timeout=0.1, atol=None, rtol=None)
In [31]: st = s.set("ON", timeout=0.1)                                                                                                                       

set_and_wait(value=ON, timeout=0.1, atol=None, rtol=None)
In [32]: st = s.set("ON", timeout=0.1)                                                                                                                       

set_and_wait(value=ON, timeout=0.1, atol=None, rtol=None)

In the why didn't I think of that sooner category, isn't setting a timeout parameter the right way? Can the timeout=0.05 kwarg be used with bps.mv?

def plan9(signal):
    for _itr in range(3):
        try:
            yield from bps.mv(signal, "ON", timeout=0.01)
        except FailedStatus:
            pass

Yup, it works (if the exception is trapped):

In [13]: RE(plan9(s1))                                                                                                                                       
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 17:22:34.108184434
..................................................................
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
Out[13]: ()

prjemian avatar Oct 26 '20 22:10 prjemian

The result can be quieter by suppressing the warning-level messages from the signal:

In [24]: s1.log.setLevel("ERROR")                                                                                                                            

In [25]: RE(plan9(s1))                                                                                                                                       
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 17:28:45.578939646
..................................................................
Out[25]: ()

In [26]: s1.log.setLevel("WARNING")                                                                                                                          

In [27]: RE(plan9(s1))                                                                                                                                       
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 17:28:56.273481707
..................................................................
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
Out[27]: ()

In [28]:                                                                                                                                                     

prjemian avatar Oct 26 '20 22:10 prjemian

@danielballan @tacaswell I'm sensing victory here by adding a timeout kwarg to the bps.mv() calls for metadata. We can suppress warnings (via signal.log.setLevel()) if they become a nuisance.

prjemian avatar Oct 26 '20 22:10 prjemian

But now I ask the question why the timeout kwarg in EpicsSignal .set() method does not default to the .timeout value? Looking here: https://github.com/bluesky/ophyd/blob/3dddf111946c1eb7200361d2158445db2acfb76d/ophyd/signal.py#L1719

This .set() will default the timeout kwarg to None. Since I've set the default timeout for the signal to be 10 s, I was expecting s.timeout to be 10 (and it is).

(bluesky_2020_9) prjemian@poof /tmp $ ipython
Python 3.8.2 (default, Mar 26 2020, 15:53:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ophyd.signal import EpicsSignalBase                                                                                  

In [2]: EpicsSignalBase.set_default_timeout(timeout=10, connection_timeout=5)                                                     

In [3]: from ophyd import EpicsSignal                                                                                             

In [4]: s = EpicsSignal("sky:m1.STUP", name='s', string=True)                                                                     

In [5]: s.timeout                                                                                                                 
Out[5]: 10

Now, try st = s.set("ON") and wait 15 seconds (5 seconds beyond timeout) before printing the status. Expecting it be done.

import time
st = s.set("ON")
time.sleep(15)
print(st)

Expecting done, but no.

In [6]: import time 
   ...: st = s.set("ON") 
   ...: time.sleep(15) 
   ...: print(st) 
   ...:                                                                                                                           
Status(obj=EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603771932.302131, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False), done=False, success=False)

Repeat but use a timeout=10 kwarg with .set() in a new ipython session:

In [1]: from ophyd import EpicsSignal 
   ...: import time 
   ...: s = EpicsSignal("sky:m1.STUP", name='s', string=True) 
   ...: st = s.set("ON", timeout=10) 
   ...: time.sleep(15) 
   ...: print(st) 
   ...:                                                                                                                           
set_and_wait(value=ON, timeout=10, atol=None, rtol=None)
Status(obj=EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603772109.341146, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False), done=True, success=False)

prjemian avatar Oct 27 '20 04:10 prjemian

This seems :+1: to me. It adds a bunch of complexity, but I think that this is the minimum complexity we can have if we are going to have background status threads like this.

tacaswell avatar Oct 27 '20 16:10 tacaswell

This needs a rebase, and tests. (Note to self.) Otherwise it seems there is consensus for merging.

danielballan avatar Jan 12 '21 21:01 danielballan

"Another set() call is still in progress problem" in EpicsSignal.set function seems to occur especially when it is a transition type signal.

If 1 is set to xxx:m1.STUP, it changes in the following order: 0(OFF) -> 1 (On) -> 2 (BUSY) -> 0 (OFF).

xxx:m1.STUP                    2021-01-14 17:10:52.570557 OFF
xxx:m1.STUP                    2021-01-14 17:10:52.570557 ON
xxx:m1.STUP                    2021-01-14 17:11:01.923591 BUSY
xxx:m1.STUP                    2021-01-14 17:11:01.923837 OFF

Another transition type signal may change in the order of 0 -> 1 -> 0.

test:signal                    2021-01-14 16:43:21.773892 0
test:signal                    2021-01-14 16:43:23.363370 1
test:signal                    2021-01-14 16:43:23.364933 0

In this case, the transition time from 1 to 0 is only 1.56 ms. The while loop in the ophyd.utils.epics_pvs.set_and_wait function seems difficult to detect this change.

On camonitor, the set was successful, but the EpicsSignal.set function only sees 0 values after the transition ends. Therefore, I think this type of EpicsSignal should be handled specially.

Could you please review how this solution is?

https://github.com/physwkim/ophyd/commit/2a3b1c09c62ebf9fa5eb4807ec9e32221276edf6

physwkim avatar Jan 14 '21 09:01 physwkim

It would be better to use the put_complete option for the transition signal. signal = EpicsSignal("test:signal", put_complete=True)

physwkim avatar Jan 14 '21 14:01 physwkim

Thanks for pushing on this @prjemian .

danielballan avatar Jun 10 '21 17:06 danielballan

I'm stuck trying to trigger AbandonedSet raised in ophyd.utils.epics_pvs.set_and_wait

Ideas?

On Thu, Jun 10, 2021, 12:12 PM Dan Allan @.***> wrote:

Thanks for pushing on this @prjemian https://github.com/prjemian .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bluesky/ophyd/pull/901#issuecomment-858801584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARMUMHKNIOYPOTWTP37XEDTSDW6DANCNFSM4QR7ZBJQ .

prjemian avatar Jun 10 '21 17:06 prjemian

I'm confused, .set() defines the poison_pill while .put() does not. Will there be troubles with .put()?

prjemian avatar Jun 10 '21 22:06 prjemian

Why is the CI taking 6 hours to run?

prjemian avatar Jun 11 '21 14:06 prjemian

Will there be troubles with .put()?

No, this is semi-internal. It operates one layer below Status objects so it should not have anything to do with "poison pill".

danielballan avatar Jun 18 '21 13:06 danielballan