mdsplus icon indicating copy to clipboard operation
mdsplus copied to clipboard

TreeGetRecord() returns a misleading error code

Open mwinkel-dev opened this issue 1 year ago • 4 comments

Affiliation MIT PSFC

Version(s) Affected Found in recent "alpha", but surely exists in all other versions too

Platform Found on Ubuntu 20, but is undoubtedly on all platforms

Describe the bug The TreeGetRecord() function in TreeGetRecord.c of the treeshr library returns a zero error code in some instances. When the zero error code is displayed as a text message it appears as %SS-W-SUCCESS, Success. Note that the "W" denotes a "Warning", which contradicts the status of "Success". And because the severity level is not set to success (i.e., low order bit is not set), the calling code will instead treat this as a failure. (The STATUS_OK macro checks the low-order bit.)

The questionable construct is as follows:

status = TreeCallHook( <args> )
if (status && STATUS_NOT_OK)
  return 0;

And it appears at these locations: line 100 line 114

To Reproduce Create a bug.py file with the following Python code.

import MDSplus as mds
import concurrent.futures as cf

def get_data(path):
    return tree.getNode(path).getData().data()

# Main program -- bug not specific to this shot and signals
shot_num = 1090909009
paths = [
    "\CMOD::TOP.ENGINEERING.POWER_SYSTEM:TEST",
    "\CMOD::TOP.ENGINEERING.RGA_1:AFTER",
    "\CMOD::TOP.ENGINEERING.RGA_1:BEFORE",
    "\CMOD::TOP.SPECTROSCOPY.BOLOMETER:TWOPI_DIODE",
    "\CMOD::TOP.SPECTROSCOPY.BOLOMETER:TWOPI_FOIL"
]
tree = mds.Tree("cmod", shot_num, mode="readonly")

# Works with one thread, fails with multiple threads
with cf.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(get_data, node_path) for node_path in paths]
    for future in cf.as_completed(futures):
        print(future.result())

Running the above bug.py program produces the following output. The XMW message is from a debug print statement that was added to the python/MDSplus/tree.py file in the getRecord() function prior to raising the exception. Notice the %SS-W-SUCCESS, Success message at the bottom of the output.

python3 bug.py      
Error in SendArg: mode = 6, status = 65554
XMW tree.py, getRecord() before raise exception, status =  0
Error in SendArg: mode = 6, status = 65554
XMW tree.py, getRecord() before raise exception, status =  0
Error in SendArg: mode = 6, status = 65554
XMW tree.py, getRecord() before raise exception, status =  0
D, 1832925:1832943, 1698942711.128151250: <path>/mdsplus/mdstcpip/mdsipshr/MdsIpThreadStatic.c:48 buffer_free() Connection(id=2, state=0x80, protocol='tcp', info_name='tcp', version=3, user='(null)')
D, 1832925:1832944, 1698942711.133187987: <path>/mdsplus/mdstcpip/mdsipshr/MdsIpThreadStatic.c:48 buffer_free() Connection(id=3, state=0x80, protocol='tcp', info_name='tcp', version=3, user='(null)')
Traceback (most recent call last):
  File "bug.py", line 25, in <module>
    print(future.result())
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "bug.py", line 7, in get_data
    return tree.getNode(path).getData().data()
  File "/home/mwinkel/x_markw/xmw_build/python/MDSplus/tree.py", line 1976, in getRecord
    raise _exc.MDSplusException(status)
MDSplus.mdsExceptions.SsSUCCESS: %SS-W-SUCCESS, Success
Exception ignored in: <function DescriptorXD.__del__ at 0x7fa3f4539a60>
Traceback (most recent call last):

Expected behavior Because "tree hooks" are optional user-provided routines, the intent of the code is probably to ignore a missing "tree hook" and just return success. In which case it should instead return the official code for success, SsSUCCESS. This is important because SsSUCCESS has the low order bit set (i.e., severity level = 1), which is what the STATUS_OK macro checks.

However, if the intent was instead to signal failure, then the code should be written as return status or return MDSplusERROR.

Screenshots n/a

Additional context Thanks to Josh L. of the MIT PSFC Disruptions Team for reporting this bug.

mwinkel-dev avatar Nov 02 '23 17:11 mwinkel-dev

It would likely be useful to add a new error code for zero. This would have to be added to the Ss facility section of the mdsshr_messages.xml file. Perhaps it should be as follows:

<status name="ZERO"  value="0"  severity="Warning" text="Zero is an invalid error code"/>

Note that the resulting error code, SsZERO (= 0) differs from SsSUCCESS (= 1) only in the severity field.

However, this would likely be problematic -- because then both "SsSUCCESS" and "SsZERO" would have the same "value" field (the message system likely doesn't allow duplicates). And furthermore, the MdsGetStdMsg.c file of the mdsshr library would need to be modified to handle the special SsZERO error code.

mwinkel-dev avatar Nov 02 '23 18:11 mwinkel-dev

The reported bug is probably related to the broken (especially python) thread safety (especially mdsip)

Regardless:

Off the top of my head, it should not return zero. The bits are encoded, with the last two bits holding the severity. The only reason Success has no other bits set is that we did not want to call out a particular kind of success.

joshStillerman avatar Nov 02 '23 19:11 joshStillerman

It may or may not do what you need but I think that: ProcessPoolExecutor will be more likely to succeed.

joshStillerman avatar Nov 02 '23 20:11 joshStillerman

looks like it should simply returrn status instead of 0.

zack-vii avatar Nov 02 '23 20:11 zack-vii