jwql icon indicating copy to clipboard operation
jwql copied to clipboard

fixes for no data found on server

Open penaguerrero opened this issue 2 years ago • 14 comments

penaguerrero avatar Sep 15 '22 20:09 penaguerrero

After lots of investigation, the apparent cause behind the TA monitors reporting that they are not finding any data is a problem with the log_info decorator function. When I comment that out, the TA monitor successfully produces plots. Within log_info, it looks like the troublesome line is the environment = subprocess.check_output line. When I run the line manually on my local machine and on the server, the results seem fine. But when I run the TA monitor by opening the webpage, log_info() fails in such a way that the TA monitor's run() function is skipped. When I set environment to a dummy string within log_info(), everything works and the TA monitor produces plots. And when I wrap the environment line in a try/except, everything also works fine, because it falls into the except clause.

I've also noticed that the MSATA monitor is not creating a new log file at each run, but seems to be overwriting the most recent log file. In my last run via the webpage, the log was placed in a log file with a timestamp that was an hour old.

bhilbert4 avatar Sep 21 '22 15:09 bhilbert4

So it seems like it comes down to some issue with running conda env export when called from the webpage?

bhilbert4 avatar Sep 21 '22 15:09 bhilbert4

So it seems like it comes down to some issue with running conda env export when called from the webpage?

Thank you SO much for helping with this Bryan! If I understand correctly, this is a subprocess that is started from the java code to update the MSATA/WATA webpage from the magnifying glass to the plots. So this environment is expected to be the same as the original one but this sub environment does not have something that the code needs?

penaguerrero avatar Sep 21 '22 15:09 penaguerrero

The subprocess is called by log_info(), which is called when the msata run() function is called. My expectation is that this should just be running conda env export on the environment on the server. It's not at all clear to me what the problem is, since I can make that call successfully on the server in an ipython session.

I'm also curious about why we haven't seen this before. It is true that the other monitors are not completing successfully at the moment due to the memory errors, but I think they are still being called, and they should reach their log_info-wrapped functions before crashing. Or maybe we're not actually calling them at the moment?

@mfixstsci do you have any thoughts about this? Depending on how much effort we want to put into solving this, maybe we should wrap that environment line in a try/except in this PR, and worry about solving the issue in a separate PR?

bhilbert4 avatar Sep 21 '22 16:09 bhilbert4

Another piece to add to the puzzle: I checked the log file for yesterday's dark monitor run on the test server. The environment was successfully retrieved there. So what's different about the dark monitor compared to the TA monitors in this case?

bhilbert4 avatar Sep 22 '22 13:09 bhilbert4

This was in a recent log file, just prior to the environment problem:

   WARNING: [Errno 2] No such file or directory: 'conda': 'conda'

It's not clear to me why conda would not be present.

Here are the lines that I added to log_info during testing to get around the error:

        try:                                                                                            
            environment = subprocess.check_output('conda env export', universal_newlines=True, shell=True) # nosec                                                                                              
        except Exception as e:
            logging.warning(e)
            environment = 'Unable to get environment info\n'

bhilbert4 avatar Sep 22 '22 13:09 bhilbert4

The only difference I can think of between the TA monitors and the other monitors is that the others are all called on the command line in our crontab, while the TA monitors are called as part of the webpage creation process. Could that be important? @mfixstsci @BradleySappington

bhilbert4 avatar Sep 22 '22 14:09 bhilbert4

  • [x] Line 246 of monitor_views should be module = 'msata_monitor' rather than module = 'msata_monitor.py'

bhilbert4 avatar Sep 22 '22 14:09 bhilbert4

I just did a new commit to remove the ".py" from the module name in the script monitor_views.py.

penaguerrero avatar Sep 22 '22 14:09 penaguerrero

  • [x] Line 246 of monitor_views should be module = 'msata_monitor' rather than module = 'msata_monitor.py'

Nailed it, good catch @bhilbert4

mfixstsci avatar Sep 23 '22 15:09 mfixstsci

I think there's just one more change this needs and then it'll be ready to go. I think the best solution at the moment to get around the logging error is to add the try/except clause in the comment above to the logging function. That should allow us to continue to get environment info for the monitors that are run via the command line, and will allow us to gracefully skip collecting that info for the TA monitors that are being called via clicks on the webpage. @penaguerrero

bhilbert4 avatar Sep 26 '22 13:09 bhilbert4

@bhilbert4 I added the try/except block at line 242 of utils/logging_functions.py

penaguerrero avatar Sep 26 '22 15:09 penaguerrero

@mfixstsci @bhilbert4 I just added some tests for the monitors :)

penaguerrero avatar Sep 26 '22 21:09 penaguerrero

@penaguerrero THANK YOU FOR WRITING TESTS!

mfixstsci avatar Sep 26 '22 21:09 mfixstsci