sbtools icon indicating copy to clipboard operation
sbtools copied to clipboard

session_authorized, session_validate vs is_logged_in, is.null(session)||is_logged_in

Open aappling-usgs opened this issue 10 years ago • 7 comments

Initial motivation for this issue (but see below): wanted to export session_authorized because it's at least as useful as session_validate for clients.

aappling-usgs avatar Feb 15 '16 16:02 aappling-usgs

session_authorized(session) is very similar to is_logged_in(session). session_validate(session) is very similar to is.null(session) || is_logged_in(session).

Why have both?

The session_authorized and session_validate versions use stored information about the stored session birthdate and lifespan, whereas is_logged_in queries SB (via session_details) at https://www.sciencebase.gov/catalog/jossoHelper/sessionInfo. At this point, is one approach noticeably faster or more reliable than the other?

What's our internal policy for when to talk to SB versus when to use stored information about session status? What's the policy we want clients to follow? Does it make sense to offer both options, or would one approach suffice?

aappling-usgs avatar Feb 15 '16 16:02 aappling-usgs

Bringing some data to this. A bit surprisingly to me, session_authorized and session_validate are always slower than options involving is_logged_in, even though they skip the GET call. They are also potentially less reliable since they depend on our perfect understanding of how SB renews sessions.

library(microbenchmark)
library(dplyr)
library(sbtools)

# without login
session_logout()
bind_rows(
    summary(microbenchmark({session_validate()})),
    summary(microbenchmark({is.null(current_session()) || is_logged_in()})),
    summary(microbenchmark({session_authorized()})),
    summary(microbenchmark({!is.null(current_session()) && is_logged_in()})),
    summary(microbenchmark({is_logged_in()})))
#                                                    expr     min      lq     mean   median      uq      max neval
#                                                   (chr)   (dbl)   (dbl)    (dbl)    (dbl)   (dbl)    (dbl) (dbl)
# 1                            {     session_validate() }  5.3210  5.7020  7.46156  5.70200  6.0820  82.1030   100
# 2  {     is.null(current_session()) || is_logged_in() }  3.8010  3.8010  4.34089  4.18100  4.1810  29.6480   100
# 3                          {     session_authorized() }  7.6020  7.6020  8.16098  7.98200  7.9830  33.0690   100
# 4 {     !is.null(current_session()) && is_logged_in() }  3.8010  3.8010  4.24584  4.18100  4.1810  22.4270   100
# 5                                {     is_logged_in() } 77.3485 88.0165 94.43853 89.12602 90.9929 376.2634   100

# with login
authenticate_sb()
bind_rows(
    summary(microbenchmark({session_validate()})),
    summary(microbenchmark({is.null(current_session()) || is_logged_in()})),
    summary(microbenchmark({session_authorized()})),
    summary(microbenchmark({!is.null(current_session()) && is_logged_in()})),
    summary(microbenchmark({is_logged_in()})))
#                                                    expr       min        lq      mean    median        uq      max neval
#                                                   (chr)     (dbl)     (dbl)     (dbl)     (dbl)     (dbl)    (dbl) (dbl)
# 1                            {     session_validate() } 252.00900 263.98250 275.85294 270.44450 279.94650 592.9630   100
# 2  {     is.null(current_session()) || is_logged_in() }  76.89352  88.54693  96.21872  90.01775  91.79473 375.2698   100
# 3                          {     session_authorized() } 241.74600 258.09100 274.87217 265.31200 278.61650 525.3040   100
# 4 {     !is.null(current_session()) && is_logged_in() }  76.85741  80.07575  91.09422  86.13270  89.33736 352.5936   100
# 5                                {     is_logged_in() }  76.89200  79.25567  86.44213  80.34638  82.00934 355.1825   100

aappling-usgs avatar Feb 15 '16 17:02 aappling-usgs

I propose that we

  1. rewrite session_validate so to simply contain is.null(session) || is_logged_in(session) (faster)
  2. rename session_validate to session_valid (because it doesn't DO stuff, just gets info)
  3. remove session_authorized from the package (redundant with & slower than is_logged_in)
  4. rewrite is_logged_in to check for a NULL session before going to session_details, i.e., with !is.null(session) && session_details(..., session = session)$isLoggedIn (faster)

Thoughts, @lawinslow @sckott @jread-usgs ?

aappling-usgs avatar Feb 15 '16 18:02 aappling-usgs

Also want to confirm that is_logged_in detects a change in authorization status due to session timeout (using code that reflects the above changes 1, 3, and 4).

> library(sbtools)
> authenticate_sb()
> session_validate()
[1] TRUE
> is_logged_in()
[1] TRUE
> Sys.time()
[1] "2016-02-15 12:33:41 CST"
> Sys.time()
[1] "2016-02-15 12:42:35 CST"
> is_logged_in() # possibly renews session?
[1] TRUE
> # wait an hour since the call to authenticate_sb()
> Sys.time()
[1] "2016-02-15 13:38:21 CST"
> is_logged_in()
[1] FALSE
Warning message:
In current_session() :
  SB authentication expired, SB interaction may fail. Please re-authenticate using authenticate_sb().

One thing this demonstrates is that is_logged_in() does not itself renew the session. Good to know.

The other, main thing is that current_session() and is_logged_in() both notice, in different ways (internal records and SB records, respectively) that the session has expired. And they tell you this with a warning and a FALSE. We won't lose this info by switching from session_authorized to is_logged_in.

aappling-usgs avatar Feb 15 '16 19:02 aappling-usgs

I'm not sure what's going on with that benchmark. I got similar results using your above code with microbenchmark, but those results are way off.

> system.time(replicate(100, session_validate()))
   user  system elapsed 
   0.04    0.00    0.04 
> system.time(replicate(100, is_logged_in()))
   user  system elapsed 
   1.56    0.16   10.04 

using just system.time, I get results that make a lot more sense. is_logged_in takes two orders of magnitude more time. This is from my home connection where my RTT to the WAF (beyond which pings are dropped) is ~65ms, so your mileage may vary if you're internal USGS or further away.

Why have both? Good question. One could argue that you might have two different use-cases. 1) You want to catch easy to predict errors like a very old session. 2) You want an unequivocal way to check authorization, which is going directly to SB.

We certainly could drop all functions which guess the session state internally and replace them directly with round-trip to SB versions. This would roughly double the amount of time each request takes, increase the error rate (every HTTP transaction has a non-zero chance of random failure), and increase the certainty of our session state (no time and age tracking).

Alternatively, all functions could be kept as they are to support fast, non-round-trip checking of session state and also more direct, HTTP-query based checking.

I personally lean towards removing as much internal package state as possible. This will have the effect of reducing sbtools performance by about 2x (practically probably a less).

  1. rewrite session_validate so to simply contain is.null(session) || is_logged_in(session) (faster)
  2. rename session_validate to session_valid (because it doesn't DO stuff, just gets info)
  3. remove session_authorized from the package (redundant with & slower than is_logged_in)
  4. rewrite is_logged_in to check for a NULL session before going to session_details, i.e., with !is.null(session) && session_details(..., session = session)$isLoggedIn (faster)

I am on board with this plan. Let's let @jread-usgs and @sckott weigh in before we make these changes.

lawinslow avatar Feb 15 '16 21:02 lawinslow

@lawinslow , your new info on runtime makes me think differently about the problem - for a few hours it was a win-win in my mind, but now it's a tradeoff between speed and internal simplicity, with reliability costs on both sides. we do a validity check at the start of pretty much every function call, so i'm much less excited about removing the shortcuts (local checks) now that I know they're so much faster and have been reminded that they might not be any less reliable.

a third option (besides keeping everything as-is and moving to all round-trip checks), would be to

  1. name & document the functions so it's clear which uses local checks and which goes round-trip (or consolidate to two functions, for authorized and valid, that each have a flag for local vs round-trip),
  2. set a policy for internal use of local vs round-trip checks, and
  3. suggest a user policy for when to use local vs round-trip checks.

PS -- i figured out why microbenchmark appeared to give such contradictory results. it customizes the units to the results, and my bind_rows was wiping out the units differences. my bad.

# without login
if(is_logged_in()) session_logout()
no_login_mb2 <- microbenchmark(
  session_validate(),
  is.null(current_session()) || is_logged_in(),
  sbtools:::session_authorized(current_session()),
  !is.null(current_session()) && is_logged_in(),
  is_logged_in(),
  control=list(warmup=0))
no_login_mb2
#                                            expr       min         lq         mean     median        uq         max
#                              session_validate()    32.689    89.8945    102.78778    95.2160   124.864     163.065
#    is.null(current_session()) || is_logged_in()     6.462    13.6840     16.95275    15.5845    17.105      57.395
# sbtools:::session_authorized(current_session())    48.653   113.8415    129.48613   126.3850   158.503     222.361
#   !is.null(current_session()) && is_logged_in()     5.321    14.4440     19.10409    16.3450    18.245      68.039
#                                  is_logged_in() 72831.725 83512.4570 103084.88218 85064.4215 88206.552 1320398.774

# with login
authenticate_sb()
with_login_mb2 <- microbenchmark(
  session_validate(),
  is.null(current_session()) || is_logged_in(),
  sbtools:::session_authorized(current_session()),
  !is.null(current_session()) && is_logged_in(),
  is_logged_in(),
  control=list(warmup=0))
with_login_mb2
#                                            expr       min         lq       mean    median        uq        max neval
#                              session_validate()   317.006   886.0225   936.3330  1007.085  1062.201   1199.989   100
#    is.null(current_session()) || is_logged_in() 75082.321 84000.7010 89968.1929 85057.389 87438.361 354469.396   100
# sbtools:::session_authorized(current_session())   319.667   927.2640   992.9647  1078.545  1100.401   1239.899   100
#   !is.null(current_session()) && is_logged_in() 74680.551 83777.0095 96587.6658 85463.151 89560.102 367438.544   100
#                                  is_logged_in() 74521.667 83253.6065 91475.0430 84973.957 87737.883 455502.177   100

interested in input from all @lawinslow @jread-usgs @sckott

aappling-usgs avatar Feb 16 '16 01:02 aappling-usgs

@aappling-usgs -- do you have any recollection of whether this issue really needs to still be open? It just cought my eye and I found it curious that it's open.

dblodgett-usgs avatar Nov 03 '22 03:11 dblodgett-usgs