session_authorized, session_validate vs is_logged_in, is.null(session)||is_logged_in
Initial motivation for this issue (but see below):
wanted to export session_authorized because it's at least as useful as session_validate for clients.
session_authorized(session) is very similar to is_logged_in(session).
session_validate(session) is very similar to is.null(session) || is_logged_in(session).
Why have both?
The session_authorized and session_validate versions use stored information about the stored session birthdate and lifespan, whereas is_logged_in queries SB (via session_details) at https://www.sciencebase.gov/catalog/jossoHelper/sessionInfo. At this point, is one approach noticeably faster or more reliable than the other?
What's our internal policy for when to talk to SB versus when to use stored information about session status? What's the policy we want clients to follow? Does it make sense to offer both options, or would one approach suffice?
Bringing some data to this. A bit surprisingly to me, session_authorized and session_validate are always slower than options involving is_logged_in, even though they skip the GET call. They are also potentially less reliable since they depend on our perfect understanding of how SB renews sessions.
library(microbenchmark)
library(dplyr)
library(sbtools)
# without login
session_logout()
bind_rows(
summary(microbenchmark({session_validate()})),
summary(microbenchmark({is.null(current_session()) || is_logged_in()})),
summary(microbenchmark({session_authorized()})),
summary(microbenchmark({!is.null(current_session()) && is_logged_in()})),
summary(microbenchmark({is_logged_in()})))
# expr min lq mean median uq max neval
# (chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1 { session_validate() } 5.3210 5.7020 7.46156 5.70200 6.0820 82.1030 100
# 2 { is.null(current_session()) || is_logged_in() } 3.8010 3.8010 4.34089 4.18100 4.1810 29.6480 100
# 3 { session_authorized() } 7.6020 7.6020 8.16098 7.98200 7.9830 33.0690 100
# 4 { !is.null(current_session()) && is_logged_in() } 3.8010 3.8010 4.24584 4.18100 4.1810 22.4270 100
# 5 { is_logged_in() } 77.3485 88.0165 94.43853 89.12602 90.9929 376.2634 100
# with login
authenticate_sb()
bind_rows(
summary(microbenchmark({session_validate()})),
summary(microbenchmark({is.null(current_session()) || is_logged_in()})),
summary(microbenchmark({session_authorized()})),
summary(microbenchmark({!is.null(current_session()) && is_logged_in()})),
summary(microbenchmark({is_logged_in()})))
# expr min lq mean median uq max neval
# (chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1 { session_validate() } 252.00900 263.98250 275.85294 270.44450 279.94650 592.9630 100
# 2 { is.null(current_session()) || is_logged_in() } 76.89352 88.54693 96.21872 90.01775 91.79473 375.2698 100
# 3 { session_authorized() } 241.74600 258.09100 274.87217 265.31200 278.61650 525.3040 100
# 4 { !is.null(current_session()) && is_logged_in() } 76.85741 80.07575 91.09422 86.13270 89.33736 352.5936 100
# 5 { is_logged_in() } 76.89200 79.25567 86.44213 80.34638 82.00934 355.1825 100
I propose that we
- rewrite
session_validateso to simply containis.null(session) || is_logged_in(session)(faster) - rename
session_validatetosession_valid(because it doesn't DO stuff, just gets info) - remove
session_authorizedfrom the package (redundant with & slower thanis_logged_in) - rewrite
is_logged_into check for a NULL session before going to session_details, i.e., with!is.null(session) && session_details(..., session = session)$isLoggedIn(faster)
Thoughts, @lawinslow @sckott @jread-usgs ?
Also want to confirm that is_logged_in detects a change in authorization status due to session timeout (using code that reflects the above changes 1, 3, and 4).
> library(sbtools)
> authenticate_sb()
> session_validate()
[1] TRUE
> is_logged_in()
[1] TRUE
> Sys.time()
[1] "2016-02-15 12:33:41 CST"
> Sys.time()
[1] "2016-02-15 12:42:35 CST"
> is_logged_in() # possibly renews session?
[1] TRUE
> # wait an hour since the call to authenticate_sb()
> Sys.time()
[1] "2016-02-15 13:38:21 CST"
> is_logged_in()
[1] FALSE
Warning message:
In current_session() :
SB authentication expired, SB interaction may fail. Please re-authenticate using authenticate_sb().
One thing this demonstrates is that is_logged_in() does not itself renew the session. Good to know.
The other, main thing is that current_session() and is_logged_in() both notice, in different ways (internal records and SB records, respectively) that the session has expired. And they tell you this with a warning and a FALSE. We won't lose this info by switching from session_authorized to is_logged_in.
I'm not sure what's going on with that benchmark. I got similar results using your above code with microbenchmark, but those results are way off.
> system.time(replicate(100, session_validate()))
user system elapsed
0.04 0.00 0.04
> system.time(replicate(100, is_logged_in()))
user system elapsed
1.56 0.16 10.04
using just system.time, I get results that make a lot more sense. is_logged_in takes two orders of magnitude more time. This is from my home connection where my RTT to the WAF (beyond which pings are dropped) is ~65ms, so your mileage may vary if you're internal USGS or further away.
Why have both? Good question. One could argue that you might have two different use-cases. 1) You want to catch easy to predict errors like a very old session. 2) You want an unequivocal way to check authorization, which is going directly to SB.
We certainly could drop all functions which guess the session state internally and replace them directly with round-trip to SB versions. This would roughly double the amount of time each request takes, increase the error rate (every HTTP transaction has a non-zero chance of random failure), and increase the certainty of our session state (no time and age tracking).
Alternatively, all functions could be kept as they are to support fast, non-round-trip checking of session state and also more direct, HTTP-query based checking.
I personally lean towards removing as much internal package state as possible. This will have the effect of reducing sbtools performance by about 2x (practically probably a less).
- rewrite
session_validateso to simply containis.null(session) || is_logged_in(session)(faster)- rename
session_validatetosession_valid(because it doesn't DO stuff, just gets info)- remove
session_authorizedfrom the package (redundant with & slower thanis_logged_in)- rewrite
is_logged_into check for a NULL session before going to session_details, i.e., with!is.null(session) && session_details(..., session = session)$isLoggedIn(faster)
I am on board with this plan. Let's let @jread-usgs and @sckott weigh in before we make these changes.
@lawinslow , your new info on runtime makes me think differently about the problem - for a few hours it was a win-win in my mind, but now it's a tradeoff between speed and internal simplicity, with reliability costs on both sides. we do a validity check at the start of pretty much every function call, so i'm much less excited about removing the shortcuts (local checks) now that I know they're so much faster and have been reminded that they might not be any less reliable.
a third option (besides keeping everything as-is and moving to all round-trip checks), would be to
- name & document the functions so it's clear which uses local checks and which goes round-trip (or consolidate to two functions, for authorized and valid, that each have a flag for local vs round-trip),
- set a policy for internal use of local vs round-trip checks, and
- suggest a user policy for when to use local vs round-trip checks.
PS -- i figured out why microbenchmark appeared to give such contradictory results. it customizes the units to the results, and my bind_rows was wiping out the units differences. my bad.
# without login
if(is_logged_in()) session_logout()
no_login_mb2 <- microbenchmark(
session_validate(),
is.null(current_session()) || is_logged_in(),
sbtools:::session_authorized(current_session()),
!is.null(current_session()) && is_logged_in(),
is_logged_in(),
control=list(warmup=0))
no_login_mb2
# expr min lq mean median uq max
# session_validate() 32.689 89.8945 102.78778 95.2160 124.864 163.065
# is.null(current_session()) || is_logged_in() 6.462 13.6840 16.95275 15.5845 17.105 57.395
# sbtools:::session_authorized(current_session()) 48.653 113.8415 129.48613 126.3850 158.503 222.361
# !is.null(current_session()) && is_logged_in() 5.321 14.4440 19.10409 16.3450 18.245 68.039
# is_logged_in() 72831.725 83512.4570 103084.88218 85064.4215 88206.552 1320398.774
# with login
authenticate_sb()
with_login_mb2 <- microbenchmark(
session_validate(),
is.null(current_session()) || is_logged_in(),
sbtools:::session_authorized(current_session()),
!is.null(current_session()) && is_logged_in(),
is_logged_in(),
control=list(warmup=0))
with_login_mb2
# expr min lq mean median uq max neval
# session_validate() 317.006 886.0225 936.3330 1007.085 1062.201 1199.989 100
# is.null(current_session()) || is_logged_in() 75082.321 84000.7010 89968.1929 85057.389 87438.361 354469.396 100
# sbtools:::session_authorized(current_session()) 319.667 927.2640 992.9647 1078.545 1100.401 1239.899 100
# !is.null(current_session()) && is_logged_in() 74680.551 83777.0095 96587.6658 85463.151 89560.102 367438.544 100
# is_logged_in() 74521.667 83253.6065 91475.0430 84973.957 87737.883 455502.177 100
interested in input from all @lawinslow @jread-usgs @sckott
@aappling-usgs -- do you have any recollection of whether this issue really needs to still be open? It just cought my eye and I found it curious that it's open.