opentripplanner
opentripplanner copied to clipboard
Ideas for next version
Holding place for ideas of what could be added to the package.
-
[ ] Support for OTP 2.0
-
[x] Check Java version function
-
[ ] Perfect elevation transcription (currently just mostly good)
-
[x] More helper functions e.g. travel time matrix
-
[ ] Support wider set of API features, e.g. GTFS details
-
[x] Cache JAR files in package
- [x] Impoved load balancing on mulitple cores, needs benchmarking
- Maybe enable selection of the correct Java version if more than one is installed on the system? -- It would be great to be able to select which version of Java to use from inside the package so that OTP will run if you have more than just 8 installed. (E.g. I have 8 and 11.). I wonder if the feature could be set up so it's done manually in the otp_set_up function or if the code could try behind the scenes to select the correct version with a series of if conditions and commands. On my Mac, I could use "export JAVA_HOME=
/usr/libexec/java_home -v 1.8" in the Terminal, but that's only for that single session. I tried change the .zprofile to set a new default, but the Java check still fails so I ended up just uninstalling 11 to run OTP from the package.
@jamesdeweese good suggestion, but I'm not sure how to make this happen from R, any suggestions welcome
@mem48 I'm just learning a lot of this, but I'm anxious to contribute to the package since it's been really useful to me. I could definitely work on figuring out what it might look like as a standalone script. (I think I've got at least a vague notion of how I might be able to get it to work on a Mac.) Once I've got the code working, I could hand it off to you. Or, if you'd be willing to help me figure out how to incorporate into your code, I could try my hand at that too. It'll be at least a couple weeks before I can sit down and really focus on it. But I'd love to try.
Give it a go and let me know. Don't worry too much about integrating it into the package.
The key bit of code is:
text <- paste0(
"java -Xmx",
memory,
'M -jar "',
otp,
'" --build "',
dir,
"/graphs/",
router,
'"'
)
set_up <- try(system(text, intern = TRUE))
I'm late to the party, but I wanted to follow up on https://github.com/eddelbuettel/rcppsimdjson/issues/47 so "Ideas for next version" seemed like the appropriate place.
It looks like you've already begun experimenting with RcppSimdJson, but please reach out if you'd like assistance with this. That could be as simple as asking for another set of eyes to optimize any inclusion or as complex as assistance in mapping any JSON to the desired R objects at the C++ level.
It's not just fast, it's extremely flexible. If you know precisely what data you want to extract (which seems like it would be the case here), a targeted query= can be dramatically faster than a full parse as you don't need to materialize all the data as R objects (which is the real performance bottleneck).
As of 0.1.1, you can use multiple queries simultaneously, so it should also simplify any post-parse processing.
json_chr <- c(opentripplanner::json_example_transit,
opentripplanner::json_example_long_drive,
opentripplanner::json_example_drive)
names(json_chr) <- c("transit", "long_drive", "drive")
res <- microbenchmark::microbenchmark(
full_parse = full_parse <- RcppSimdJson::fparse(json_chr),
just_elevationMetadata = itineraries <- RcppSimdJson::fparse(json_chr, query = "elevationMetadata")
,
times = 10
)
print(res, order = "median")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> just_elevationMetadata 230.4 233.2 261.85 240.55 271.1 363.4 10
#> full_parse 951.9 972.1 11167.46 1005.75 1027.8 102695.8 10
itineraries <- RcppSimdJson::fparse(json_chr, query = "plan/itineraries")
lapply(itineraries, tibble::as_tibble)
#> $transit
#> # A tibble: 3 x 14
#> duration startTime endTime walkTime transitTime waitingTime walkDistance
#> <int> <dbl> <dbl> <int> <int> <int> <dbl>
#> 1 2017 1.60e12 1.60e12 1175 840 2 1345.
#> 2 2017 1.60e12 1.60e12 1175 840 2 1345.
#> 3 2017 1.60e12 1.60e12 1175 840 2 1345.
#> # ... with 7 more variables: walkLimitExceeded <lgl>, elevationLost <dbl>,
#> # elevationGained <dbl>, transfers <int>, fare <list>, legs <list>,
#> # tooSloped <lgl>
#>
#> $long_drive
#> # A tibble: 1 x 14
#> duration startTime endTime walkTime transitTime waitingTime walkDistance
#> <int> <dbl> <dbl> <int> <int> <int> <dbl>
#> 1 8225 1.60e12 1.60e12 8225 0 0 0
#> # ... with 7 more variables: walkLimitExceeded <lgl>, elevationLost <dbl>,
#> # elevationGained <dbl>, transfers <int>, fare <list>, legs <list>,
#> # tooSloped <lgl>
#>
#> $drive
#> # A tibble: 1 x 13
#> duration startTime endTime walkTime transitTime waitingTime walkDistance
#> <int> <dbl> <dbl> <int> <int> <int> <dbl>
#> 1 1718 1.60e12 1.60e12 1718 0 0 0
#> # ... with 6 more variables: walkLimitExceeded <lgl>, elevationLost <dbl>,
#> # elevationGained <dbl>, transfers <int>, legs <list>, tooSloped <lgl>
Thanks for this I'll definitely give it a try, there are multiple parts of the json file required but not 100% so should be some performance gains to be had. I think I'm approaching diminishing returns as the parsing of the results is now much faster than waiting for otp to produce the result
@knapply I tried your idea
> bench::mark(r1 = RcppSimdJson::fparse(opentripplanner::json_example_transit),
+ r2 = RcppSimdJson::fparse(opentripplanner::json_example_transit, query = "plan/itineraries"), check = FALSE)
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <bch:> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 r1 762us 824us 920. 14.2KB 2.79 330 1 359ms <NULL> <Rprofmem[,~ <bch:t~ <tibble~
2 r2 745us 817us 674. 14.2KB 2.04 330 1 490ms <NULL> <Rprofmem[,~ <bch:t~ <tibble~
As I thought only a very small performance boost, as the "plan/itineraries" section I need is 90% of the whole JSON file. But it will also allow me to remove a few lines of R code so a boost worth having.
Are their plans for a write json function? At the moment I have to keep rjson as a package dependency so I can write and read json. It would be nice to reduce the dependencies if possible.
Well, I'm glad it may be somewhat helpful!
Are their plans for a write json function?
The simdjson library itself doesn't yet support serialization, but it is on their "post 1.0" agenda: https://github.com/simdjson/simdjson/issues/821.
I wouldn't expect it anytime soon, but I'm sure we'll add support for it when they get around to it upstream.
It's been a while, but I finally sort of figured out how to adjust the Java version that the package uses ... I still haven't figured out how to incorporate it into the package's code directly (and I've only tested this with success on a Mac), but if you find the path to your Java 8 version by running :
/usr/libexec/java_home -V
in the Terminal, you can then set your options in R studio with the following:
Sys.setenv(JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home")
As long as you do that before loading the library, opentripplanner will pick up the right version. I'm not quite sure how to incorporate that into the flow of the package since you need to change the option before loading. Maybe it can be reloaded?
- [x] Surface API http://docs.opentripplanner.org/en/v1.5.0/Surface/
@jamesdeweese I'm not sure what progress you have made on this effort, but I spent the afternoon trying to get R to recognize an older version of Java without uninstalling the newer versions I had. It was quite an adventure, with a simple solution: just set the JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home" in the .Renviron file for the project.
More details here.
I know that I'm late to this discussion @gregmacfarlane @jamesdeweese. But this package is being so useful for me that I wanted to contribute with my grain of sand.
You can select the the Java version for Windows interactively for the current R session with Sys.setenv()
So I added the following code to my scripts
#check the current Java version
otp_check_java(1.5)
system2("java", "-version")
#get the full PATH environment
pth <- Sys.getenv("PATH")
# Change the current java version to the desired version in the pth string.
# Here you need the folder name of the desired version.
pth2 <- gsub(pattern = "jdk-11.0.14",
replacement = "jre1.8.0_361",
x = pth)
#set the new PATH environment with the new string
Sys.setenv("PATH"=pth2)
I know it is pretty basic, I'm not a developer but this is working for my when testing different versions of OTP and other packages requiring sepcific JAVA versions
It is a good suggestion, the problem I've had is getting it to work reliably on different OSes and setups. CRAN has strict policies on R packages working well on all computers.
True. Maybe it is better to include it on instructions/tutorials.