opentripplanner icon indicating copy to clipboard operation
opentripplanner copied to clipboard

Ideas for next version

Open mem48 opened this issue 4 years ago • 15 comments

Holding place for ideas of what could be added to the package.

  • [ ] Support for OTP 2.0

  • [x] Check Java version function

  • [ ] Perfect elevation transcription (currently just mostly good)

  • [x] More helper functions e.g. travel time matrix

  • [ ] Support wider set of API features, e.g. GTFS details

  • [x] Cache JAR files in package

mem48 avatar Feb 25 '20 17:02 mem48

  • [x] Impoved load balancing on mulitple cores, needs benchmarking

mem48 avatar Feb 25 '20 17:02 mem48

  • Maybe enable selection of the correct Java version if more than one is installed on the system? -- It would be great to be able to select which version of Java to use from inside the package so that OTP will run if you have more than just 8 installed. (E.g. I have 8 and 11.). I wonder if the feature could be set up so it's done manually in the otp_set_up function or if the code could try behind the scenes to select the correct version with a series of if conditions and commands. On my Mac, I could use "export JAVA_HOME=/usr/libexec/java_home -v 1.8" in the Terminal, but that's only for that single session. I tried change the .zprofile to set a new default, but the Java check still fails so I ended up just uninstalling 11 to run OTP from the package.

jamesdeweese avatar Jul 01 '20 19:07 jamesdeweese

@jamesdeweese good suggestion, but I'm not sure how to make this happen from R, any suggestions welcome

mem48 avatar Jul 02 '20 14:07 mem48

@mem48 I'm just learning a lot of this, but I'm anxious to contribute to the package since it's been really useful to me. I could definitely work on figuring out what it might look like as a standalone script. (I think I've got at least a vague notion of how I might be able to get it to work on a Mac.) Once I've got the code working, I could hand it off to you. Or, if you'd be willing to help me figure out how to incorporate into your code, I could try my hand at that too. It'll be at least a couple weeks before I can sit down and really focus on it. But I'd love to try.

jamesdeweese avatar Jul 02 '20 22:07 jamesdeweese

Give it a go and let me know. Don't worry too much about integrating it into the package.

The key bit of code is:

text <- paste0(
    "java -Xmx",
    memory,
    'M -jar "',
    otp,
    '" --build "',
    dir,
    "/graphs/",
    router,
    '"'
  )
set_up <- try(system(text, intern = TRUE))

mem48 avatar Jul 02 '20 22:07 mem48

I'm late to the party, but I wanted to follow up on https://github.com/eddelbuettel/rcppsimdjson/issues/47 so "Ideas for next version" seemed like the appropriate place.

It looks like you've already begun experimenting with RcppSimdJson, but please reach out if you'd like assistance with this. That could be as simple as asking for another set of eyes to optimize any inclusion or as complex as assistance in mapping any JSON to the desired R objects at the C++ level.

It's not just fast, it's extremely flexible. If you know precisely what data you want to extract (which seems like it would be the case here), a targeted query= can be dramatically faster than a full parse as you don't need to materialize all the data as R objects (which is the real performance bottleneck).

As of 0.1.1, you can use multiple queries simultaneously, so it should also simplify any post-parse processing.

json_chr <- c(opentripplanner::json_example_transit,
              opentripplanner::json_example_long_drive,
              opentripplanner::json_example_drive)
names(json_chr) <- c("transit", "long_drive", "drive")

res <- microbenchmark::microbenchmark(
  full_parse = full_parse <- RcppSimdJson::fparse(json_chr),
  just_elevationMetadata = itineraries <- RcppSimdJson::fparse(json_chr, query = "elevationMetadata")
  ,
  times = 10
)
print(res, order = "median")
#> Unit: microseconds
#>                    expr   min    lq     mean  median     uq      max neval
#>  just_elevationMetadata 230.4 233.2   261.85  240.55  271.1    363.4    10
#>              full_parse 951.9 972.1 11167.46 1005.75 1027.8 102695.8    10

itineraries <- RcppSimdJson::fparse(json_chr, query = "plan/itineraries")
lapply(itineraries, tibble::as_tibble)
#> $transit
#> # A tibble: 3 x 14
#>   duration startTime endTime walkTime transitTime waitingTime walkDistance
#>      <int>     <dbl>   <dbl>    <int>       <int>       <int>        <dbl>
#> 1     2017   1.60e12 1.60e12     1175         840           2        1345.
#> 2     2017   1.60e12 1.60e12     1175         840           2        1345.
#> 3     2017   1.60e12 1.60e12     1175         840           2        1345.
#> # ... with 7 more variables: walkLimitExceeded <lgl>, elevationLost <dbl>,
#> #   elevationGained <dbl>, transfers <int>, fare <list>, legs <list>,
#> #   tooSloped <lgl>
#> 
#> $long_drive
#> # A tibble: 1 x 14
#>   duration startTime endTime walkTime transitTime waitingTime walkDistance
#>      <int>     <dbl>   <dbl>    <int>       <int>       <int>        <dbl>
#> 1     8225   1.60e12 1.60e12     8225           0           0            0
#> # ... with 7 more variables: walkLimitExceeded <lgl>, elevationLost <dbl>,
#> #   elevationGained <dbl>, transfers <int>, fare <list>, legs <list>,
#> #   tooSloped <lgl>
#> 
#> $drive
#> # A tibble: 1 x 13
#>   duration startTime endTime walkTime transitTime waitingTime walkDistance
#>      <int>     <dbl>   <dbl>    <int>       <int>       <int>        <dbl>
#> 1     1718   1.60e12 1.60e12     1718           0           0            0
#> # ... with 6 more variables: walkLimitExceeded <lgl>, elevationLost <dbl>,
#> #   elevationGained <dbl>, transfers <int>, legs <list>, tooSloped <lgl>

knapply avatar Aug 12 '20 14:08 knapply

Thanks for this I'll definitely give it a try, there are multiple parts of the json file required but not 100% so should be some performance gains to be had. I think I'm approaching diminishing returns as the parsing of the results is now much faster than waiting for otp to produce the result

mem48 avatar Aug 14 '20 22:08 mem48

@knapply I tried your idea

> bench::mark(r1 = RcppSimdJson::fparse(opentripplanner::json_example_transit),
+             r2 = RcppSimdJson::fparse(opentripplanner::json_example_transit, query = "plan/itineraries"), check = FALSE)
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory       time    gc      
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>       <list>  <list>  
1 r1          762us   824us      920.    14.2KB     2.79   330     1      359ms <NULL> <Rprofmem[,~ <bch:t~ <tibble~
2 r2          745us   817us      674.    14.2KB     2.04   330     1      490ms <NULL> <Rprofmem[,~ <bch:t~ <tibble~

As I thought only a very small performance boost, as the "plan/itineraries" section I need is 90% of the whole JSON file. But it will also allow me to remove a few lines of R code so a boost worth having.

Are their plans for a write json function? At the moment I have to keep rjson as a package dependency so I can write and read json. It would be nice to reduce the dependencies if possible.

mem48 avatar Aug 15 '20 09:08 mem48

Well, I'm glad it may be somewhat helpful!

Are their plans for a write json function?

The simdjson library itself doesn't yet support serialization, but it is on their "post 1.0" agenda: https://github.com/simdjson/simdjson/issues/821.

I wouldn't expect it anytime soon, but I'm sure we'll add support for it when they get around to it upstream.

knapply avatar Aug 16 '20 20:08 knapply

It's been a while, but I finally sort of figured out how to adjust the Java version that the package uses ... I still haven't figured out how to incorporate it into the package's code directly (and I've only tested this with success on a Mac), but if you find the path to your Java 8 version by running :

/usr/libexec/java_home -V

in the Terminal, you can then set your options in R studio with the following:

Sys.setenv(JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home")

As long as you do that before loading the library, opentripplanner will pick up the right version. I'm not quite sure how to incorporate that into the flow of the package since you need to change the option before loading. Maybe it can be reloaded?

jamesdeweese avatar Oct 24 '20 15:10 jamesdeweese

  • [x] Surface API http://docs.opentripplanner.org/en/v1.5.0/Surface/

mem48 avatar Dec 01 '20 13:12 mem48

@jamesdeweese I'm not sure what progress you have made on this effort, but I spent the afternoon trying to get R to recognize an older version of Java without uninstalling the newer versions I had. It was quite an adventure, with a simple solution: just set the JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home" in the .Renviron file for the project.

More details here.

gregmacfarlane avatar Jun 04 '21 23:06 gregmacfarlane

I know that I'm late to this discussion @gregmacfarlane @jamesdeweese. But this package is being so useful for me that I wanted to contribute with my grain of sand.

You can select the the Java version for Windows interactively for the current R session with Sys.setenv()

So I added the following code to my scripts

#check the current Java version
otp_check_java(1.5)
system2("java", "-version")

#get the full PATH environment
pth <- Sys.getenv("PATH")

# Change the current java version to the desired version in the pth string. 
# Here you need the folder name of the desired version. 
pth2 <- gsub(pattern = "jdk-11.0.14",
             replacement = "jre1.8.0_361",
             x = pth)

#set the new PATH environment with the new string
Sys.setenv("PATH"=pth2)

I know it is pretty basic, I'm not a developer but this is working for my when testing different versions of OTP and other packages requiring sepcific JAVA versions

temporalista avatar Feb 02 '23 21:02 temporalista

It is a good suggestion, the problem I've had is getting it to work reliably on different OSes and setups. CRAN has strict policies on R packages working well on all computers.

mem48 avatar Feb 03 '23 12:02 mem48

True. Maybe it is better to include it on instructions/tutorials.

temporalista avatar Feb 06 '23 19:02 temporalista