FiloDB
Upgrade to Scala 2.12
Do you expect any problems around the upgrade?
Do you expect cross-building?
At least the following dependencies need to be updated:
- Scalatest 2.2.6 -> 3.0
- Scalacheck 1.11 -> 1.12.4
- scalaxy-loops: it seems it has never been updated for 2.12. The recommendation is to use scalaxy-streams or cfor from spire instead. Any suggestions? (https://github.com/deeplearning4j/nd4s/issues/108)
Hi there, we don't expect any issues with upgrading, but of course will not know until we build it.... thanks.
(Sent by email in reply to Szymon Matejczyk's comment of Wed, Jul 29, 2020 at 6:50 AM.)
-- If you are free, you need to free somebody else. If you have some power, then your job is to empower somebody else. --- Toni Morrison
Now is the time to understand more, so that we can fear less. --Marie Curie
Started doing this.
Do you have any thoughts on scalaxy-loops? I should probably start by comparing scalaxy-loops with the Scala 2.12 compiler optimisations. I will start with BasicFiloBenchmark, if you consider it a good choice for benchmarking pure loops.
I believe we could just try scalaxy-streams, which is supposed to be the successor, but I haven't checked to make sure it has the same macros. Yes, BasicFiloBenchmark is a great one to start with.
Thanks for the upgrade work!
(Sent by email in reply to Szymon Matejczyk's comment of Thu, Jul 30, 2020 at 1:15 PM.)
Turns out scalaxy-streams doesn't support Scala 2.12 either...
This comment suggests either rewriting range loops as while loops or writing our own macros:
https://github.com/nativelibs4java/scalaxy-streams/issues/12#issuecomment-441822498
We could also use the spire cfor macro, but that's still a lot of manual and possibly error-prone rewrites.
Some more context about optimising range loops: https://github.com/scala/bug/issues/1338#issuecomment-506662928
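To make the "rewrite range loops as while loops" option concrete, here is a minimal sketch of the two loop shapes being discussed. The object and method names are made up for illustration and are not taken from the FiloDB code base.

```scala
object RangeLoops {
  // Range-based for loop: the compiler desugars this into
  // (0 until xs.length).foreach(i => ...), i.e. a Range plus a closure.
  def sumFor(xs: Array[Long]): Long = {
    var sum = 0L
    for (i <- 0 until xs.length) sum += xs(i)
    sum
  }

  // Hand-written while loop: no Range object and no closure,
  // at the cost of managing the index variable manually.
  def sumWhile(xs: Array[Long]): Long = {
    var sum = 0L
    var i = 0
    while (i < xs.length) {
      sum += xs(i)
      i += 1
    }
    sum
  }
}
```

Both methods return the same result; whether the JIT closes the performance gap for the first form is exactly what the benchmarks need to show.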
The difference between Scala 2.11 + scalaxy loops and Scala 2.12 without scalaxy loops: [running on my laptop: MacBookPro16,1 x86_64 2400 MHz, 16 cores, 64G, Darwin 19.5.0]
Scala 2.11 + scalaxy
jmh:run -i 10 -wi 5 -f1 -jvmArgsAppend -XX:MaxInlineLevel=20 -jvmArgsAppend -Xmx4g -jvmArgsAppend -XX:MaxInlineSize=99 filodb.jmh.BasicFiloBenchmark
[info] Benchmark Mode Cnt Score Error Units
[info] BasicFiloBenchmark.sumAllIntsSumMethod avgt 10 1.816 ± 0.149 us/op
[info] BasicFiloBenchmark.sumAllLongsApply avgt 10 2.251 ± 0.088 us/op
[info] BasicFiloBenchmark.sumAllLongsIterate avgt 10 0.907 ± 0.059 us/op
[info] BasicFiloBenchmark.sumAllLongsSumMethod avgt 10 1.173 ± 0.049 us/op
[info] BasicFiloBenchmark.sumDoublesSumMethod avgt 10 2.312 ± 0.121 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesApply avgt 10 9.103 ± 0.539 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesIterate avgt 10 3.004 ± 0.189 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesSum avgt 10 0.946 ± 0.076 us/op
Scala 2.12 + no scalaxy optimisations
jmh:run -i 10 -wi 5 -f1 -jvmArgsAppend -XX:MaxInlineLevel=20 -jvmArgsAppend -Xmx4g -jvmArgsAppend -XX:MaxInlineSize=99 filodb.jmh.BasicFiloBenchmark
[info] Benchmark Mode Cnt Score Error Units
[info] BasicFiloBenchmark.sumAllIntsSumMethod avgt 10 1.727 ± 0.077 us/op
[info] BasicFiloBenchmark.sumAllLongsApply avgt 10 2.634 ± 0.069 us/op
[info] BasicFiloBenchmark.sumAllLongsIterate avgt 10 1.657 ± 0.028 us/op
[info] BasicFiloBenchmark.sumAllLongsSumMethod avgt 10 1.097 ± 0.025 us/op
[info] BasicFiloBenchmark.sumDoublesSumMethod avgt 10 2.362 ± 0.020 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesApply avgt 10 8.235 ± 0.127 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesIterate avgt 10 2.872 ± 0.089 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesSum avgt 10 1.214 ± 0.033 us/op
We are on par for most of them, except:
- BasicFiloBenchmark.sumAllLongsIterate is taking about 80% longer in 2.12 (1.657 vs 0.907 us/op).
- BasicFiloBenchmark.sumTimeSeriesBytesSum is taking about 30% longer in 2.12 (1.214 vs 0.946 us/op).
Do you run the jmh benchmarks continuously, and can you confirm these results for all benchmarks?
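As a quick sanity check, the relative slowdowns can be recomputed from the scores in the two tables above. This is only a sketch: the helper name is hypothetical, and it ignores the reported error margins. For the two benchmarks in question it gives roughly 83% and 28%.

```scala
object Slowdown {
  // Relative change in percent between a baseline score and a new score (both in us/op).
  def percentSlower(baseline: Double, current: Double): Double =
    (current / baseline - 1.0) * 100.0

  def main(args: Array[String]): Unit = {
    // (benchmark, Scala 2.11 + scalaxy score, Scala 2.12 without scalaxy score),
    // copied from the two BasicFiloBenchmark runs above.
    val cases = Seq(
      ("sumAllLongsIterate", 0.907, 1.657),
      ("sumTimeSeriesBytesSum", 0.946, 1.214)
    )
    for ((name, base, cur) <- cases)
      println(f"$name%-24s ${percentSlower(base, cur)}%.1f%% slower")
  }
}
```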
More benchmarks
Scala 2.12 + no scalaxy + "-opt:l:inline", "-opt-inline-from:filodb.**", "-opt-warnings"
jmh:run -i 10 -wi 5 -f1 -jvmArgsAppend -XX:MaxInlineLevel=20 -jvmArgsAppend -Xmx4g -jvmArgsAppend -XX:MaxInlineSize=99 filodb.jmh.BasicFiloBenchmark
[info] Benchmark Mode Cnt Score Error Units
[info] BasicFiloBenchmark.sumAllIntsSumMethod avgt 10 1.680 ± 0.031 us/op
[info] BasicFiloBenchmark.sumAllLongsApply avgt 10 2.646 ± 0.082 us/op
[info] BasicFiloBenchmark.sumAllLongsIterate avgt 10 1.689 ± 0.028 us/op
[info] BasicFiloBenchmark.sumAllLongsSumMethod avgt 10 1.133 ± 0.052 us/op
[info] BasicFiloBenchmark.sumDoublesSumMethod avgt 10 2.393 ± 0.062 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesApply avgt 10 8.805 ± 0.388 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesIterate avgt 10 2.984 ± 0.078 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesSum avgt 10 1.242 ± 0.085 us/op
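For reference, the optimizer flags named in the title of this run could be enabled in an sbt build roughly as follows. This is a sketch of a build.sbt fragment; where exactly it belongs in FiloDB's build definition is an assumption.

```scala
// build.sbt fragment (sketch): enable the Scala 2.12 optimizer's inliner.
scalacOptions ++= Seq(
  "-opt:l:inline",               // turn on inlining
  "-opt-inline-from:filodb.**",  // only inline code from the filodb packages
  "-opt-warnings"                // report call sites that could not be inlined
)
```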
Scala 2.12 + no scalaxy + "-opt:l:inline", "-opt-inline-from:filodb.**", "-opt-warnings"
jmh:run -i 10 -wi 5 -f1 -jvmArgsAppend -XX:MaxInlineLevel=20 -jvmArgsAppend -Xmx4g -jvmArgsAppend -XX:MaxInlineSize=99 filodb.jmh.BasicFiloBenchmark
Using spire.cforRange for benchmark code.
[info] Benchmark Mode Cnt Score Error Units
[info] BasicFiloBenchmark.sumAllIntsSumMethod avgt 10 1.747 ± 0.121 us/op
[info] BasicFiloBenchmark.sumAllLongsApply avgt 10 2.215 ± 0.079 us/op
[info] BasicFiloBenchmark.sumAllLongsIterate avgt 10 0.848 ± 0.025 us/op
[info] BasicFiloBenchmark.sumAllLongsSumMethod avgt 10 1.128 ± 0.064 us/op
[info] BasicFiloBenchmark.sumDoublesSumMethod avgt 10 2.448 ± 0.102 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesApply avgt 10 8.877 ± 0.232 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesIterate avgt 10 2.942 ± 0.076 us/op
[info] BasicFiloBenchmark.sumTimeSeriesBytesSum avgt 10 0.862 ± 0.016 us/op
I'm leaning towards:
- Dropping the scalaxy dependency altogether.
- Using cforRange from spire, which is the most similar to what we have now. The code could be adapted with a few regexes.
WDYT?
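A rough sketch of what such a regex-based adaptation could look like. It assumes the existing loops use the scalaxy-loops form `for (i <- 0 until n optimized) {` and that the target is spire's `cforRange(range) { i => ... }` (from `spire.syntax.cfor._`). The object name and the pattern are hypothetical, the pattern only covers simple single-generator loops, and every rewritten file would still need its import changed and a manual review.

```scala
import scala.util.matching.Regex

object LoopRewrite {
  // Matches the opening of a scalaxy-loops loop, e.g. `for (i <- 0 until n optimized) {`.
  // Group 1 is the index variable, group 2 is the range expression.
  private val OptimizedLoop: Regex =
    """for\s*\(\s*(\w+)\s*<-\s*(.+?)\s+optimized\s*\)\s*\{""".r

  // Rewrites every matching loop header into the cforRange form;
  // loop bodies and closing braces are left untouched.
  def rewrite(source: String): String =
    OptimizedLoop.replaceAllIn(source, m =>
      Regex.quoteReplacement(s"cforRange(${m.group(2)}) { ${m.group(1)} =>"))
}
```

For example, `LoopRewrite.rewrite("for (i <- 0 until n optimized) { sum += a(i) }")` yields `cforRange(0 until n) { i => sum += a(i) }`.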
Thanks, I'm personally fine with cforRange if it's mostly the same thing.
Just curious btw about your use case for 2.12 and FiloDB. We can take that conversation offline though.
(Sent by email in reply to Szymon Matejczyk's comment of Fri, Jul 31, 2020 at 8:27 AM.)
I like your in-memory implementation (the memory project) and would like to use it in some Scala 2.12 benchmarks.
Besides, I wanted to get to know the project a bit better.
It seems the last blocker is quantifind.sumac, which has not been released for 2.12...
https://github.com/quantifind/Sumac/issues/56
The library seems to be unmaintained. Do you have any suggestions on what to migrate to?
So, we don't control the sumac dependency... Not sure what we can do here; we might have to switch to something else.
(Sent by email in reply to Szymon Matejczyk's comment of Aug 4, 2020, 12:29 PM.)
Looks like there are two possibilities: scopt and scallop. The former is more explicit but needs more boilerplate. The latter is more similar to sumac, because it can infer parameter names from the field names of the configuration class. I will go with the latter to keep the changes minimal, unless you have strong opinions.
I think we already use one of them, so I'd stick with whichever one we already use.
(Sent by email in reply to Szymon Matejczyk's comment of Aug 6, 2020, 12:05 AM.)