spooky failure to start with "neither features.syslogAsset or multicollins.thisInstance were specified"
This is on 1.3.0. I had a pair (per DC) of collins instances running. I don't think they had been restarted in O(months). Today I tried to migrate those instances to new servers (copied conf, dumped db etc.) and collins failed to start with [1]. The spooky, outage-inducing part came when I tried to restart the old servers and they failed with the same error despite no changes to the config. I checked many times over that multicollins was indeed in the config (and the fact that I got it to work later leads me to believe that, as far as multicollins itself is concerned, everything is/was fine).
The eventual workaround was to set:
features {
syslogAsset = "NameOfDC"
}
in production.conf.
Related discussion in #117. Again, as far as I can tell, neither the code nor the config changed since the last restart.
I looked into the current state of master: https://github.com/tumblr/collins/blob/master/app/collins/util/Tattler.scala#L83
lazy val syslogAsset = Asset.findByTag(Feature.syslogAsset.getOrElse("tumblrtag1")).getOrElse {
throw new PlayException("", "neither features.syslogAsset or multicollins.thisInstance were specified")
}
and I'm afraid I don't understand what is going on there either. The only reference to multicollins seems to be in the exception.
[1]
PlayException: Confguration error [neither features.syslogAsset or multicollins.thisInstance were specified]
at util.config.ConfigAccessor$class.globalError(ConfigAccessor.scala:20)
at util.config.Feature$.globalError(Feature.scala:10)
at util.config.Feature$$anonfun$syslogAsset$2.apply(Feature.scala:35)
at util.config.Feature$$anonfun$syslogAsset$2.apply(Feature.scala:35)
at scala.Option.getOrElse(Option.scala:108)
at util.config.Feature$.syslogAsset(Feature.scala:34)
at util.config.Feature$.validateConfig(Feature.scala:61)
at util.config.Configurable$class.mergeReferenceAndSave(Configurable.scala:106)
at util.config.Feature$.mergeReferenceAndSave(Feature.scala:10)
at util.config.Configurable$class.initialize(Configurable.scala:43)
at util.config.Feature$.initialize(Feature.scala:10)
at util.config.Registry$$anonfun$validate$1.apply(Registry.scala:52)
at util.config.Registry$$anonfun$validate$1.apply(Registry.scala:52)
at scala.collection.Iterator$class.foreach(Iterator.scala:660)
at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
at scala.collection.JavaConversions$JCollectionWrapper.foreach(JavaConversions.scala:592)
at util.config.Registry$.validate(Registry.scala:52)
at collins.config.ConfigPlugin.onStart(ConfigPlugin.scala:14)
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:84)
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:84)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:45)
at play.api.Play$$anonfun$start$1.apply$mcV$sp(Play.scala:84)
at play.api.Play$$anonfun$start$1.apply(Play.scala:84)
at play.api.Play$$anonfun$start$1.apply(Play.scala:84)
at play.utils.Threads$.withContextClassLoader(Threads.scala:17)
at play.api.Play$.start(Play.scala:83)
at play.core.StaticApplication.<init>(ApplicationProvider.scala:51)
at play.core.server.NettyServer$.createServer(NettyServer.scala:136)
at play.core.server.NettyServer$$anonfun$main$5.apply(NettyServer.scala:165)
at play.core.server.NettyServer$$anonfun$main$5.apply(NettyServer.scala:164)
at scala.Option.map(Option.scala:133)
at play.core.server.NettyServer$.main(NettyServer.scala:164)
at play.core.server.NettyServer.main(NettyServer.scala)
@cburroughs I have definitely seen this behavior before, but I can't remember exactly what triggered it. Generally I turn multicollins on and use that feature to identify each instance instead of the syslog asset. That seems to avoid the issue:
multicollins {
enabled=true
thisInstance = "IATA01"
}
My multicollins config is
multicollins {
enabled = true
instanceAssetType = DATA_CENTER
locationAttribute = LOCATION
thisInstance = IAD
}
which is what worked until yesterday.
There were definitely initialization order issues in the 1.3.0 release. I can't say for certain that they have all been fixed in master, but I've addressed a few of them since then. Unfortunately this isn't easy to reproduce, and therefore not easy to fix.
@cburroughs - is the IAD DATA_CENTER asset still in your instance of collins? I think I've seen this error before when the asset did not exist. I would imagine it does exist, since you said you were running with that config until recently; I just want to double check.
Also, this may help you debug: this is where syslogAsset gets populated with the value specified in the multicollins config:
https://github.com/tumblr/collins/blob/v1.3.0/app/util/config/Feature.scala#L32-L36
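For readers who do not want to open the link, the fallback described here can be sketched roughly as follows. This is a minimal illustration and not the actual Feature.scala source: the object name, the function name, and the flattened `Map` standing in for the parsed config are all assumptions made for the example.

```scala
// Hypothetical sketch of the fallback; NOT the real collins code.
object SyslogAssetFallback {
  // `conf` is an assumed stand-in for the parsed production.conf,
  // flattened to dotted keys.
  def syslogAssetTag(conf: Map[String, String]): String =
    conf.get("features.syslogAsset")                  // explicit setting wins
      .orElse(conf.get("multicollins.thisInstance"))  // otherwise use the multicollins instance tag
      .getOrElse(throw new IllegalStateException(
        "neither features.syslogAsset or multicollins.thisInstance were specified"))
}
```

Under this reading, the error only fires when both lookups come back empty, which is why an unreadable or only partially loaded config could produce it even though multicollins.thisInstance is present in the file.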
Yep, there is still an asset of type 'Data Center' with the tag IAD.
@cburroughs were you able to figure this out? Maybe switching on and off multicollins might help fix the problem?
No, I have not figured this out, and I have not debugged further since the features.syslogAsset workaround put out the fire.
So I got this error:
Play server process ID is 7203
[info] play - database [collins] connected at jdbc:h2:mem:play
[error] application - Failed to create assetlog
play.api.PlayException: [neither features.syslogAsset or multicollins.thisInstance were specified]
After building collins with activator dist, extracting the zip file in a new directory and running:
java -server -Dhttp.port=9023 -Dconfig.file=$(pwd)/conf/application.conf -DapplyEvolutions.collins=true -cp `pwd`/lib/\* play.core.server.NettyServer /tmp
Turns out this is caused by a missing test/resources/profiles.yaml. Maybe the issue at hand is similar?
Either way, this behavior is problematic. Such runtime dependencies should be included in the dist zip file rather than living in test/, and ultimately the error is super misleading. I'm happy to file some issues but not sure where to start.
Oh WTF. Now it's again not working on another system.. So well, dunno. Will update if I find out more.
Okay, this was related to a different jvm version. But if this also triggers this error, there is something seriously wrong with the error handling.
I think we have seen this before, where a syntax error in the production.conf leads to very confusing error reporting due to a spaghetti mess of try/catches around parsing configs. @discordianfish
I would check your production.conf very carefully and see if you can find the syntax error.
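To illustrate the kind of masking being suspected here, the sketch below shows how a broad recovery around config parsing can turn an unrelated failure into the "neither ... were specified" message. It is purely illustrative and not taken from collins: the toy `key = value` parser, the object, and all function names are hypothetical.

```scala
import scala.util.Try

// Hypothetical illustration of a swallowed parse error; NOT collins code.
object MaskedConfigError {
  // Toy parser: every line must look like `key = value`.
  def parse(lines: Seq[String]): Map[String, String] =
    lines.map { line =>
      line.split("=", 2) match {
        case Array(k, v) => k.trim -> v.trim
        case _           => throw new IllegalArgumentException(s"syntax error: $line")
      }
    }.toMap

  // The broad recovery discards the real cause and yields an empty config.
  def load(lines: Seq[String]): Map[String, String] =
    Try(parse(lines)).getOrElse(Map.empty)

  // The later validation step only sees that the key is missing.
  def thisInstance(lines: Seq[String]): String =
    load(lines).getOrElse("multicollins.thisInstance",
      throw new IllegalStateException(
        "neither features.syslogAsset or multicollins.thisInstance were specified"))
}
```

In this sketch a single malformed line anywhere in the input makes `thisInstance` fail with the missing-key message, even though the key is present, and the original syntax error is never reported.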
@byxorna I used the config from the repo without any changes. My core issue was that file missing from my working directory.
Oh what a PITA. Is it possible that right now it always returns this error no matter what goes wrong during initialization? I got that error because of that missing file, because of something triggered by using java 1.8 instead of 1.7, and now because of something related to the database after changing h2:mem:play to h2:/var/lib/collins/database.h2..
Is there a more stable version? Can't get 1.3.0 running either and that's 2 years old already..
..and for solr misconfiguration as well as permission problem.
Okay, finally got this up and running, but I'm wondering: are there plans to fix the error handling? I would look into it, but scala is far outside my comfort zone. I'm a bit worried about introducing collins here with this issue pending, and I don't know how actively collins is still developed.