FlowVisor crash: java.lang.ArrayIndexOutOfBoundsException
The production FlowVisor at GENI regional network provider Southern Crossroads (SoX) crashed recently. In syslog:
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR slicer_cfece52e-c0ec-45e6-9781-54b0c385eab4_dpid=0f:fa:34:40:b5:03:14:00 : STARVING: handling event took 4130ms: org.flowvisor.events.FVIOEvent@42ef4994
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: WARN slicer_1c4c697d-44e4-493f-9f09-c1c96b3d0487_dpid=0f:fa:34:40:b5:03:14:00 : Verifying Slice is not over its flow rule limit
Jul 7 01:05:47 1>Jul last message repeated 16 times
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : MAIN THREAD DIED!!!
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : ----------------------------
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : ----- exception in thread Thread[main,5,main]::java.lang.ArrayIndexOutOfBoundsException: 57
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.openflow.protocol.OFType.valueOf(OFType.java:185)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.openflow.protocol.OFMessage.readFrom(OFMessage.java:130)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.openflow.protocol.factory.BasicFactory.parseMessages(BasicFactory.java:49)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.openflow.io.OFMessageAsyncStream.read(OFMessageAsyncStream.java:54)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.flowvisor.io.FVMessageAsyncStream.read(FVMessageAsyncStream.java:86)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.openflow.io.OFMessageAsyncStream.read(OFMessageAsyncStream.java:44)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.flowvisor.classifier.FVClassifier.handleIOEvent(FVClassifier.java:460)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.flowvisor.classifier.FVClassifier.handleEvent(FVClassifier.java:379)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.flowvisor.events.FVEventLoop.doEventLoop(FVEventLoop.java:168)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.flowvisor.FlowVisor.run(FlowVisor.java:196)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : org.flowvisor.FlowVisor.main(FlowVisor.java:229)
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : ----------------------------
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: ERROR none : restarting after main thread died
Jul 7 01:05:47 foam 1>Jul 7 01:05:47 flowvisor: WARN classifier-dpid=0f:fa:34:40:b5:03:14:00 : tearing down
Jul 7 01:05:49 foam 1>Jul 7 01:05:49 flowvisor: INFO none : initializing poll loop
Jul 7 01:05:49 foam 1>Jul 7 01:05:49 flowvisor: INFO none : initializing FlowVisor UserAPI JSONRPC SSL WebServer on port 8081
Jul 7 01:05:49 foam 1>Jul 7 01:05:49 flowvisor: INFO OFSwitchAcceptor : Listenning on port 6633
Jul 7 01:05:49 foam 1>Jul 7 01:05:49 flowvisor: INFO none : initializing FlowVisor UserAPI XMLRPC SSL WebServer on port 8080
Jul 7 01:05:49 foam 1>Jul 7 01:05:49 flowvisor: FATAL none : failed to spawn APIServer
Jul 7 01:05:49 foam 1>Jul 7 01:05:49 flowvisor: INFO none : Shutting down config database.
Jul 7 01:05:49 foam 1>Jul 7 01:05:49 flowvisor: INFO none : Shutting down config database.
They restarted it, and it came back up fine.
What other info can we gather that might help?
Would it be possible to find the actual message that generated this crash?
The crash occurred in openflowj, the library responsible for marshalling/unmarshalling OpenFlow messages. It crashed while trying to parse a message with type value 57, which is strange because OpenFlow (at least standard OF) does not define a message with that type value.
All that said, openflowj should not crash when such a message occurs. I'll escalate this to openflowj's maintainer.
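For context, here is a rough sketch of the kind of lookup the stack trace points at: a valueOf that indexes straight into an array of known types will throw ArrayIndexOutOfBoundsException for any out-of-range type value such as 57. This is illustrative only, not the actual openflowj source; a bounds-checked variant would let the caller drop the message instead.

```java
// Illustrative sketch only -- NOT the real org.openflow.protocol.OFType code.
// The stack trace (ArrayIndexOutOfBoundsException: 57 in OFType.valueOf) is
// consistent with a lookup table indexed directly by the wire type value.
public enum OFTypeSketch {
    HELLO, ERROR, ECHO_REQUEST, ECHO_REPLY, VENDOR /* , ...remaining OF 1.0 types */;

    private static final OFTypeSketch[] MAPPING = values();

    // Throws ArrayIndexOutOfBoundsException for any type value the enum does
    // not know about (e.g. 57).
    public static OFTypeSketch valueOf(byte wireType) {
        return MAPPING[wireType & 0xff];
    }

    // A defensive variant: bounds-check and return null so the caller can
    // drop/log the message instead of killing the event loop.
    public static OFTypeSketch valueOfSafe(byte wireType) {
        int t = wireType & 0xff;
        return t < MAPPING.length ? MAPPING[t] : null;
    }
}
```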
AA> Would it be possible to find the actual message that generated this crash?
Mm, maybe. To clarify, by "actual message", you mean like what API call was made immediately before the crash?
Would that be in syslog? If not, where would we find it?
No, I was hoping for the actual OpenFlow message. That's probably a long shot, but you never know ;)
Do you know what OF devices are connected to that FlowVisor?
AA> No I was hoping for the actual openflow message. That's probably a long shot but you never know ;)
Heh. I don't think anyone would have that, unless SoX happened to be running a packet capture at that time. I'll ask just in case.
AA> Do you know what OF devices are connected to that FlowVisor?
Just one NEC PF5820 at the moment.
Type 57 may be the vendor message NEC uses for big stats replies. I'm not 100% sure, as I haven't been recording their message types, but they use a lot of custom message types as extensions to support their flow table being larger than what many OF messages can handle by default (various data types are not large enough, etc.).
Sure, but the type should be OFPT_VENDOR, not 57 or something crazy.
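For reference, a minimal sketch of what a well-formed OF 1.0 vendor message looks like on the wire: the type byte stays OFPT_VENDOR (4), and all the vendor-specific meaning lives after the header. The vendor ID and payload here are placeholders, not NEC's actual extension.

```java
import java.nio.ByteBuffer;

// Minimal sketch of an OpenFlow 1.0 vendor message. The vendor ID is a
// placeholder; NEC's real extensions would carry their own ID and payload.
final class VendorMessageSketch {
    static ByteBuffer build(int xid, int vendorId, byte[] body) {
        int length = 8 + 4 + body.length;   // OF header + vendor ID + payload
        ByteBuffer buf = ByteBuffer.allocate(length);
        buf.put((byte) 0x01);               // version: OpenFlow 1.0
        buf.put((byte) 4);                  // type: OFPT_VENDOR, never an out-of-spec value like 57
        buf.putShort((short) length);       // total message length
        buf.putInt(xid);                    // transaction id
        buf.putInt(vendorId);               // 4-byte vendor/experimenter ID (placeholder)
        buf.put(body);                      // vendor-defined payload
        buf.flip();
        return buf;
    }
}
```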
Sure, but even if the switch is doing something totally wrong with the type (which I don't know enough about this to say whether it is or not), FV shouldn't crash as a result.
Agreed; I am filing a report against openflowj. Would you happen to have a packet capture?
Sorry, I thought I'd mentioned this earlier: No, no one happened to be running a packet capture at the time of the crash.
Can you reproduce the bug? Can you get a packet capture that way?
Can you do anything to ameliorate this on the FV side, so that FV doesn't die even when it encounters a bug in openflowj?
Sure, we shouldn't crash when ofj crashes, although if a packet isn't parsed correctly for some reason, it could cause incorrect FV behaviour in some pathological cases. I don't know what the correct behaviour is here; I need to think about that some more.
Drop the packet. You don't know how to slice it, and you're promising to slice the control plane. There's nothing you can do except drop it and log it.
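A minimal sketch of that drop-and-log behaviour, assuming a raw OpenFlow 1.0 byte stream; the class, logging, and dispatch placeholder below are illustrative, not FlowVisor/openflowj APIs. The read loop validates the header, skips the body of anything it can't classify, and logs it rather than letting the exception reach the event loop.

```java
import java.nio.ByteBuffer;

// Minimal sketch of "drop it and log it" for unparseable OpenFlow messages.
// Assumes a raw OF 1.0 byte stream; not the actual FlowVisor/openflowj code.
final class DefensiveOFReader {
    private static final int OFP_HEADER_LEN = 8;

    static void readMessages(ByteBuffer buf) {
        while (buf.remaining() >= OFP_HEADER_LEN) {
            buf.mark();
            byte version = buf.get();
            int type = buf.get() & 0xff;
            int length = buf.getShort() & 0xffff;
            int xid = buf.getInt();

            // Incomplete message: rewind and wait for more bytes.
            if (length < OFP_HEADER_LEN || buf.remaining() < length - OFP_HEADER_LEN) {
                buf.reset();
                return;
            }

            if (type > 21) {   // OF 1.0 defines message types 0..21
                // Unknown/garbled type (e.g. 57): skip the body and log,
                // instead of letting a parse exception kill the event loop.
                buf.position(buf.position() + (length - OFP_HEADER_LEN));
                System.err.printf("dropping unparseable OF message: version=%d type=%d len=%d xid=%d%n",
                                  version, type, length, xid);
                continue;
            }

            // Known type: hand off to the normal parse/dispatch path
            // (placeholder -- just skip the body here).
            buf.position(buf.position() + (length - OFP_HEADER_LEN));
        }
    }
}
```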