ospfwatcher
History of all changes in OSPF Topology
OSPF Topology Watcher
OSPF Watcher is a monitoring tool that tracks OSPF topology changes for network engineers. It works by passively listening to OSPF control plane messages over a dedicated OSPF adjacency established between OSPF Watcher and one of the network devices. The tool logs OSPF events and/or exports them via Logstash to the Elastic Stack (ELK), Zabbix, WebHooks and the Topolograph monitoring dashboard for keeping a history of events, alerting and instant notification. All components of the solution are packaged as containers, so it is very quick to start. The only thing that has to be configured manually is the GRE tunnel on the Linux host.
Logged topology changes:
- OSPF neighbor adjacency Up/Down
- OSPF link cost changes
- OSPF networks appeared/disappeared from the topology
Architecture
The Quagga container runs with network_mode=host, so it sees the GRE tunnel configured by the administrator on the Linux host.
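Because the container shares the host network namespace, the tunnel created on the host should be visible from inside it. As a quick sanity check (a sketch only; it assumes the iproute2 tools are present in the Quagga image):
docker exec -it quagga ip addr show tun0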
Integration with Zabbix was added in ospfwatcher:v1.4 for alerting/notification of OSPF topology changes.
Note
ospfwatcher:v1.1 is compatible with topolograph:v2.7, which means that OSPF network changes can be shown on the network graph.
Functional Role
Demo
Click on the image to zoom in.
Discovering OSPF logs in Kibana. Examples
OSPF cost changes on links
Logs showing OSPF adjacencies going up/down and networks appearing/disappearing.
Topolograph OSPF Monitoring. New subnet event shows where the subnet appeared
Topolograph OSPF Monitoring. Filter any subnet-related events and select a Change metric event;
the new and old metric is shown.
Topolograph OSPF Monitoring. up/down link events
Red timelines show link (~adjacency) down events, green ones show link up events.
The timeline 10.1.1.2-10.1.1.3 has been selected.
OSPF topology change notification/alarming via Zabbix. Examples
Zabbix's dashboard with active OSPF alarms detected by OSPFWatcher
Zabbix OSPF neighbor up/down alarm
This alarm tracks all new OSPF adjacencies and fires when a device loses an OSPF neighbor
Zabbix OSPF Cost changed on transit links
Transit links are all links between active OSPF neighbors. If the cost of a link changes, it may affect the shortest paths that traffic actually follows
Zabbix alert if an OSPF network stopped being announced by a node
If a subnet was removed from an OSPF node (the node withdrew it from its announcements), the network from this node became unavailable to others; this event is logged too.
Slack notification
HTTP POST messages can easily be consumed by messengers, which allows you to get instant notifications of OSPF topology changes:
How to set up
- Choose a Linux host with Docker installed
- Set up Topolograph:
- launch your own Topolograph on Docker using topolograph-docker, or make sure you have a connection to the public https://topolograph.com
- create a user for API authentication using the Local Registration form on the site, add your IP address under API/Authorised source IP ranges on the site and write down the following variables
Note
The [email protected] user with the ospf password is used in the .env file. Create such a user if you are running the Docker version, so that the default .env variables work, and go to the next step. If you are using the public Topolograph, write down the following variables:
TOPOLOGRAPH_HOST
TOPOLOGRAPH_PORT
TOPOLOGRAPH_USER_LOGIN
TOPOLOGRAPH_USER_PASS
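For the public instance the values could look like the sketch below (port 443 assumes HTTPS; replace the login and password with the account you registered):
TOPOLOGRAPH_HOST=topolograph.com
TOPOLOGRAPH_PORT=443
TOPOLOGRAPH_USER_LOGIN=<your registered email>
TOPOLOGRAPH_USER_PASS=<your password>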
- Set up ELK
- if you already have an ELK instance running, just note its ELASTIC_IP for filling in the env file later. Currently, additional manual configuration is needed to create the Index Templates, because the demo script doesn't accept the ELK certificate (a certificate is required if security is enabled). The required mapping for the Index Template is in ospfwatcher/logstash/index_template/create.py. Feel free to edit this script for your needs.
- if not, boot up a new ELK from the docker-elk compose. For demo purposes, set the ELK license to basic and turn off security. The settings are in docker-elk/elasticsearch/config/elasticsearch.yml:
xpack.license.self_generated.type: basic
xpack.security.enabled: false
- Set up a GRE tunnel from the host to a network device
At least one GRE tunnel is needed per area that you want to monitor. If the OSPF domain has multiple areas, set up one GRE tunnel into each area. This is a restriction of the OSPF architecture: new/lost adjacencies and link cost changes are learned via Type 1/Type 2 LSAs on a per-area basis only. The Quagga host in OSPF Watcher therefore has to know about all subnets in all areas we want to monitor; to keep these subnets isolated from the host, apply a policy that prevents OSPF routes from being installed into the host's routing table. An example of such a policy is below:
# quagga/config/ospfd.conf
route-map TO_KERNEL deny 200
exit
!
ip protocol ospf route-map TO_KERNEL
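To verify that the policy works, you can check that no OSPF-learned routes appear in the host's kernel routing table (the protocol tag differs between routing daemons, so treat this grep as a sketch):
# should print nothing once the deny route-map is applied
ip route | grep -E "proto (zebra|ospf)"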
Note
You can skip this step and run ospfwatcher in test_mode; in that case a test LSDB is taken from a file and test changes (loss of adjacency and change of OSPF metric) are posted to ELK
sudo modprobe ip_gre
sudo ip tunnel add tun0 mode gre remote <router-ip> local <host-ip> dev eth0 ttl 255
sudo ip address add <GRE tunnel ip address> dev tun0
sudo ip link set tun0 up
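For illustration, with assumed addresses (router 192.168.1.1, host 192.168.1.2 on eth0, tunnel network 10.255.255.0/30) the commands would look like this; substitute your own values:
sudo modprobe ip_gre
sudo ip tunnel add tun0 mode gre remote 192.168.1.1 local 192.168.1.2 dev eth0 ttl 255
sudo ip address add 10.255.255.2/30 dev tun0
sudo ip link set tun0 up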
- Set up a GRE tunnel from the network device to the host. An example for Cisco:
Note
You can skip this step and run ospfwatcher in test_mode; in that case a test LSDB is taken from a file and test changes (loss of adjacency and change of OSPF metric) are posted to ELK
interface Tunnel0
ip address <GRE tunnel ip address>
ip ospf network point-to-point
tunnel mode gre ip
tunnel source <router-ip>
tunnel destination <host-ip>
Add the network of the GRE tunnel (where <GRE tunnel ip address> resides) to quagga/config/ospfd.conf.
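Continuing the assumed 10.255.255.0/30 example, the corresponding ospfd.conf fragment could look like this (area 0 is an assumption; use the area your tunnel belongs to):
# quagga/config/ospfd.conf
router ospf
 network 10.255.255.0/30 area 0.0.0.0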
How to start
git clone https://github.com/Vadims06/ospfwatcher.git
cd ospfwatcher
Set variables in the .env file:
- ELASTIC_IP=192.168.0.10 - the IP address of the host where Docker is running (if you run the whole demo on a single machine). Do not use localhost, because ELK, Topolograph and OSPF Watcher run in their own private network space
- TOPOLOGRAPH_HOST=192.168.0.10 - same logic here
- TEST_MODE='True' - if test mode is enabled, a demo LSDB is taken from a file instead of from Quagga
Default values for your information:
- ELASTIC_PORT=9200
- ELASTIC_USER_LOGIN=elastic
- ELASTIC_USER_PASS=changeme
- TOPOLOGRAPH_PORT=8080
- [email protected]
- TOPOLOGRAPH_WEB_API_PASSWORD=ospf
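Putting it together, a minimal .env for a single-machine demo could look like the sketch below (192.168.0.10 is an illustrative host IP; the rest are the defaults listed above):
ELASTIC_IP=192.168.0.10
ELASTIC_PORT=9200
ELASTIC_USER_LOGIN=elastic
ELASTIC_USER_PASS=changeme
TOPOLOGRAPH_HOST=192.168.0.10
TOPOLOGRAPH_PORT=8080
TOPOLOGRAPH_WEB_API_USERNAME_EMAIL=<your registered email>
TOPOLOGRAPH_WEB_API_PASSWORD=ospf
TEST_MODE='True'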
Start docker-compose
docker-compose build
docker-compose up -d
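After the containers start, a quick sanity check is to list them and make sure none has exited:
docker-compose ps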
Kibana settings
- Index Templates have already been created. Check that logs are received by ELK via Stack Management/ Kibana/ Stack Management/ Index Management: watcher-costs-changes and watcher-updown-events should be in the list (a command-line check is sketched after this list).
- Create an Index Pattern in old ELK (Stack Management/ Kibana/ Stack Management/ Index Pattern -> Create index pattern) or a Data View in new ELK (Stack Management/ Kibana/ Stack Management/ Data Views), specify watcher-updown-events as the index pattern name -> Next -> choose watcher_time as the timestamp. Repeat the step to create watcher-costs-changes.
Use watcher_time because the connection between the Watcher (with Logstash) and ELK can be lost while the Watcher continues to log all topology changes with the correct time; when the connection is restored, all logs are added to ELK and you can still check the time of the incident. If you choose @timestamp, the time of all logs will be the time they were added to ELK.
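A command-line alternative to the Index Management page is the Elasticsearch _cat API (assuming the default elastic/changeme credentials; drop -u if security is disabled):
curl -u elastic:changeme "http://<ELASTIC_IP>:9200/_cat/indices/watcher-*?v"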
Browse your topology changes logs
Your logs are here: http://localhost:5601/ -> Analytics/Discover -> watcher-updown-events.
Zabbix settings
Zabbix settings are available in /docs/zabbix-ui. Four hosts with items are required (the host and the item inside each host have the same name):
- ospf_neighbor_up_down
- ospf_network_up_down
- ospf_link_cost_change
- ospf_stub_network_cost_change
WebHook setting
- Create a Slack app
- Enable Incoming Webhooks
- Create an Incoming Webhook (generates URL)
- Uncomment EXPORT_TO_WEBHOOK_URL_BOOL in .env and set the URL in WEBHOOK_URL (see the example below)
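In .env this could look like the following sketch (the Slack path is a placeholder generated by your Incoming Webhook; the boolean uses the same quoted string convention as TEST_MODE):
EXPORT_TO_WEBHOOK_URL_BOOL="True"
WEBHOOK_URL="https://hooks.slack.com/services/<your-webhook-path>"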
Troubleshooting
This is a quick set of checks in case no events appear on the OSPF Monitoring page. OSPF Watcher consists of three services: OSPFd/Quagga [1] -> Watcher [2] -> Logstash [3] -> Topolograph & ELK & Zabbix & WebHooks.
- Check whether Quagga tracks OSPF changes by running the following command:
docker exec -it quagga cat /var/log/quagga/ospfd.log
You should see logs similar to the following.
- Check if Watcher parses changes:
docker exec -it watcher cat /home/watcher/watcher/logs/watcher.log
You should see tracked changes of your network, i.e. here we see that the 10.0.0.0/29 network went up at 2023-10-27T07:50:24Z on the 10.10.1.4 router.
2023-10-27T07:50:24Z,demo-watcher,network,10.0.0.0/29,up,10.10.1.4,28Oct2023_01h10m02s_7_hosts_ospfwatcher
- Check that messages are sent:
- Uncomment DEBUG_BOOL="True" in .env, check the logs with docker logs logstash and do one of the following:
- wait for the next event in your network
- change the cost of your stub network, return it back and see this event in the logs
- simulate network changes:
docker exec -it watcher /bin/bash
echo "2023-10-27T07:50:24Z,demo-watcher,network,10.0.0.0/29,up,10.10.1.4,28Oct2023_01h10m02s_7_hosts_ospfwatcher" >> /home/watcher/watcher/logs/watcher.log
- Connect to mongoDB and check logs:
docker exec -it mongo /bin/bash
Inside the container, connect to the database:
mongo mongodb://$MONGO_INITDB_ROOT_USERNAME:$MONGO_INITDB_ROOT_PASSWORD@mongodb:27017/admin?gssapiServiceName=mongodb
use admins
Check the last two/N records in adjacency changes (adj_change) or cost changes (cost_change):
db.adj_change.find({}).sort({_id: -1}).limit(2)
db.cost_change.find({}).sort({_id: -1}).limit(2)
Note
If you see only a single event in docker logs logstash, it means that the mongoDB output is blocked; check whether you have a connection to MongoDB: docker exec -it logstash curl -v mongodb:27017
Logstash troubleshooting
Start logstash container
[ospf-watcher]# docker run -it --rm --network=topolograph_backend --env-file=./.env -v ./logstash/pipeline:/usr/share/logstash/pipeline -v ./logstash/config:/usr/share/logstash/config logstash:7.17.0 /bin/bash
Inside the container, run this command:
bin/logstash -e 'input { stdin { } } filter { dissect { mapping => { "message" => "%{watcher_time},%{watcher_name},%{event_name},%{event_object},%{event_status},old_cost:%{old_cost},new_cost:%{new_cost},%{event_detected_by},%{subnet_type},%{shared_subnet_remote_neighbors_ids},%{graph_time}" }} mutate { update => { "[@metadata][mongo_collection_name]" => "adj_change" }} } output { stdout { codec => rubydebug {metadata => true}} }'
It will wait for input from the CLI, so copy and paste this log line:
2023-01-01T00:00:00Z,demo-watcher,metric,10.1.14.0/24,changed,old_cost:10,new_cost:123,10.1.1.4,stub,10.1.1.4,01Jan2023_00h00m00s_7_hosts
The output should be:
[INFO ] 2024-05-13 21:15:25.462 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2024-05-13 21:15:25.477 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
2023-01-01T00:00:00Z,demo-watcher,metric,10.1.14.0/24,changed,old_cost:10,new_cost:123,10.1.1.4,stub,10.1.1.4,01Jan2023_00h00m00s_7_hosts
{
"graph_time" => "01Jan2023_00h00m00s_7_hosts",
"event_detected_by" => "10.1.1.4",
"subnet_type" => "stub",
"message" => "2023-01-01T00:00:00Z,demo-watcher,metric,10.1.14.0/24,changed,old_cost:10,new_cost:123,10.1.1.4,stub,10.1.1.4,01Jan2023_00h00m00s_7_hosts",
"watcher_name" => "demo-watcher",
"watcher_time" => "2023-01-01T00:00:00Z",
"@timestamp" => 2024-05-13T21:15:50.628Z,
"old_cost" => "10",
"@version" => "1",
"host" => "ba8ff3ab31f8",
"event_name" => "metric",
"new_cost" => "123",
"shared_subnet_remote_neighbors_ids" => "10.1.1.4",
"event_object" => "10.1.14.0/24",
"event_status" => "changed"
}
Minimum Logstash version
7.17.0; this version includes fixes for issues_281 and issues_5115
License
The functionality was tested using the Basic ELK license.