
Backup/restore not working as expected

Open dictvm opened this issue 6 years ago • 7 comments

I'm trying to restore a backup of all dashboards from a Grafana installation that uses sqlite3 for its storage. I want to migrate to Postgres because of database locking issues. My plan was to fire up a new Grafana instance, configure it to use Postgres, and then restore the dashboards from the backup to the new instance. To make things easier, I decided to just use the HTTP API.

I'm using a grafcli.ini in the same directory where I'm executing grafcli and where the backup is stored. The credentials are provided in the grafcli.ini file.

However, grafcli complains about getting a 404 from the new Grafana instance:

grafcli restore backup.tgz remote/newinstance
Traceback (most recent call last):
  File "/usr/local/bin/grafcli", line 27, in <module>
    sys.exit(main())
  File "/usr/local/bin/grafcli", line 13, in main
    result = cli.execute(*sys.argv[1:])
  File "/usr/local/lib/python3.6/site-packages/climb/core.py", line 79, in execute
    return self._commands.execute(command, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/climb/commands.py", line 26, in execute
    return method(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/grafcli/commands.py", line 205, in restore
    self.file_import(file_path, doc_path)
  File "/usr/local/lib/python3.6/site-packages/grafcli/commands.py", line 237, in file_import
    self._resources.save(path, document)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/resources.py", line 51, in save
    return manager.save(document, *parts)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/common.py", line 46, in save
    origin_document = self.get(dashboard_name, row_name, panel_name)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/common.py", line 33, in get
    dashboard = self._storage.get(dashboard_name)
  File "/usr/local/lib/python3.6/site-packages/grafcli/storage/api.py", line 37, in get
    source = self._call('GET', 'dashboards/db/{}'.format(dashboard_id))
  File "/usr/local/lib/python3.6/site-packages/grafcli/storage/api.py", line 29, in _call
    response.raise_for_status()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 937, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://newinstance.corp.tld/api/dashboards/db/dashboard_to_restore

Am I missing something or is this a bug? Let me know if you need me to provide additional information.

dictvm avatar Jul 26 '17 14:07 dictvm

I can reproduce this with the latest docker image of Grafana on localhost. The backup is working fine but restoring isn't possible.

dictvm avatar Jul 26 '17 16:07 dictvm

Hey. It looks like the restore tries to get a dashboard that does not exist. I'll look into it, but if you'd like a quick workaround, you could try creating a dashboard with this name on the new instance.

m110 avatar Jul 27 '17 06:07 m110

Thanks. I created an empty dashboard with the same name as the dashboard from the backup. grafcli recognized it and asked whether it should overwrite its content:

Overwrite new-dashboard-copy? [y/n]:

Unfortunately, I'm still getting this error after approving this:

Traceback (most recent call last):
  File "/usr/local/bin/grafcli", line 27, in <module>
    sys.exit(main())
  File "/usr/local/bin/grafcli", line 13, in main
    result = cli.execute(*sys.argv[1:])
  File "/usr/local/lib/python3.6/site-packages/climb/core.py", line 79, in execute
    return self._commands.execute(command, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/climb/commands.py", line 26, in execute
    return method(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/grafcli/commands.py", line 205, in restore
    self.file_import(file_path, doc_path)
  File "/usr/local/lib/python3.6/site-packages/grafcli/commands.py", line 237, in file_import
    self._resources.save(path, document)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/resources.py", line 51, in save
    return manager.save(document, *parts)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/common.py", line 46, in save
    origin_document = self.get(dashboard_name, row_name, panel_name)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/common.py", line 33, in get
    dashboard = self._storage.get(dashboard_name)
  File "/usr/local/lib/python3.6/site-packages/grafcli/storage/api.py", line 37, in get
    source = self._call('GET', 'dashboards/db/{}'.format(dashboard_id))
  File "/usr/local/lib/python3.6/site-packages/grafcli/storage/api.py", line 29, in _call
    response.raise_for_status()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 937, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:3000/api/dashboards/db/new-dashboard-copy

dictvm avatar Jul 27 '17 08:07 dictvm

So it seems this was due to a silly lack of exception mapping in the API storage. Can you try it with the most recent version?
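For readers following along, the kind of exception mapping meant here might look roughly like the sketch below. It is not grafcli's actual code: `DashboardNotFound`, `HTTPStatusError`, `get_dashboard`, `save_dashboard`, and the `fetch`/`store` callables are all hypothetical names used for illustration, and `HTTPStatusError` merely stands in for `requests.exceptions.HTTPError`. The idea is that a 404 on the lookup is translated into a "not found" error that the save path can handle, while any other HTTP error still propagates.

```python
class DashboardNotFound(Exception):
    """Hypothetical domain error: the dashboard is absent on the target."""


class HTTPStatusError(Exception):
    """Stand-in for requests.exceptions.HTTPError, carrying the status code."""

    def __init__(self, status_code):
        super().__init__("HTTP {}".format(status_code))
        self.status_code = status_code


def get_dashboard(fetch, dashboard_id):
    """Fetch a dashboard, translating a 404 into DashboardNotFound.

    `fetch` is any callable that returns the dashboard dict for a path
    or raises HTTPStatusError.
    """
    try:
        return fetch("dashboards/db/{}".format(dashboard_id))
    except HTTPStatusError as error:
        if error.status_code == 404:
            # Map the transport-level error to a domain-level one.
            raise DashboardNotFound(dashboard_id) from error
        raise  # Anything else (e.g. a 500) is a real failure.


def save_dashboard(fetch, store, dashboard_id, document):
    """Store the document and report whether it was created or overwritten."""
    try:
        get_dashboard(fetch, dashboard_id)
        action = "overwrite"
    except DashboardNotFound:
        # A missing dashboard is expected during a restore to a fresh instance.
        action = "create"
    store(dashboard_id, document)
    return action
```

With a mapping like this, restoring to an empty instance no longer fails on the initial lookup; only unexpected status codes surface as errors.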

m110 avatar Jul 29 '17 11:07 m110

Looks good, I just tested it locally. I'll attempt to migrate our production Grafana today. I'm expecting that it'll work now. Thanks!

dictvm avatar Jul 31 '17 09:07 dictvm

My local tests succeeded, but in production I could not fully restore from the backup. After restoring 5 dashboards I'm hitting a 500:

grafcli restore backup.tgz remote/remote.tld
Traceback (most recent call last):
  File "/usr/local/bin/grafcli", line 27, in <module>
    sys.exit(main())
  File "/usr/local/bin/grafcli", line 13, in main
    result = cli.execute(*sys.argv[1:])
  File "/usr/local/lib/python3.6/site-packages/climb/core.py", line 79, in execute
    return self._commands.execute(command, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/climb/commands.py", line 26, in execute
    return method(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/grafcli/commands.py", line 205, in restore
    self.file_import(file_path, doc_path)
  File "/usr/local/lib/python3.6/site-packages/grafcli/commands.py", line 237, in file_import
    self._resources.save(path, document)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/resources.py", line 51, in save
    return manager.save(document, *parts)
  File "/usr/local/lib/python3.6/site-packages/grafcli/resources/common.py", line 69, in save
    self._storage.save(dashboard.id, dashboard)
  File "/usr/local/lib/python3.6/site-packages/grafcli/storage/api.py", line 63, in save
    self._call('POST', 'dashboards/db', data)
  File "/usr/local/lib/python3.6/site-packages/grafcli/storage/api.py", line 30, in _call
    response.raise_for_status()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 937, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://remote.tld/api/dashboards/db

I checked the Grafana pod's logs and it seems it's missing the data sources and that something's off with the alert of one of our dashboards:

t=2017-07-31T12:58:59+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/dashboards/db/dashboard_1 status=404 remote_addr=$OUR_OFFICE_IP time_ms=130 size=33 referer=
t=2017-07-31T12:59:00+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/dashboards/db/consul-cluster status=404 remote_addr=$OUR_OFFICE_IP time_ms=174 size=33 referer=
t=2017-07-31T12:59:01+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/dashboards/db/dashboard_3 status=404 remote_addr=$OUR_OFFICE_IP time_ms=78 size=33 referer=
t=2017-07-31T12:59:01+0000 lvl=eror msg="Invalid alert data. Cannot save dashboard" logger=context userId=1 orgId=1 uname=admin error="Data source not found"
t=2017-07-31T12:59:01+0000 lvl=eror msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=POST path=/api/dashboards/db status=500 remote_addr=$OUR_OFFICE_IP time_ms=114 size=55 referer=
t=2017-07-31T12:59:15+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=1 uname= method=GET path=/login/github status=302 remote_addr=$OUR_OFFICE_IP time_ms=11 size=299 referer="https://remote.tld/login?redirect=%2F"
t=2017-07-31T12:59:16+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=1 uname= method=GET path=/login/github status=302 remote_addr=$OUR_OFFICE_IP time_ms=1315 size=24 referer="https://remote.tld/login?redirect=%2F"
t=2017-07-31T12:59:56+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/dashboards/db/dashboard_4 status=404 remote_addr=$OUR_OFFICE_IP time_ms=64 size=33 referer=
t=2017-07-31T12:59:56+0000 lvl=eror msg="Invalid alert data. Cannot save dashboard" logger=context userId=1 orgId=1 uname=admin error="Data source not found"
t=2017-07-31T12:59:56+0000 lvl=eror msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=POST path=/api/dashboards/db status=500 remote_addr=$OUR_OFFICE_IP time_ms=113 size=55 referer=

According to the Grafana docs, exporting a dashboard should also include the data sources it depends on. Doesn't grafcli behave the same?

Thanks!

dictvm avatar Jul 31 '17 13:07 dictvm

Oh, right. This might not be so trivial then. It's connected to #8.

When grafcli was first created, Grafana didn't have any export mechanism or API support yet, so all operations were based on direct SQL manipulation. That's why there are some differences now when working with the current API. I will definitely look into this soon.

Once again, if you're in a hurry, you could try manually creating the data source with the same name. I'm not sure about the alert-related error though.
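To find out which data sources need to be created before a restore, something along the lines of the sketch below could help. It is only an illustration, not part of grafcli: `referenced_datasources` and `missing_datasources` are hypothetical helpers, and it assumes that dashboard JSON stores data source names as strings under `datasource` keys (on panels and template variables). The list of names already present on the target would have to be obtained separately, for example from Grafana's `GET /api/datasources` endpoint.

```python
def referenced_datasources(node):
    """Recursively collect string values stored under any "datasource" key.

    None (the default data source) and "$variable" references are skipped.
    """
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "datasource" and isinstance(value, str):
                if not value.startswith("$"):
                    names.add(value)
            else:
                names |= referenced_datasources(value)
    elif isinstance(node, list):
        for item in node:
            names |= referenced_datasources(item)
    return names


def missing_datasources(dashboard, existing_names):
    """Return the referenced data source names absent from existing_names."""
    return sorted(referenced_datasources(dashboard) - set(existing_names))


# Made-up dashboard fragment for illustration only.
dashboard = {
    "title": "consul-cluster",
    "rows": [
        {"panels": [{"datasource": "prometheus"}, {"datasource": None}]},
    ],
    "templating": {"list": [{"datasource": "influx"}]},
}

print(missing_datasources(dashboard, ["prometheus"]))  # ['influx']
```

Running this against each dashboard in the backup would give a list of data sources to create by hand on the new instance before retrying the restore. It does not address the alert-related part of the error.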

m110 avatar Jul 31 '17 21:07 m110