grafana-wtf

Finding unhealthy data sources

Open · amotl opened this issue 3 years ago · 7 comments

Hi there,

within our conversation at [1], @chenlujjj asked for another feature:

Another feature request is to find invalid data sources. By invalid I mean a data source that does not pass the test when pressing the “Save & Test” button on its page. I have written a tiny Go script to do this. Maybe it can be added to grafana-wtf.

With kind regards, Andreas.

[1] https://community.grafana.com/t/how-to-find-out-unused-datasources/56920/7

amotl, Dec 10 '21 20:12

Hi @chenlujjj,

thank you for suggesting that feature. Sure, it could well become an additional feature of grafana-wtf, gradually making it a more complete Swiss Army knife. Your input is very much appreciated.

I would be happy to take a look at your Go program if you can share it with me by any means. If you don't want to set up a dedicated repository for it, maybe upload it as a gist?

With kind regards, Andreas.

amotl, Dec 10 '21 20:12

Sure, I will upload the script when I'm back at the office next Monday.

chenlujjj, Dec 11 '21 05:12

I would say a better word for "invalid" here is "unhealthy". The Save & Test button executes a simple test query, which depends on the data source type. IMHO that's not easy to implement here: it would need to support all current and future built-in and third-party data source types. Also, plain TCP connectivity can be a problem, because grafana-wtf may be running on a different host than Grafana.

jangaraj, Dec 11 '21 08:12

Hi,

thanks for your guidance, Jan. In order to shed some more light on this topic, I wanted to reference [1] here:

Test your data source

testDatasource implements a health check for your data source. For example, Grafana calls this method whenever the user clicks the Save & Test button, after changing the connection settings.

async testDatasource()

To pick some arbitrary examples, [2-5] are the corresponding health check implementations for PostgreSQL, InfluxDB, Tempo, and Prometheus. We can clearly see that those implementations differ significantly, just as @jangaraj described.

To bring this in as a sensible feature for grafana-wtf, it would surely need to support any kind of data source, so I share @jangaraj's concerns. This is especially so because the health check logic is apparently implemented in TypeScript, i.e. it runs in the browser. It could therefore probably only be reused by means of browser automation, rather than by simply calling an HTTP health check endpoint.

So, I will be excited to see how @chenlujjj might have solved it.

With kind regards, Andreas.

[1] https://grafana.com/tutorials/build-a-data-source-plugin/#test-your-data-source
[2] https://github.com/grafana/grafana/blob/v8.3.2/public/app/plugins/datasource/postgres/datasource.ts#L178-L186
[3] https://github.com/grafana/grafana/blob/v8.3.2/public/app/plugins/datasource/influxdb/datasource.ts#L437-L487
[4] https://github.com/grafana/grafana/blob/v8.3.2/public/app/plugins/datasource/tempo/datasource.ts#L185-L196
[5] https://github.com/grafana/grafana/blob/v8.3.2/public/app/plugins/datasource/prometheus/datasource.ts#L791-L820

amotl, Dec 11 '21 14:12

That said, the actual outcome of the testDatasource() routines seems to be relatively "simple". In the case of InfluxDB, clicking the Save & Test button just makes an HTTP request like this:

GET /api/datasources/proxy/1/query?db=ldi_v2&q=SHOW%20RETENTION%20POLICIES%20on%20%22ldi_v2%22&epoch=ms HTTP/1.1

amotl, Dec 11 '21 14:12

I have to admit that I hadn't considered non-Prometheus data sources.

The code to check whether a Prometheus data source is healthy (unrelated parts omitted):

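import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"

	"github.com/grafana-tools/sdk"
)

// Env holds the Grafana URL and API token.
type Env struct {
	url, token string
}

// Client wraps the Grafana SDK client.
type Client struct {
	env Env
	c   *sdk.Client
}

// Response is the minimal envelope of a Prometheus HTTP API reply.
type Response struct {
	Status string `json:"status"`
}

// validateDatasource runs the query "1+1" against a Prometheus data source,
// via Grafana's proxy (Server access) or directly (Browser access).
func (c *Client) validateDatasource(ds sdk.Datasource) error {
	client := http.Client{Timeout: 10 * time.Second}
	query := url.QueryEscape("1+1") // "1%2B1"
	queryURL := fmt.Sprintf("%s/api/datasources/proxy/%d/api/v1/query?query=%s", c.env.url, ds.ID, query)
	viaProxy := ds.Access != "direct"
	if !viaProxy {
		queryURL = fmt.Sprintf("%s/api/v1/query?query=%s", ds.URL, query)
	}
	// Check the error before touching req, which is nil on failure.
	req, err := http.NewRequest(http.MethodGet, queryURL, nil)
	if err != nil {
		return err
	}
	// Only the Grafana proxy needs the Grafana API token; don't leak it to the data source.
	if viaProxy {
		req.Header.Set("Authorization", "Bearer "+c.env.token)
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	// Close the body on every path, including non-200 replies.
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("StatusCode is %d", resp.StatusCode)
	}
	var response Response
	if err := json.NewDecoder(resp.Body).Decode(&response); err != nil {
		return err
	}
	if response.Status != "success" {
		return fmt.Errorf("query status is %q", response.Status)
	}
	return nil
}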

Note that the HTTP request Grafana makes when you click the Save & Test button depends on the Access mode of the data source:

Access mode controls how requests to the data source will be handled. Server should be the preferred way if nothing else is stated.

Server access mode (Default): All requests will be made from the browser to Grafana backend/server which in turn will forward the requests to the data source and by that circumvent possible Cross-Origin Resource Sharing (CORS) requirements. The URL needs to be accessible from the grafana backend/server if you select this access mode.

Browser access mode: All requests will be made from the browser directly to the data source and may be subject to Cross-Origin Resource Sharing (CORS) requirements. The URL needs to be accessible from the browser if you select this access mode.

So maybe the TCP connectivity issue mentioned by @jangaraj is not a problem after all, right?
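The routing rule from the quoted documentation fits in a small function. This sketch assumes Grafana's JSON values `proxy` (Server) and `direct` (Browser) for the `access` field; `checkTarget` and the trimmed-down `Datasource` struct are hypothetical. It also makes the caveat explicit: in Server mode only Grafana has to be reachable from the host running the check, but in Browser mode the data source itself has to be, so @jangaraj's concern still applies there.

```go
package main

import (
	"fmt"
	"strings"
)

// Datasource holds the few fields used here; a simplified assumption,
// not the full Grafana API model.
type Datasource struct {
	ID     int
	Access string // "proxy" (Server) or "direct" (Browser)
	URL    string
}

// checkTarget returns the URL a health check should hit and whether the
// Grafana API token belongs on that request.
func checkTarget(grafanaURL string, ds Datasource, path string) (target string, useGrafanaToken bool) {
	if ds.Access == "direct" {
		// Browser mode: the data source must be reachable from where this runs.
		return strings.TrimRight(ds.URL, "/") + path, false
	}
	// Server mode: only Grafana must be reachable; it forwards the request.
	base := strings.TrimRight(grafanaURL, "/")
	return fmt.Sprintf("%s/api/datasources/proxy/%d%s", base, ds.ID, path), true
}

func main() {
	path := "/api/v1/query?query=1%2B1"
	fmt.Println(checkTarget("http://grafana:3000", Datasource{ID: 3, Access: "proxy"}, path))
	fmt.Println(checkTarget("http://grafana:3000", Datasource{ID: 4, Access: "direct", URL: "http://prom:9090/"}, path))
}
```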

chenlujjj, Dec 13 '21 12:12

Hi again,

we will lay the foundation for this feature in grafana-client. I made a start with https://github.com/panodata/grafana-client/pull/19 and added an introduction and call for participation at https://github.com/panodata/grafana-client/issues/20, where the details can be discussed further.

As soon as the new feature is ready there, we will come back here to use it in grafana-wtf. It will be a very sweet improvement.

With kind regards, Andreas.

amotl, Jun 20 '22 08:06