metrictank icon indicating copy to clipboard operation
metrictank copied to clipboard

dont hold write lock for entire duration of index delete calls

Open woodsaj opened this issue 4 years ago • 1 comments

When an index Delete is called, the entire Delete operation is performed while a write lock is held. While the lock is held, no metrics can be ingested and no other index searches can run.

To avoid problems , the Delete function https://github.com/grafana/metrictank/blob/master/idx/memory/memory.go#L1492 should instead

  1. use a read lock and perform a find() to get all matching nodes, then release the read lock.
  2. for each matching node, acquire a read lock, then traverse the tree to find all matching leaf nodes, then release the read lock.
  3. for each matching leaf node, acquire a write lock, then delete the leaf node, which will cascade down the tree and clean up any branches that have no children.

eg something like

	var deletedDefs []idx.Archive
	pre := time.Now()
	defer func() {
		if len(deletedDefs) == 0 {
			return
		}
		if m.findCache == nil {
			return
		}
		// asynchronously invalidate any findCache entries
		// that match any of the deleted series.
		if len(deletedDefs) > findCacheInvalidateQueueSize {
			m.findCache.Purge(orgId)
		} else {
			for _, d := range deletedDefs {
				m.findCache.InvalidateFor(orgId, d.NameWithTags())
			}
		}
	}()
	m.RLock()
	tree, ok := m.tree[orgId]
	if !ok {
                m.RUnlock()
		return nil, nil
	}

	found, err := find(tree, pattern)
        m.RUnlock()
	if err != nil {
		return nil, err
	}
        leafNodes := make([]*Node, 0)
	for _, node := range found {
                m.RLock()
                // GetLeaves would be some recursive function like https://github.com/grafana/metrictank/blob/master/idx/memory/memory.go#L1537-L1552
		leafNodes = append(leafNodes, node.GetLeaves()) 
                m.RUnlock()
	}
        for _, node := range leafNodes {
                m.Lock() // should probably use the same lock rate limiting that is used in Prune
                deleted := m.delete(orgId, node, true, false)
		deletedDefs = append(deletedDefs, deleted...)
                m.Unlock()
        }
	statMetricsActive.DecUint32(uint32(len(deletedDefs)))
	statDeleteDuration.Value(time.Since(pre))

	return deletedDefs, nil
}

woodsaj avatar Aug 21 '20 11:08 woodsaj

https://github.com/grafana/metrictank/pull/1897 might fix it, if not @robert-milan has another branch that should and will do a PR for

fkaleo avatar Oct 28 '20 16:10 fkaleo