pyroscope icon indicating copy to clipboard operation
pyroscope copied to clipboard

Panic in `buildTreeFromParentPointerTrees`

Open kolesnikovae opened this issue 1 year ago • 15 comments

I've discovered a panic stack trace that indicates a problem in buildTreeFromParentPointerTrees:

Image

However, the problem is apparently somewhere in the SplitStacktraceIDRanges implementation (PartitionWriter), presumably in SampleAppender.Samples:

resolver_tree.go

	// If the number of samples is large (> 128K) and the StacktraceResolver
	// implements the range iterator, we will be building the tree based on
	// the parent pointer tree of the partition (a copy of). The only exception
	// is when the number of nodes is not limited, or is close to the number of
	// nodes in the original tree: the optimization is still beneficial in terms
	// of CPU, but is very expensive in terms of memory.
	iterator, ok := symbols.Stacktraces.(StacktraceIDRangeIterator)
	if ok && shouldCopyTree(appender, maxNodes) {
		ranges := iterator.SplitStacktraceIDRanges(appender) // <--------------- The problem is apparently here
		return buildTreeFromParentPointerTrees(ctx, ranges, symbols, maxNodes)
	}
	// Otherwise, use the basic approach: resolve each stack trace
	// and insert them into the new tree one by one. The method
	// performs best on small sample sets.
	samples := appender.Samples()
	t := treeSymbolsFromPool()
	defer t.reset()
	t.init(symbols, samples)
	if err := symbols.Stacktraces.ResolveStacktraceLocations(ctx, t, samples.StacktraceIDs); err != nil {
		return nil, err
	}
	return t.tree.Tree(maxNodes, t.symbols.Strings), nil

The query that caused the panic targets large in-memory data set, and I can't reproduce it easily.

One way to solve the problem is to disable the optimization: it was added relatively recently (a couple of month ago) and is only helpful when the stack trace tree includes millions of nodes.

However, I'm going to disable stack trace chunking (into ranges) altogether – this piece is useless but complicates a lot of things – this will not fix the problem for already written data, but will prevent this for happening in the future

kolesnikovae avatar Sep 18 '24 09:09 kolesnikovae

We can add this is a query parameter to /v1/query, maybe as /v1/query?tx_isolation=none. This should also be allowed only for admin users.

0x777 avatar Oct 04 '19 09:10 0x777

@0x777 Is it possible to add this as a parameter or arg within the up and down migration files? as an alternate method.

Sometimes we do want transactions, and sometimes we don't. The default can be with transactions, and with an arg they can be turned off. And then we continue to use hasura migrate apply without any modification.

avimoondra avatar Oct 04 '19 22:10 avimoondra

@0x777 Ah I see a miscommunication on my end - the request was to not have transactions when doing migrations specifically, not really queries.

avimoondra avatar Oct 04 '19 22:10 avimoondra

@avimoondra Thanks for bringing this up. Could you clarify the title of the issue, please? Maybe something like "Enable non-transactional migrations".

Without the ability to CREATE INDEX CONCURRENTLY, we will have to stop using Hasura for migrations. The feature seems otherwise clean so it's a bummer to have to switch.

haphut avatar Apr 13 '21 12:04 haphut

I'm looking for this too. Migrating from Django where non-atomic migrations are standard.

ledburyb avatar Aug 11 '21 16:08 ledburyb

This would be a much appreciate feature.

We have this in Ruby on Rails which allows us to do CONCURRENTLY index creation: https://www.rubydoc.info/docs/rails/ActiveRecord%2FMigration:disable_ddl_transaction

HoyaBoya avatar Sep 17 '21 21:09 HoyaBoya

This is also needed for any DDL actions for timescaledb for creating continuous aggregates etc because it uses two separate transactions internally. For now, a workaround is to manually apply the up.sql file and then running hasura migrate apply --up 1 --skip-execution to mark the migration applied in the tracker.

But it'd be really nice to have a CLI flag like --dangerous-manual-transaction that tells the engine to skip creating a transaction automatically and trust that the person running apply is taking care of the transactions manually.

creatorrr avatar Nov 26 '22 04:11 creatorrr

Hi, is there any news on this topic?

Concurrent index creations and deletions would be very nice to have! Otherwise, any updates on the linked tables would be locked during the entire duration of the migration (shameless plug of an article I've written on this topic).

The downside is that concurrent index management would take more time to finish (two parses of the table are needed instead of one, and PG would wait for any transaction to finish between these steps). It can be relatively scary, even if harmless.

ThHareau avatar Oct 05 '23 11:10 ThHareau

I definitely would love to see this added. The more our database grows, the more we need to have indexes added concurrently which cannot be done through migrations.

EricMeuse-HenryMeds avatar Mar 27 '24 19:03 EricMeuse-HenryMeds

+1

agrandin0 avatar Sep 13 '24 13:09 agrandin0

Hi everyone. Thanks a lot for your feedback and patience on this issue. We have prioritised this now, evaluating the best way to handle this requirement without breaking existing flows. We should have an update in a few days on how we plan to support this.

manasag avatar Feb 05 '25 00:02 manasag

We should have an update in a few days on how we plan to support this.

Any update?

gabegorelick avatar Feb 20 '25 01:02 gabegorelick

Hello, we've pushed the following changes to enable non-transactional migrations:

  • Server: https://github.com/hasura/graphql-engine/commit/15d3c1b04ce3ae7f10593d13b3eec36b98979476
  • CLI: https://github.com/hasura/graphql-engine/commit/2e98439e048f1646b9d7fea9605a7823b59b0f1f

These changes are scheduled for the upcoming 2.46 release. We're also aiming to release a 2.46-beta by the end of this week.

rakeshkky avatar Feb 26 '25 08:02 rakeshkky

Hi @rakeshkky!

I'm glad that you are addressing this issue, however I am slightly disappointed with the chosen solution. The proposed solution is great when knowing in advance if the migration needs a transaction, however in the case of continuous deployment, the automation is not aware of the content of the migration, and cannot (and should not try to) guess the transactionallity of the migration.

The real solution would be to be able, per migration, to specify if the migration should be run with or without a transaction.

For example, in dbmate, this is set with a SQL comment:

-- migrate:up transaction:false

Or with graphile-migrate:

--! no-transaction

Now that the server accepts a no-transaction flags, would it be possible for migrate to detect from the migration whether to run the migration in a transaction or not?

ThHareau avatar Feb 26 '25 09:02 ThHareau

The proposed solution is great when knowing in advance if the migration needs a transaction

It seems like the workflow would be to run all your migrations with this flag and ensure that every migration brings their own transaction where applicable (i.e. that you always wrap your SQL in BEGIN...END where appropriate). How you'd ensure that every migration brings their own transaction seems to be left up to users, which is perhaps fair.

The real solution would be to be able, per migration, to specify if the migration should be run with or without a transaction.

I am of the same opinion, and voiced this with my Hasura reps. For me, it's tougher to transition to this flag in a large project with a ton of active migrations in various stages of development since migrations can't use BEGIN...END while this flag is off, but must use BEGIN...END while the flag is on (or risk partially applied migrations not being rolled back). So it requires a lot of coordination to toggle this flag on. Allowing migration-by-migration opt in would be easier. Hopefully that's something that comes later.

gabegorelick avatar Feb 26 '25 15:02 gabegorelick