cardano-db-sync icon indicating copy to clipboard operation
cardano-db-sync copied to clipboard

Support `tx_out.tx_hash` field

Open kderme opened this issue 2 months ago • 2 comments

Currently the tx_out table contains a tx_id field, with the autoincremental id of the tx it belongs to. However this means if we want to resolve an input, ie tuple of (tx hash, tx index), we first need to resolve the tx hash -> tx id. This means querying 2 tables, tx and tx_out, instead of simply one. We do have a fairly efficient cache for the first query, however, including the tx_out.tx_hash means we can avoid it altogether.

This becomes more useful, since we want to use Postgres as a backstore for UTxO-HD, effectively reducing memory https://github.com/IntersectMBO/cardano-db-sync/issues/1684

There are many options: This field could replace tx_out.tx_id field, or it could be a new field or it could replace it based on some config option.

kderme avatar Sep 17 '25 16:09 kderme

Unless this is about replacing tx table altogether, it brings little benefit (definitely not from performance pov, as looking at an indexed hash has higher resource cost than tx_id field which is well maintained index in a foreign table). More often than not - when querying tx input/output, we also fetch block details, which will need us to do that join anyways.

Secondly, it brings back the ugly behaviour of having duplicate hash strings in multiple tables + having duplicate hash strings within tx_out table, using up more storage and index space, for no performance gains

rdlrt avatar Sep 17 '25 21:09 rdlrt

Unless this is about replacing tx table altogether, it brings little benefit

A clarification that the objective here is to reduce future queries that dbsync itself has to do, eventually reducing its memory footprint without a big hit on syncing speed. We need to investigate if the tx hash -> tx id cache that DBSync maintain is good enough or we need to avoid the query altogether.

kderme avatar Sep 18 '25 10:09 kderme