stacks-blockchain-api icon indicating copy to clipboard operation
stacks-blockchain-api copied to clipboard

Should use `varchar(n)` in place of text

Open aulneau opened this issue 3 years ago • 2 comments

many columns are of text type, where indexes don't really work with that format. I've found performance improvements for changing columns that hold Stacks addresses to VARCHAR(256). This allows indexes to work (or work better). Other fields could be varchar too, such as contract names, etc. Whatever the limit in clarity is can be reflected here too.

aulneau avatar Jan 13 '22 15:01 aulneau

here is an example with the nft_custody view, using the nft_events table where I've changed the type of text to varchar(256) before:

Aggregate  (cost=10498.35..10498.36 rows=1 width=32)
  ->  Gather  (cost=1000.00..10498.13 rows=14 width=105)
        Workers Planned: 2
        ->  Parallel Seq Scan on nft_custody  (cost=0.00..9496.73 rows=6 width=105)
              Filter: (recipient = 'SPS2RBYAXSCXMVPYXSG724CFY4W2WA2NPG44V191'::text)
  SubPlan 1
    ->  Result  (cost=0.00..0.01 rows=1 width=32)

after:

Aggregate  (cost=55.35..55.36 rows=1 width=32)
  ->  Bitmap Heap Scan on nft_custody  (cost=4.52..55.15 rows=13 width=105)
        Recheck Cond: ((recipient)::text = 'SPS2RBYAXSCXMVPYXSG724CFY4W2WA2NPG44V191'::text)
        ->  Bitmap Index Scan on nft_custody_recipient_idx  (cost=0.00..4.52 rows=13 width=0)
              Index Cond: ((recipient)::text = 'SPS2RBYAXSCXMVPYXSG724CFY4W2WA2NPG44V191'::text)
  SubPlan 1
    ->  Result  (cost=0.00..0.01 rows=1 width=32)

cc @zone117x @rafaelcr

aulneau avatar Jan 13 '22 15:01 aulneau

Thank you for the debugging and data @aulneau! This kind of help is much appreciated.

In this case, I'm curious of the real-world performance differences around text type vs varchar(n). The postgres docs https://www.postgresql.org/docs/12/datatype-character.html state:

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

The above "before" analysis looks like no index was used at all, which is strange. Were these tests ran against a prod db / dataset? SQL query planners are non-deterministic. If these test were ran against a small dataset and/or db that hasn't received much traffic, it could show results that aren't applicable to real world usage.

zone117x avatar Jan 13 '22 16:01 zone117x

Closing as stale for now

rafaelcr avatar Sep 20 '22 18:09 rafaelcr