feat: contract log query
Closes https://github.com/hirosystems/stacks-blockchain-api/issues/1598
This PR adds two new query params to the /extended/v1/contract/<principal>/events endpoint.
- `?filter_path=<jsonpath expression>`: Optional jsonpath expression to select only results that contain items matching the expression.
- `?contains=<json object>`: Optional stringified JSON to select only results that contain the given JSON.
Rationale
Clarity smart contracts are able to generate structured log outputs during the execution of a contract-call or contract-deploy transaction. For more context see the Clarity print docs. These contract logs can then be ingested by an event-observer (such as the API) to store and serve contract-specific data. Today we have several examples of this:
- The `bns` contract emits logs used by the API to process, track, and serve BNS information to clients, e.g. ownership, zonefiles, etc.
- The `pox` contract emits logs used by the API to store and track stacking-related data, which is used in account balance endpoints (i.e. `locked` balance), Rosetta queries, endpoints for Stacking pool operators, etc.
- The `send-many-stx` contract, used by essentially all exchanges, performs batch stx-transfers and uses contract logs to emit a corresponding memo, which the API stores and returns in an endpoint also used by exchanges to track when their wallets have received a batch stx-transfer and the associated memo.
- SIP-019 defines contract logs that FT/NFT tokens can implement in order to issue metadata update notifications.
These are some of the contracts and logs that this API and others currently have special handling for. Typically, each of these contracts requires unique code to denormalize the log payload, store it in a new table, and expose it through new endpoints and SQL queries.
There are many more deployed contracts which have their own application-specific usage of contract logs, with their own unique payloads. The API doesn't and cannot reasonably implement custom handling for all of the current and future contracts in the same way it has for the contracts listed above. Currently, clients must use the /extended/v1/contract/{contract_id}/events endpoint. This simply returns the latest N logs for a given contract. Clients must paginate through these requests, and manually parse out and filter the logs they need. Even for the contracts that already receive special treatment (e.g. pox, bns), there are requests for new endpoints with different query criteria which would require us to implement new endpoints and sql queries.
In the Ethereum world, the ability to emit and query structured logs is a critical element of their app ecosystem. Token standards that you may be familiar with (e.g. ERC20) specify the logs that contracts must emit when token operations happen. The eth_getLogs RPC endpoint provides a mechanism for efficiently querying those logs. For example "give me all transfers for a given FT associated with a given recipient address".
In Stacks, we have the building blocks for offering similar capabilities.
Contract log query implementation
Prior to this PR, contract logs (i.e. structured Clarity print outputs) were stored in the contract_logs table in their original consensus-serialized binary encoding. This encoding made it essentially impossible to perform queries against the contents of the log object. Postgres supports storing arbitrary structured documents in a jsonb column type. This column type also supports a few types of advanced indexes (e.g. GIN(jsonb_ops) and GIN(jsonb_path_ops)) that make it possible to perform efficient queries against the arbitrary JSON.
So what we can do is 1) decode the binary Clarity buffers into an object, 2) encode that into a JSON object, 3) store that in a new jsonb column in the contracts_log table, then 4) implement API endpoint(s) that allow more precise queries against these logs.
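A rough sketch of how steps 3 and 4 could fit together on the Postgres side (the column name, index name, and query shapes below are illustrative, not the exact schema or SQL in this PR):

```ts
import postgres from 'postgres';

const sql = postgres(process.env.PG_CONNECTION_URI ?? 'postgres://localhost/stacks_blockchain_api');

// 3) A jsonb column holding the decoded log, plus a GIN index so jsonpath/containment
//    queries can avoid scanning every row (illustrative names).
await sql`ALTER TABLE contract_logs ADD COLUMN IF NOT EXISTS value_json jsonb`;
await sql`
  CREATE INDEX IF NOT EXISTS contract_logs_value_json_index
  ON contract_logs USING GIN (value_json jsonb_path_ops)
`;

// 4) The endpoint can then translate the query params into indexed JSON operators:
//    `filter_path` maps to the `@?` jsonpath operator and `contains` maps to the `@>`
//    containment operator.
async function queryContractLogs(contractId: string, filterPath?: string, contains?: object) {
  return await sql`
    SELECT * FROM contract_logs
    WHERE contract_identifier = ${contractId}
    ${filterPath ? sql`AND value_json @? ${filterPath}::jsonpath` : sql``}
    ${contains ? sql`AND value_json @> ${sql.json(contains)}` : sql``}
    ORDER BY block_height DESC
    LIMIT 50
  `;
}
```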
This PR implements the above. One of the design decisions was to implement yet another way to encode Clarity values into JSON. The existing schemas we have are lossless (you can reverse the JSON back into the original Clarity value), however, that comes with overhead from having to store Clarity type information (because there is not a 1-1 mapping between JSON primitives and Clarity primitives), so that JSON is less readable and takes up more space. In our use-case, we aren't necessarily concerned with preserving exact Clarity typing, rather we want the JSON to be easy to query, readable, space-efficient, index-efficient, and have good interoperability with JSON-based workflows (e.g. jsonpath expressions). Here is the spec for how the Clarity values are encoded into a JSON object in this PR:
- OptionalSome and ResponseOk are unwrapped (i.e. the value is encoded directly without any nesting).
- OptionalNone is encoded as json `null`.
- ResponseError is encoded as an object with a single key `_error` which is set to the unwrapped error value.
- Buffers are encoded as an object containing the key `hex` with the hex-encoded string as the value, and the key `utf8` with the utf8-encoded string as the value. When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD `�` will be used to represent those errors. Both of these are included because Clarity Buffers are used for storing both binary data and string data (e.g. BNS names).
- Ints and UInts that are in the range of safe js integers are encoded as numbers, otherwise they are encoded as string-quoted base-10 integers.
- Booleans are encoded as booleans.
- Principals are encoded as strings, e.g. `<address>` or `<address>.<contract_name>`.
- StringAscii and StringUtf8 are both encoded as regular json strings.
- Lists are encoded as json arrays.
- Tuples are encoded as json objects.
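For illustration, here's a hypothetical Clarity `print` and the compact JSON it would be stored as under these rules (the address and the exact hex string formatting are just placeholders):

```ts
// Clarity:
//   (print {
//     event: "withdraw",
//     amount: u100,
//     memo: 0x68656c6c6f,
//     sender: 'SP2J6ZY48GV1EZ5V2V5RB9MP66SW86PYKKNRV9EJ7,
//     success: (ok true),
//     nothing: none
//   })
// Compact JSON stored in the jsonb column:
const exampleLogJson = {
  event: 'withdraw',                                   // StringAscii -> json string
  amount: 100,                                         // UInt within safe integer range -> number
  memo: { hex: '0x68656c6c6f', utf8: 'hello' },        // Buffer -> { hex, utf8 }
  sender: 'SP2J6ZY48GV1EZ5V2V5RB9MP66SW86PYKKNRV9EJ7', // Principal -> string
  success: true,                                       // ResponseOk -> unwrapped value
  nothing: null,                                       // OptionalNone -> null
};
```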
Usage
Here's an example of how the subnets-nft-demo app can be refactored. The app fetches a list of all events, then decodes the Clarity values on the client and filters for specific events. This can now be done in a single query leveraging the new filter_path query param, which accepts a jsonpath expression.
Previous code: https://github.com/hirosystems/subnets-nft-demo/blob/34e433a2d2893f36e1767ce635ee280baf1acbf6/src/stacks/apiCalls.ts#L96-L111
Example of doing the same thing with a single fetch using a jsonpath expression filter:
const url = new URL(`${L2_URL}/extended/v1/contract/${L2_SUBNET_CONTRACT_ID}/events`);
url.searchParams.set('filter_path', `$?(
@.event == "withdraw" &&
@.type == "stx" &&
@.sender == "${address}"
)`);
const req = await fetch(url);
Example of using the new contains param to perform a similar query:
const url = new URL(`${L2_URL}/extended/v1/contract/${L2_SUBNET_CONTRACT_ID}/events`);
url.searchParams.set('contains', JSON.stringify({
event: 'withdraw',
type: 'stx',
sender: address,
}));
const req = await fetch(url);
Here's another example of using a jsonpath expression filter to fetch recent BNS name-revoke, name-transfer, and name-renewal events:
const bnsContract = 'SP000000000000000000002Q6VF78.bns';
let url = new URL(`${API_HOST}/extended/v1/contract/${bnsContract}/events`);
url.searchParams.set('filter_path', `$.attachment.metadata?(
@.op == "name-revoke" ||
@.op == "name-transfer" ||
@.op == "name-renewal"
)`);
const req = await fetch(url);
The new contains param is also a filter, but it takes a JSON object and returns only the contract log events that contain the given object. Here's an example of fetching recent BNS name-renewal events:
const bnsContract = 'SP000000000000000000002Q6VF78.bns';
let url = new URL(`${API_HOST}/extended/v1/contract/${bnsContract}/events`);
let contains = { attachment: { metadata: { op: 'name-renewal' } } };
url.searchParams.set('contains', JSON.stringify(contains));
const req = await fetch(url);
Considerations
The jsonpath expressions supported in Postgres can include relatively complex operations, for example recursive key-value lookups, regex, math operations, type coercions, and more. Many of these cannot take advantage of the Postgres JSON-specific indexes, which can result in unreasonably expensive queries. The vast majority of expected use-cases only need simple jsonpath expressions, which my testing so far has shown to be reasonably efficient. So we need to determine when a given jsonpath expression is "too complex" and reject the request. This PR has a very simple regex-based version of that implemented, however, these regexes are easy to trick and bypass. We need to validate against the actual AST of a jsonpath expression. Existing jsonpath parsing libraries in js do not support the postgres flavor of expressions, so I'm in the process of switching the regex-based validation to jsonpath-pg. This is a library I wrote which compiles the grammar and lexer C code from the postgres codebase into WASM and exposes the AST object. This can be used to reliably determine the complexity of an expression.
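As a simplified illustration of the kind of guard rail this is aiming for (this is not the actual validation code in the PR, and the token list and length limit are made up):

```ts
// Toy allow-list style check: reject jsonpath expressions that use features known to be
// expensive or un-indexable. The real fix is to inspect the parsed AST (via jsonpath-pg)
// rather than the raw string, since string/regex checks like this are easy to bypass.
const DISALLOWED_TOKENS = ['.**', 'like_regex', 'starts with', 'last', '.size()', '.type()'];
const MAX_EXPRESSION_LENGTH = 200;

function isFilterPathAllowed(expr: string): boolean {
  if (expr.length > MAX_EXPRESSION_LENGTH) return false;
  return !DISALLOWED_TOKENS.some(token => expr.includes(token));
}
```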
Alternatives
The capabilities described above could potentially be implemented with other tools, for example GraphQL layers or NoSQL document stores like MongoDB. However, I'd like to explore the possibility of getting this approach to work before considering adding new systems/layers into the project's tech stack.
Upgrading + database migration
This PR changes the sql schema and adds a new non-nullable column to an existing table. Typically, we'd modify the existing migrations (which breaks existing dbs) and release this as a major version that requires an event-replay. That is easy for API contributors, however, these kinds of breaking changes are difficult for deployments, both internal and external, because event-replays take increasingly more time and resources.
So this PR is experimenting with a more advanced sql schema migration which is compatible with existing deployments and does not require a major version bump or event-replay. Migration libraries (including the one we use) tend not to support advanced table manipulation like what this PR requires, so a new function is called directly after the regular migrations in order to perform the "advanced" migrations. In this case, the migration takes around half an hour to run because it involves querying and updating every row in the contract_logs table (which internally postgres treats as a DELETE and INSERT, without the ability to batch).
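Conceptually, the post-migration step looks something like the sketch below (the column names, key columns, and decode helper are illustrative, not the actual implementation):

```ts
import postgres from 'postgres';
// Hypothetical helper that decodes a consensus-serialized Clarity value into the
// compact JSON format described earlier.
import { clarityBufferToCompactJson } from './clarity-json-encoding';

async function runContractLogJsonMigration(sql: postgres.Sql) {
  // Skip if a previous run already performed this migration, so existing deployments
  // upgrade in place without an event-replay.
  const [{ exists }] = await sql<{ exists: boolean }[]>`
    SELECT EXISTS (
      SELECT 1 FROM information_schema.columns
      WHERE table_name = 'contract_logs' AND column_name = 'value_json'
    ) AS exists
  `;
  if (exists) return;

  await sql`ALTER TABLE contract_logs ADD COLUMN value_json jsonb`;
  // Backfill by decoding and rewriting every existing row (streamed via a cursor);
  // postgres rewrites each updated row internally, which is why this step takes
  // roughly half an hour on a full dataset.
  const logRows = sql<{ tx_id: Buffer; event_index: number; value: Buffer }[]>`
    SELECT tx_id, event_index, value FROM contract_logs
  `.cursor(1000);
  for await (const batch of logRows) {
    for (const row of batch) {
      const json = clarityBufferToCompactJson(row.value);
      await sql`
        UPDATE contract_logs SET value_json = ${sql.json(json)}
        WHERE tx_id = ${row.tx_id} AND event_index = ${row.event_index}
      `;
    }
  }
  await sql`ALTER TABLE contract_logs ALTER COLUMN value_json SET NOT NULL`;
}
```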
In a future release when a breaking change is strictly required, the "advanced migration" function for this case can be dropped.
TODO
- [ ] Endpoint integration tests
- [x] Document the migration approach that avoids needing an event-replay
- [x] Document the new Clarity value JSON-compact encoding format that was implemented to power these filters
🚀 Deployed on https://deploy-preview-1601--stacks-blockchain-api.netlify.app
Codecov Report
Merging #1601 (874ad8b) into develop (5d15bc8) will decrease coverage by 55.96%. The diff coverage is 46.25%.
:exclamation: Current head 874ad8b differs from pull request most recent head 6d94c0e. Consider uploading reports for the commit 6d94c0e to get more accurate results
@@ Coverage Diff @@
## develop #1601 +/- ##
============================================
- Coverage 77.41% 21.45% -55.96%
============================================
Files 78 78
Lines 11161 11302 +141
Branches 2487 2521 +34
============================================
- Hits 8640 2425 -6215
- Misses 2403 8198 +5795
- Partials 118 679 +561
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/datastore/common.ts | 100.00% <ø> (ø) | |
| src/datastore/pg-store.ts | 0.90% <0.00%> (-91.70%) | :arrow_down: |
| src/api/routes/contract.ts | 20.75% <6.25%> (-68.99%) | :arrow_down: |
| src/datastore/migrations.ts | 40.40% <26.00%> (-47.84%) | :arrow_down: |
| src/api/query-helpers.ts | 20.74% <60.00%> (-58.31%) | :arrow_down: |
| src/helpers.ts | 37.30% <80.55%> (-33.35%) | :arrow_down: |
| src/datastore/pg-write-store.ts | 1.26% <100.00%> (-85.98%) | :arrow_down: |
... and 52 files with indirect coverage changes
I'd like to get more feedback on this feature to prioritize work on it. My sense is that if this were available, then A) contract authors would take more advantage of structured logs (i.e. print statements) and B) apps would leverage this API feature to make more straightforward and efficient queries.