grants-and-bounties icon indicating copy to clipboard operation
grants-and-bounties copied to clipboard

New Highload Wallet

Open pyAndr3w opened this issue 1 year ago • 17 comments

Summary

Highload wallet implementation in the TON blockchain is currently bulky and has security concerns. This issue proposes a leaner and more secure alternative to the current highload wallet design.

Context

Highload wallet is smart contract in the TON blockchain designed to manage a large volume of transactions effectively while ensuring the security of assets. However, the existing implementation has been found to be bulky, leading to higher operational costs and security issues. One of the key problems in the current implementation is the overflow of the hashmap of requests. If an overflow occurs, the smart contract can become completely inoperative, leading to loss of functionality and potentially compromised security. Given these concerns, a new design is needed to ensure the wallet remains efficient and secure even under heavy transaction loads.

Goals

  1. Create a more secure and cost-effective implementation of the highload wallet.
  2. Simplify the existing design to reduce complexity and improve usability.
  3. Fix current security issues, including the hashmap overflow problem.

Deliverables

  1. Smart Contract: Develop a new smart contract for the highload wallet, optimizing for performance, security, and resistance to hashmap overflow.
  2. TL-B Schema: Create and document a TL-B schema for the request/response interactions with the smart contract.
  3. Contract Tests: Write comprehensive tests to validate the smart contract's functionality, robustness against common vulnerabilities, and protection against hashmap overflow.
  4. Usage Example in TypeScript: Provide a working example of how to interact with the smart contract using TypeScript, serving as a reference for developers looking to integrate or interact with the redesigned highload wallet.

Definition of Done

  • [x] Smart Contract on fift-asm
  • [x] Smart Contract on FunC (extra-version)
  • [ ] Full specification
  • [ ] Detailed documentation with edge cases
  • [ ] Usage example in typescript (ton-core) and c# (tonsdk.net)
  • [x] Blueprint tests and wrappers for the contract
  • [ ] The code review by @akifoq

Reward

  • Standard TON Footstep NFT
  • 1500 USD in TON equivalent
  • 750 USD in TON equivalent (The code review by @akifoq)

Oriental Release Date

One month after approving

pyAndr3w avatar Aug 21 '23 06:08 pyAndr3w

@delovoyhomie, what do you think about it?

pyAndr3w avatar Aug 23 '23 07:08 pyAndr3w

@pyAndr3w happy to discuss it in DM at telegram :)

Naltox avatar Aug 24 '23 13:08 Naltox

@pyAndr3w lets add "blueprint tests and wrappers for the contract" to the deliverables. Also i think 1200 USD would make much more sense in terms of pricing.

Other than that we are good to go!

Naltox avatar Aug 31 '23 14:08 Naltox

@Naltox regarding the reward, I believe we should take into consideration that the development is planned for Fift-asm, not FunC, which will provide more opportunities for optimizations but will require slightly more effort.

pyAndr3w avatar Aug 31 '23 15:08 pyAndr3w

@delovoyhomie, I'm ready to start developing in the current view

pyAndr3w avatar Oct 17 '23 18:10 pyAndr3w

@delovoyhomie,

The smart contract is almost ready. I just need to add the "processed?" get-method (is this get-method necessary at all?).

I've written some tests, but I still request a review.

Since the contract is written in fift-asm, it requires someone with sufficient expertise for review. I suggest @Skydev0h or @akifoq, as they have significant experience specifically in fift-asm.

Repository link: https://github.com/continuation-team/highload-wallet-v3/tree/draft

pyAndr3w avatar Nov 22 '23 10:11 pyAndr3w

Features of the new highload wallet:

  • Performs the "cleanup hashmap in 1 action" instead of iterating through the hashmap in a loop, as implement in v2. This allows clearing the hashmap at the beginning of execute, enabling the wallet to be used even if the hashmap is fully occupied.

  • Requests are stored as 1 bit in the hashmap, as opposed to 1 entry in v2. This means that instead of 32k entries, it can store 16 million, allowing for many more requests to be sent without overloading the hashmap.

  • "Hardcoded" valid_until set to 128 seconds. This fixes the issue in v2 where entries with a large valid_until took a long time to be removed from the hashmap, potentially leading to hashmap overload and making it impossible to continue using the wallet.

  • Uses "preprocessed c5" instead of a hashmap for messages. This significantly reduces gas consumption when sending messages.

pyAndr3w avatar Nov 22 '23 11:11 pyAndr3w

Now a code review from @akifoq and @EmelyanenkoK is being conducted. @akifoq, could you please confirm your readiness here for a compensation of $750?

delovoyhomie avatar Dec 07 '23 10:12 delovoyhomie

I confirm it.

akifoq avatar Dec 08 '23 12:12 akifoq

@akifoq, any updates?

pyAndr3w avatar Dec 20 '23 22:12 pyAndr3w

I have conducted a manual code review and found several issues. Also an informal security analysis is coming tomorrow.

1. Too early hashmap cleanup breaks replay protection

Suppose last_cleaned = 100 and we receive a query Q at the moment now() = 160 with created_at = 160. It gets stored into the queries hashmap, but last_cleaned is not updated. Now if we send the same query Q at the moment 100 + 128 + 1 = 229, it is still valid, but both of the hashmaps are discarded, so the query is processed the second time, thus breaking the replay protection.

Suggested fix

Use 128 and 256 seconds delays instead of 64 and 128.

A sketch of a proof of correctness

Note that a query is processed only if now - 128 <= created_at <= now. Let T be the time of the last fully processed transaction on the contract (not including the current one).

Let's verify the following invariants:

  1. For every query "presented" in the queries hashmap, its created_at <= T. Indeed, a query can end up in the hashmap only when added on the transaction processing it, and T is obviously non-decreasing.
  2. For every query "presented" in the old_queries hashmap, its created_at <= last_cleaned. Indeed, a query can end up in the hashmap only by the hashmap shifting ("queries->old_queries"), after which last_cleaned is set to be now >= T >= created_at by the first invariant.

Let's now verify that a query is deleted from the hashmaps only when its created_at < now - 128:

  1. If last_cleaned < now - 128, every query from old_queries is deleted. created_at <= last_cleaned for such query by the second invariant, so indeed createad_at < now - 128 by transitivity.
  2. If last_cleaned < now - 256, queries from queries are also deleted. Note that in such case T < now - 128, because otherwise last_cleaned would be updated in the last transaction (that is, now - 256 > last_cleaned >= T - 128). So created_at <= T < now - 128 by the first invariant.

Note on testing

If possible, I suggest to add an automated stress-test which is able to find this bug in the current implementation (by generating queries with random intervals between them and checking that they can't be replayed. It's also benefitial to focus on finding possible +-1 bugs in the new version).

2. It's possible to add unsigned data to the external message wasting balance on import_fee.

The message signature is read from the message simply as the remainder of the slice after LDREF loads the payload. However, CHKSIGNU takes only the first 512 bits of the signature slice and doesn't throw if it has some other data. So it's possible for an attacker to attach large additional cells to the message without invalidating the signature.

Suggested fix

Add a 9 PUSHPOW2 LDSLICEX ENDS check after LDREF in the line 40 or before s3 PUSH in the line 60.

Minor improvement suggestions

  1. Use zero-based numbering of bits instead of one-based. It allows to use fewer number of DECs and is more natural. You are still protected from possible cell overflow because bit_number related logic happens before the ACCEPT.
  2. Use 40 bits for last_cleaned instead of 64.
  3. Replace ZERO SPLIT by LDSLICEX.
  4. Consider replacing b{0} SDBEGINSQ 36 THROWIFNOT by b{0} SDBEGINS.
  5. Consider adding processed? get-method similar to highload-wallet-v2.
  6. Consider adding a subwallet_id field.
  7. Consider merging shift and bit_number into single query_id for applications convenience (and maybe then extract bit_number by mod 1023 instead of mod 1024?).

akifoq avatar Dec 23 '23 00:12 akifoq

At the moment, I have fixed the main issues and implemented some improvements. Left to do:

  • implement get-method "processed?"
  • more tests

I have a question about how best to implement the get-method "processed?", given that created_at is not written to the queries hashmap. It might be worth making created_at separate from query_id.

pyAndr3w avatar Dec 24 '23 21:12 pyAndr3w

The most straighforward way to check if a query was processed is to iteratate over transactions happened in the corresponding time frame, but it's not a particularly elegant way. It might be slow and requires manual parsing of the transactions. The highload-wallet-v2 processed? get-method helps avoid iterating over transactions by invoking this method at the state from the last block created before valid_until of the query. Using the 1 returned value, we can also implement a binary search to find that block.

Simply invoking this method at the current state is obviously not enough as it gives no info on older queries. Strictly speaking, a node is not obligated to store block history, which suggests that the proper way to handle cleanups is by explicit owner requests, contradictorily to highload-wallets-v2/v3. However, in practice it is usually possible to look on recent blocks.

On the other hand, the highload-wallet-v2 has a nice property that a processed query is uniquely determined by query_id. If the offchain application doesn't issue queries with the same query_id, this property would also hold for unprocessed queries as well.

The highload-wallet-v3 doesn't have such property. The number of different query_ids is bound by 2^24, so sooner or later the application would need to reuse some old id. However, the application might internally identify queries not only by the query_id, but by the pair (query_id, created_at), maintaining the property. In this way query_id rather becomes replay_protection_id.

In principle, you can require to supply created_at as an argument of the method. However, note that the presence of replay_protection_id in the hashmap doesn't mean the (replay_protection_id, created_at) query was processed, because we might have issued another query with the same replay_protection_id, but different created_at (hence different id), in a short time period. So it seems that the processed? get-method just can't be implemented at all without somehow redefining the notion of query_id. Maybe it's better to extend the length of the hashmap keys.

id_presented? get-method may be implemented instead (and maybe max_presented_id as well?).

Also an offchain application might add a log message to the provided c5 containing any identification of the query it uses to simplify the query lookup by the iteration.

But I suspect in practice applications will iterate over transactions looking not even for queries, but for specific messages needed to be sent.

akifoq avatar Dec 25 '23 00:12 akifoq

New requirements (as requested by @thekiba):

  • Full specification
  • Hignload-v3 version on FunC (for ease of auditing)
  • Detailed documentation with edge cases

(it would be good to increase the reward by 300 usd in ton)

Remaining steps of old requirements:

  • Add get method get_subwallet_id
  • Usage examples in ts (ton-core) and C# (tonsdk.net)

pyAndr3w avatar Jan 26 '24 10:01 pyAndr3w

LGTM

thekiba avatar Jan 26 '24 13:01 thekiba

LGTM! The amount of the reward has been increased to $1500.

delovoyhomie avatar Jan 26 '24 13:01 delovoyhomie

LGTM

Naltox avatar Jan 26 '24 14:01 Naltox