WIP: Data Annotation & ZKP Tooling

Open sinui0 opened this issue 2 years ago • 5 comments

Doing a brain dump here, this will potentially be refined over time as I find more bandwidth to ponder ahead on these topics.

Overview

In most applications of our protocol, the Requester will know the shape of the application data involved in their session before query time, or will at least have a strategy for processing it. For example, many websites have adopted API description standards such as Swagger, which provide an extensive description of the methods and data models they expose. We need a suite of tooling that helps developers annotate APIs, and structured data in general, with metadata describing privacy characteristics and potentially business logic.

To provide a concrete example, suppose there is a website which uses HTTP to serve JSON application data. A simple session transcript could look something like this:

--- Request
GET /profile HTTP/1.1
User-Agent: Mozilla/5.0 (X11; Linux x86_64)
Host: www.example.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Cookie: user_session=e2AEzq_NF8123fSgjE5hOtPT14ik4_7YkRL-FdDQn-bew2
Connection: Keep-Alive
--- Response
HTTP/1.1 200 OK
Date: Mon, 19 Sep 2022 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Sat, 03 Jan 2009 19:15:56 GMT
Content-Length: 74
Content-Type: application/json
Connection: close

{"username":"sinu","address":"123 Main Street","birthday":"July 30, 2015"}

In this example, a developer may wish to annotate particular fields as private, such as the cookie in the request headers or fields in the response payload like address and birthday. The rest of the transcript can be revealed to the verifier as plaintext, which saves substantially on proof time and complexity. Additionally, the developer will want to specify invariants on the type of the underlying data, and assertions on that data, in a consistent and general manner. From those annotations, the developer will want to generate circuits which prove the assertions over the data.
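To make the annotation idea concrete, here is a minimal sketch that locates the byte ranges of the private values in the example response body and shows what a verifier would see once those ranges are withheld. The helper names `value_span` and `redact` are hypothetical, and the lookup is deliberately naive: it assumes a flat JSON object serialized without whitespace, as in the transcript above. A real tool would need a span-preserving parser.

```python
import json

# Response body from the example transcript (74 bytes).
BODY = b'{"username":"sinu","address":"123 Main Street","birthday":"July 30, 2015"}'


def value_span(body: bytes, key: str) -> tuple:
    """Return the half-open byte range [start, end) of the value of a top-level key.

    Naive assumption: a flat object, compactly serialized (no whitespace).
    The range includes the surrounding quotes of string values.
    """
    doc = json.loads(body)
    prefix = json.dumps(key).encode() + b":"
    encoded = json.dumps(doc[key], separators=(",", ":")).encode()
    start = body.index(prefix + encoded) + len(prefix)
    return (start, start + len(encoded))


def redact(body: bytes, spans) -> bytes:
    """Replace every byte inside the given ranges with '*', keeping the length."""
    out = bytearray(body)
    for start, end in spans:
        out[start:end] = b"*" * (end - start)
    return bytes(out)


private_spans = [value_span(BODY, key) for key in ("address", "birthday")]
verifier_view = redact(BODY, private_spans)
```

Everything outside `private_spans` stays in plaintext, so only the two annotated values would need commitments and proofs.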

We will need to support several different data formats, e.g. HTTP records, JSON, YAML, XML, HTML, CSV, etc. I expect all of these will be in high demand.

Notice that none of the above is specific to TLSNotary, and that is to our benefit. Developers of any ZK application would enjoy having such tooling. Maybe it already exists? I haven't found it yet.

The interface with our protocol will be relatively small: the calling code would simply provide a collection of byte slice indices and a corresponding commitment type for each (i.e. SHA2, Poseidon, PRG, etc.). I'll dub this a "Commitment Strategy".
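As a rough illustration of what a Commitment Strategy could look like as data, the sketch below pairs byte ranges with commitment types and checks that the collection is well formed. The names `CommitmentKind`, `RangeCommitment`, and `validate_strategy` are hypothetical, and both the half-open range convention and the no-overlap rule are assumptions of this sketch, not decisions made in this issue.

```python
from dataclasses import dataclass
from enum import Enum


class CommitmentKind(Enum):
    # Commitment types named in the text above.
    SHA2 = "sha2"
    POSEIDON = "poseidon"
    PRG = "prg"


@dataclass(frozen=True)
class RangeCommitment:
    start: int  # inclusive byte offset into the transcript
    end: int    # exclusive byte offset
    kind: CommitmentKind


def validate_strategy(entries, transcript_len):
    """Return the entries sorted by start offset.

    Raises ValueError if a range is empty, falls outside the transcript,
    or overlaps another range (assumed to be disallowed in this sketch).
    """
    ordered = sorted(entries, key=lambda e: e.start)
    prev_end = 0
    for entry in ordered:
        if not (0 <= entry.start < entry.end <= transcript_len):
            raise ValueError(f"range out of bounds: {entry}")
        if entry.start < prev_end:
            raise ValueError(f"range overlaps a previous one: {entry}")
        prev_end = entry.end
    return ordered


# Offsets of the birthday and address values within the 74-byte example body.
strategy = validate_strategy(
    [
        RangeCommitment(58, 73, CommitmentKind.POSEIDON),
        RangeCommitment(29, 46, CommitmentKind.POSEIDON),
    ],
    transcript_len=74,
)
```

Keeping the interface this small means the annotation tooling can stay independent of the protocol: it only has to emit ranges and commitment types.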

Notarization

I foresee two different workflows during notarization: templated and manual.

Templated

In the browser context most flows shouldn't require arbitrary code execution. Instead, we provide a templated workflow where the developer specifies all the required information in a single file. The browser extension can use that file to guide a user through an application flow, e.g. logging in and fetching a profile, and to generate suitable commitments. These workflow templates could be made available in a public "appstore" of sorts, where a market or reputation system can develop.

Here is a dumb toy example of what this could look like:

name: Export profile
host: www.example.com
requests:
  - path: /profile
    get:
      parameters:
        - name: username
          required: true
          schema:
            type: string
        - name: password
          required: true
          schema:
            type: string
            private:
              enabled: true
      responses:
        '200':
          content:
            application/json:
              schema:
                username:
                  type: string
                address:
                  type: string
                  private:
                    enabled: true
                    commitment: poseidon
                birthday:
                  type: date
                  private:
                    enabled: true
                    commitment: poseidon

Manual

We can support a manual flow for advanced use cases, which gives the developer more flexibility but may be harder to do safely. I have some ideas for this but won't elaborate now.

Selective Disclosure

Once we have some sort of format/system for annotating structured data, we will need to be able to auto-generate circuits based on those annotations. We should be able to abstract away most details of the inner workings of ZKPs and provide a simple API.
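To give a feel for the kind of statement such an API might accept, here is a small sketch of a typed assertion on an annotated field. It evaluates the statement in the clear; no circuit is generated, and this is only the predicate that a generated circuit would prove without revealing the value. The `Assertion` type and the `parse_date` and `check` helpers are hypothetical, and the date format is taken from the example payload.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable


@dataclass(frozen=True)
class Assertion:
    field: str                            # annotated field the statement is about
    parse: Callable[[str], object]        # type invariant: raises ValueError if malformed
    predicate: Callable[[object], bool]   # statement to prove about the parsed value


def parse_date(text: str) -> date:
    """Parse dates written like 'July 30, 2015', as in the example payload."""
    return datetime.strptime(text, "%B %d, %Y").date()


def check(assertion: Assertion, record: dict) -> bool:
    """Evaluate the assertion in plaintext (stand-in for proving it in a circuit)."""
    value = assertion.parse(record[assertion.field])
    return assertion.predicate(value)


record = {"username": "sinu", "address": "123 Main Street", "birthday": "July 30, 2015"}

born_before_2020 = Assertion("birthday", parse_date, lambda d: d < date(2020, 1, 1))
born_before_2000 = Assertion("birthday", parse_date, lambda d: d < date(2000, 1, 1))
```

Separating the type invariant (`parse`) from the statement (`predicate`) mirrors the split between "invariants regarding the type of underlying data" and "assertions on them" described in the overview.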

sinui0 avatar Sep 20 '22 04:09 sinui0

Thanks for that. I like that template format.

themighty1 avatar Sep 20 '22 05:09 themighty1

support several different data formats, eg HTTP records, JSON, YAML, XML, HTML, CSV etc.

👀 Any appetite for building them in Noir? (Happy to e.g. co-sponsor a grant too if it helps.)

Savio-Sou avatar Jul 13 '23 14:07 Savio-Sou

Any appetite for building them in Noir?

@Savio-Sou we will be implementing our tooling in Rust, aiming to be proving system/DSL agnostic. It should work with Noir

sinui0 avatar Jul 19 '23 16:07 sinui0

@sinui0 ah very cool!

I.e. this is more about data pre-processing (transpiling?) before any ZK-proving?

Savio-Sou avatar Jul 20 '23 03:07 Savio-Sou

@Savio-Sou yes, this is more about tooling for pre-processing/parsing data and applying privacy/commitment semantics to common formats, i.e. specifying which field is private, where the bytes of that field are located, the type of commitment to it, and eventually what statement is proved about it.

sinui0 avatar Jul 20 '23 20:07 sinui0