Include language in events message or completion ID in completions message

Open rzuckerm opened this issue 1 year ago • 7 comments

Please describe the feature you want

In order of preference:

  1. Include the programming language in the /v1/events message
  2. Include the completion ID in the /v1/completions message

Additional context

I have been snooping traffic between our Tabby server and the user to see what Tabby sends when it offers a suggestion and when the user either completely accepts or partially accepts a suggestion. The end goal is to collect the following metrics:

  • What languages are being used
  • For each language:
    • How many suggestions have been offered
    • How many of those suggestions are completely accepted by the user
    • How many of those suggestions are partially accepted by the user

From my analysis of the messages, I noticed this:

  • The /v1/completions message contains a programming language but no completion ID
  • The /v1/events message contains a completion ID but no programming language
  • The /v1/events message indicates the following:
    • A suggestion is offered (type is view)
    • A suggestion is partially accepted (type is select and select_kind is present)
    • A suggestion is completely accepted (type is select and select_kind is absent)
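The classification rules in the list above can be sketched as a small function. This is only an illustration: it assumes the /v1/events payload has already been parsed into a dictionary with the `type` and optional `select_kind` fields described above, and the function name and return labels are hypothetical.

```python
def classify_event(message: dict) -> str:
    """Map a parsed /v1/events payload to a suggestion outcome.

    Returns "offered", "partial", "complete", or "other".
    The labels are illustrative, not part of Tabby's API.
    """
    event_type = message.get("type")
    if event_type == "view":
        # A suggestion was shown to the user.
        return "offered"
    if event_type == "select":
        # select_kind present -> partial acceptance; absent -> complete acceptance.
        return "partial" if "select_kind" in message else "complete"
    # Any other event type is not relevant to these metrics.
    return "other"
```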

I can get pretty much all the data I need from these messages, but I cannot easily correlate the language in the /v1/completions message with the corresponding event in the /v1/events messages. Ideally, I'd like to use only the /v1/events message, since that would simplify how I store this information in a database. That's why I would like the programming language in the events message. However, as a fallback, if the completion ID were in the /v1/completions message, I could use that ID to identify the corresponding event. That's workable because I could store the ID in a database as a placeholder indicating that a suggestion was offered, and update the database entry if a corresponding event message came in.
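A minimal sketch of the fallback correlation described above, assuming the completion ID were available alongside the language. An in-memory dictionary stands in for the database, and `SuggestionStore`, `SuggestionRecord`, and the status strings are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SuggestionRecord:
    language: str
    status: str = "offered"  # placeholder state until an event arrives


class SuggestionStore:
    """In-memory stand-in for the analytics database (illustrative only)."""

    def __init__(self) -> None:
        self._records: Dict[str, SuggestionRecord] = {}

    def record_completion(self, completion_id: str, language: str) -> None:
        # Store a placeholder row keyed by completion ID.
        self._records[completion_id] = SuggestionRecord(language=language)

    def record_event(self, completion_id: str, status: str) -> Optional[SuggestionRecord]:
        # Update the placeholder if a matching completion was seen earlier.
        record = self._records.get(completion_id)
        if record is None:
            return None  # event for an unknown completion; nothing to update
        record.status = status
        return record
```

With this shape, an event whose completion ID was never recorded is simply ignored rather than stored without a language.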


Please reply with a 👍 if you want this feature.

rzuckerm, Jan 26 '24 19:01

Hello @rzuckerm, thank you for the detailed feature request. Have you checked the events we log at ~/.tabby/events? They are more organized and should be ideal for extracting the insights you have in mind.

Example content:

{
  "ts": 1705686904479,
  "event": {
    "view": {
      "completion_id": "cmpl-2360b276-c5c0-4394-8b0b-200db391271e",
      "choice_index": 0
    }
  }
}
{
  "ts": 1705705936255,
  "event": {
    "completion": {
      "completion_id": "cmpl-c4498fd3-d541-4ead-8492-67184d86c539",
      "language": "python",
      "prompt": "<fim_prefix>def is_prime(n):\n<fim_suffix>\n<fim_middle>",
      "segments": {
        "prefix": "def is_prime(n):\n"
      },
      "choices": [
        {
          "index": 0,
          "text": "    if n == 2:\n        return True\n    if n % 2 == 0:\n        return False\n    for i in range(3, int(n ** 0.5) + 1, 2):\n        if n % i == 0:\n            return False\n    return True"
        }
      ]
    }
  }
}
{
  "ts": 1705705985123,
  "event": {
    "completion": {
      "completion_id": "cmpl-46b600fc-a541-4375-b6f5-8b4bc55952ab",
      "language": "python",
      "prompt": "<fim_prefix>def is_prime(n):\n<fim_suffix>\n<fim_middle>",
      "segments": {
        "prefix": "def is_prime(n):\n"
      },
      "choices": [
        {
          "index": 0,
          "text": "    if n == 2:\n        return True\n    if n % 2 == 0:\n        return False\n    for i in range(3, int(n ** 0.5) + 1, 2):\n        if n % i == 0:\n            return False\n    return True"
        }
      ]
    }
  }
}
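As a rough sketch of how records like the ones above could be used, the following reads a stream of JSON objects, maps each `completion_id` to its `language`, and counts `view` events per language. It relies only on the fields visible in the example; the function names are hypothetical, and whether the on-disk files store one object per line or pretty-printed objects is not assumed (the decoder below handles both).

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield consecutive JSON objects from text, whatever whitespace separates them."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def views_per_language(records: Iterable[dict]) -> Counter:
    """Count view events per language by joining on completion_id."""
    language_by_id = {}
    viewed_ids = []
    for record in records:
        event = record.get("event", {})
        if "completion" in event:
            completion = event["completion"]
            language_by_id[completion["completion_id"]] = completion["language"]
        elif "view" in event:
            viewed_ids.append(event["view"]["completion_id"])
    # Views whose completion record was not seen are counted as "unknown".
    return Counter(language_by_id.get(cid, "unknown") for cid in viewed_ids)
```

A caller would pass the contents of an events file to `iter_json_objects` and feed the result to `views_per_language`.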

wsxiaoys, Jan 26 '24 22:01

One requirement that I forgot to mention is that I'd like to collect the IP addresses of the messages in order to keep track of the number of Tabby users. That information is not available in the log file. Also, while possible, monitoring the log files doesn't fit well into the architecture of our analytics service.

rzuckerm, Jan 27 '24 14:01

Well, since the completion ID is only created after a request to /v1/completions, I don't think we have a way to make it available to traditional nginx-style logging.

Shall we follow up in the Slack channel? Happy to learn about and discuss your use case.

wsxiaoys, Jan 29 '24 02:01

> Well, since the completion ID is only created after a request to /v1/completions, I don't think we have a way to make it available to traditional nginx-style logging.

The /v1/completions change was my 2nd choice. My 1st choice is to add the language to the /v1/events message. As for Slack, I don't use that.

> Shall we follow up in the Slack channel? Happy to learn about and discuss your use case.

Sorry, I don't use Slack, but I do have a Discord account.

As for my use-case, my team has an analytics service that tracks various metrics for tools that we develop and maintain. Our analytics server listens to HTTP requests that contain analytics information and logs that information to a database. We use Grafana to query that database in order to produce various graphs and charts that allow us to track how these tools are being used and where we need to focus our development efforts.

One of the metrics we would like to track is the number of Tabby users as well as the percentage of acceptance (partial and complete) of suggestions from Tabby for different languages over time. If it is not feasible to add the language to the /v1/events message, then tracking the overall acceptance is fine, and I will withdraw this request.
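The acceptance percentage described above could be computed per language roughly as follows. This is a sketch only: it assumes the three counts have already been aggregated for one language and time window, and the function name and result keys are hypothetical.

```python
def acceptance_rates(offered: int, partial: int, complete: int) -> dict:
    """Return acceptance percentages relative to the number of offered suggestions."""
    if offered <= 0:
        # Avoid dividing by zero when nothing was offered in the period.
        return {"partial": 0.0, "complete": 0.0, "overall": 0.0}
    return {
        "partial": 100.0 * partial / offered,
        "complete": 100.0 * complete / offered,
        "overall": 100.0 * (partial + complete) / offered,
    }
```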

rzuckerm, Jan 29 '24 13:01

> The /v1/completions change was my 2nd choice. My 1st choice is to add the language to the /v1/events message. As for Slack, I don't use that.

I'd suggest going with the logs under ~/.tabby/events, using something like http://vector.dev/ to poll the logged events and forward them to your data storage.

wsxiaoys, Jan 29 '24 20:01

> I'd suggest going with the logs under ~/.tabby/events, using something like http://vector.dev/ to poll the logged events and forward them to your data storage.

That still doesn't meet the requirement to track users. We get the user from the IP address of the message, and we only use the IP address to count the number of unique users. That is not in the event logs. The user field in our event logs is null. I guess that's because we have "Disable anonymous usage tracking" set in VSCode, since we don't want to leak any data outside of our company (sorry, we're paranoid about that 😄).
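For illustration, counting unique users from the IP addresses of observed messages could look like the sketch below. It assumes the analytics service has already collected the client IP of each message as a string; the function name is hypothetical. Note that users behind a shared proxy or NAT would be counted as one.

```python
from typing import Iterable


def count_unique_users(client_ips: Iterable[str]) -> int:
    """Count distinct, non-empty client IP addresses."""
    return len({ip.strip() for ip in client_ips if ip and ip.strip()})
```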

rzuckerm, Jan 29 '24 22:01

One thing we will be working on is having the user field filled with the user's email account for an authenticated server. It should be ready around version 0.10, and will certainly be ready before the 1.0 release.

(You can track the progress at https://github.com/TabbyML/tabby/issues/1324.)

> we don't want to leak any data outside of our company (sorry, we're paranoid about that 😄).

Understood—ultimately, that's something that drove us to build Tabby from the very beginning.

wsxiaoys, Jan 29 '24 23:01