product-backlog Add annotation.reference information to the CSV exports

The problem

CSVs are much more approachable than JSON files for the average user, and instructors using annotation exports for various kinds of analysis want to see the relationship between annotations.

Example ticket: https://app.hubspot.com/contacts/6291320/record/0-5/18074066011

The solution

In the "export" option in the client, include the "reference" information so people using the exports for analysis can easily relate or reconstruct the annotation threads.

Example:

Current CSV export:

| Created at | Author | Page URL | Group | Type | Quote | Comment | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-12-20 10:09 | mdiroberts | https://example.com/ | abc internal testing? | Reply | | reply | |
| 2024-10-30 14:02 | mdiroberts | https://example.com/ | abc internal testing? | Annotation | documents | anno | question |

Proposed CSV export:

| Created at | Author | Page URL | Group | Type | ID | Reference | Quote | Comment | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-12-20 10:09 | mdiroberts | https://example.com/ | abc internal testing? | Reply | "X72iLr7kEe-8vIsqCNlnHw" | "F5YqwJbpEe-kJWcL3BHQxQ" | | reply | |
| 2024-10-30 14:02 | mdiroberts | https://example.com/ | abc internal testing? | Annotation | "F5YqwJbpEe-kJWcL3BHQxQ" | NULL | documents | anno | question |
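To illustrate how someone analysing the export might use the new columns, here is a rough Python sketch that groups rows into threads. It assumes the Reference column holds the ID of the top-level annotation and is empty or NULL for top-level annotations, as in the example above. The `group_by_thread` helper and the trimmed-down column set in `SAMPLE` are made up for illustration and are not part of the client.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export using a subset of the proposed columns.
SAMPLE = """\
Created at,Author,Type,ID,Reference,Comment
2024-12-20 10:09,mdiroberts,Reply,X72iLr7kEe-8vIsqCNlnHw,F5YqwJbpEe-kJWcL3BHQxQ,reply
2024-10-30 14:02,mdiroberts,Annotation,F5YqwJbpEe-kJWcL3BHQxQ,NULL,anno
"""


def group_by_thread(csv_text):
    """Map each top-level annotation ID to the rows that belong to its thread."""
    threads = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        reference = row["Reference"]
        # Assumption: top-level annotations have an empty or "NULL" Reference.
        root_id = row["ID"] if reference in ("", "NULL") else reference
        threads[root_id].append(row)
    return dict(threads)


threads = group_by_thread(SAMPLE)
for root_id, rows in threads.items():
    print(root_id, [row["Type"] for row in rows])
```

With the sample rows, both the annotation and its reply end up under the same top-level ID, which is the relationship the current export cannot express.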

Dec 20 '24 15:12 mkdir-washington-edu

The references field in the API is an array containing every ancestor of the annotation in the thread. Some ancestors may have been deleted, so you need the full list to be sure of being able to associate a reply with its top-level annotation. In JSON this is straightforward to encode as an array. In CSV we'd need to choose an encoding. The simplest solution is a comma-separated list, making sure that the field is properly escaped when exported.
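As a rough sketch of the comma-separated option (not the client's actual export code), the following Python joins the ancestor IDs into one cell and lets the standard `csv` module do the quoting. The helper names, the three-column layout, and the second reply ID are hypothetical, and it assumes annotation IDs never contain commas.

```python
import csv
import io


def encode_references(references):
    """Join ancestor IDs into a single cell value; csv handles the quoting."""
    return ",".join(references)


def decode_references(cell):
    """Split a cell back into a list of ancestor IDs (empty cell -> [])."""
    return cell.split(",") if cell else []


# Hypothetical rows: a top-level annotation, a reply, and a reply to the reply.
rows = [
    {"ID": "F5YqwJbpEe-kJWcL3BHQxQ", "Type": "Annotation", "References": []},
    {"ID": "X72iLr7kEe-8vIsqCNlnHw", "Type": "Reply",
     "References": ["F5YqwJbpEe-kJWcL3BHQxQ"]},
    {"ID": "hypothetical-reply-id", "Type": "Reply",
     "References": ["F5YqwJbpEe-kJWcL3BHQxQ", "X72iLr7kEe-8vIsqCNlnHw"]},
]


def write_csv(rows):
    """Serialize rows to CSV text, flattening the References list."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ID", "Type", "References"])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "References": encode_references(row["References"])})
    return buffer.getvalue()


def read_csv(text):
    """Parse CSV text back into rows with References as a list."""
    reader = csv.DictReader(io.StringIO(text))
    return [{**row, "References": decode_references(row["References"])} for row in reader]


csv_text = write_csv(rows)
print(csv_text)
```

A cell with more than one ID contains a comma, so the writer wraps it in double quotes, and a conforming CSV parser reads it back as a single field.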

Dec 20 '24 15:12 robertknight

For encoding: the list of tags on an annotation is currently handled correctly by Google Sheets when importing the CSV, though I've seen issues with Excel decoding it properly. Excel assumes that each tag in the list is a new column value, steadily displacing all of the data for subsequent rows.

Some additional context from an instructor (to help with prioritization):

JSON files are not practical. I need the text of annotation and replies in a text format to use in a text-based program for qualitative research. It is important to know which reply attaches to what “original” annotation for the purpose of data analysis, since I will need to treat replied-to annotations differently to original annotations. Also, for an in-depth content analysis, I need to know which reply matches to what annotation. The replies are almost useless unless I know what they are replying to.

Dec 20 '24 16:12 mkdir-washington-edu

I'd forgotten we'd already had to solve encoding lists for handling the tags field. We should treat references in the same way. The request makes sense and is likely quite straightforward to implement.

JSON files are not practical. I need the text of annotation and replies in a text format to use in a text-based program for qualitative research.

For what it's worth, an interim solution may be to use AI to help with this:

Go to ChatGPT
Start a new chat and attach an exported JSON file
Enter a prompt like: "Convert the records in this JSON file to CSV. Include only these fields: ID, username, text, tags, references."

This worked for me for a small file of 10-20 annotations. Not sure if it will work with a much larger one.
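For larger files, a small script may be more dependable than a chat prompt. This is only a sketch in Python: it assumes the exported JSON is either a top-level list of records or an object with an "annotations" list, and that each record uses the keys "id", "user", "text", "tags" and "references". Those key names and the `annotations_to_csv` helper are assumptions, so check them against a real export first.

```python
import csv
import io
import json

# Assumed key names in each exported record; adjust to match the real file.
FIELDS = ["id", "user", "text", "tags", "references"]


def annotations_to_csv(json_text):
    """Convert exported annotation JSON to CSV text with the fields above."""
    data = json.loads(json_text)
    # Assumption: records are a top-level list or live under "annotations".
    records = data["annotations"] if isinstance(data, dict) else data
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(FIELDS)
    for record in records:
        writer.writerow([
            record.get("id", ""),
            record.get("user", ""),
            record.get("text", ""),
            # Lists are flattened to comma-separated strings; csv quotes them.
            ",".join(record.get("tags", [])),
            ",".join(record.get("references", [])),
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    import sys

    # Usage: python convert.py export.json > export.csv
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            sys.stdout.write(annotations_to_csv(handle.read()))
```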

Dec 20 '24 17:12 robertknight