
zk graph --format json is not handling note titles with inner double quotes

Open tjex opened this issue 1 year ago • 7 comments

Check if applicable

  • [X] I have searched the existing issues (required)
  • [X] I'm willing to help fix the problem and contribute a pull request

Describe the bug

I have four notes with inner double quotes in their titles, e.g. This is a "note" title.

Running zk graph --json > notes.json dumps valid JSON except for these notes. In their place is a single comma:

    3 ▎   {filename:h5u1.md,filenameStem:h5u1,path:h.....<valid json continues>
E   2 ▎   ,     ■ Value expected
    1 ▎   {filename:jbig.md,filen......<valid json continues>

The weird thing is that there are only three instances of this, despite there being four notes with this quote pattern. None of the four notes appears in the JSON file. In other words, three have been replaced with a single comma, and one has vanished altogether.

Notes with double quotes in their title also cause some strange behaviour. Those notes, and only those, are printed to stdout, and after each note's body my zsh environment variables are printed as well. That is super weird, and would freak me out if I didn't know that this is a legitimate project run by honest people haha.

I'm sure that will be fixed, once the strings get handled, but just wanted to put it out there.

How to reproduce?

  1. run zk graph --json > notes.json and check if it's already bug free
  2. create a note This is a "test" note
  3. run zk index
  4. run zk graph --json > notes2.json

In theory, notes2.json should now contain a line holding only a comma.
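The "Value expected" diagnostic shown earlier can also be confirmed outside the editor. Below is a minimal sketch, assuming a hypothetical helper firstSyntaxError and a made-up sample string that only imitates the broken output; neither is part of zk.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// firstSyntaxError returns the byte offset reported by encoding/json for the
// first syntax error in data, or -1 if no syntax error is found.
// Hypothetical helper for illustration only.
func firstSyntaxError(data []byte) int64 {
	var v interface{}
	err := json.Unmarshal(data, &v)
	var syn *json.SyntaxError
	if errors.As(err, &syn) {
		return syn.Offset
	}
	return -1
}

func main() {
	// Made-up sample: a lone comma sits where a note object should be.
	broken := []byte("[{\"filename\":\"h5u1.md\"},\n,\n{\"filename\":\"jbig.md\"}]")
	fixed := []byte("[{\"filename\":\"h5u1.md\"},\n{\"filename\":\"jbig.md\"}]")

	fmt.Println("broken offset:", firstSyntaxError(broken))
	fmt.Println("fixed offset:", firstSyntaxError(fixed))
}
```

To check a real export, the same helper could be fed the bytes of notes2.json instead of the inline sample.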

zk configuration

# zk configuration file
#
# Uncomment the properties you want to customize.

# NOTE SETTINGS
#
# Defines the default options used when generating new notes.
[note]
language = "en"
default-title = "untitled"
filename = "{{id}}"
extension = "md"
template = "default.md"

# Path globs ignored while indexing existing notes.
ignore = [
   "d/**",
]

# Configure random ID generation.
id-charset = "alphanum"
id-length = 4
id-case = "lower"

# EXTRA VARIABLES
#
# A dictionary of variables you can use for any custom values when generating
# new notes. They are accessible in templates with {{extra.<key>}}
[extra]

#key = "value"

# GROUP OVERRIDES
# Omitting `paths` is equivalent to providing a single path equal to the name of
# the group. This can be useful to quickly declare a group by the name of the
# directory it applies to.

[group.daily]
paths = ["d/2024"]

[group.daily.note]
filename = "{{format-date now}}"
extension = "md"
template = "daily.md"


# MARKDOWN SETTINGS
[format.markdown]

link-format = "markdown"
# in case for some reason, a white space file makes it into my vault
link-encode-path = true 
link-drop-extension = true

# use YAML tags only
hashtags = false
colon-tags = false
multiword-tags = false


# EXTERNAL TOOLS
[tool]

# Command used to preview a note during interactive fzf mode.
# Set it to an empty string "" to disable preview.

# bat is a great tool to render Markdown document with syntax highlighting.
#https://github.com/sharkdp/bat
#fzf-preview = "bat -p --color always {-1}"


# LSP
[lsp]

[lsp.diagnostics]
# Each diagnostic can have for value: none, hint, info, warning, error
# wiki-title = "hint"
dead-link = "error"

[lsp.completion]
# Show the note title in the completion pop-up, or fallback on its path if empty.
note-label = "{{title-or-path}}"
# Filter out the completion pop-up using the note title or its path.
#note-filter-text = "{{title}} {{path}}"
# Show the note filename without extension as detail.
#note-detail = ""


# NAMED FILTERS
[filter]

# Matches the notes created the last two weeks. For example:
#    $ zk list recents --limit 15
recents = "--sort created- --created-after 'last two weeks'"


# COMMAND ALIASES

[alias]

# Show a random note.
lucky = '$EDITOR $(zk list --format path --sort random --limit 1)'

daily = 'zk new --no-input "$ZK_NOTEBOOK_DIR/d/`date +%Y`"'
s = 'zk edit $ZK_NOTEBOOK_DIR/i/149x.md'
random-orphan = '$EDITOR $(zk list --orphan --format path --sort random -n1 -x d -x g)'
rg = 'rg $1 $(zk list --format path --delimiter " " $2 $3 $4 $5)'
conf = '$EDITOR "$ZK_NOTEBOOK_DIR/.zk/config.toml"'

# Edit the last modified note.
last = "zk edit --limit 1 --sort modified- $@"

# Edit the notes selected interactively, "recents" is a filter from above
recent = "zk edit recents --interactive"

# Print paths separated with colons for the notes found with the given
# arguments. This can be useful to expand a complex search query into a flag
# taking only paths. For example:
#   zk list --link-to "`zk path -m potatoe`"
#path = "zk list --quiet --format {{path}} --delimiter , $@"

# Returns the Git history for the notes found with the given arguments.
# Note the use of a pipe and the location of $@.
#hist = "zk list --format path --delimiter0 --quiet $@ | xargs -t -0 git log --patch --"

Environment

zk 0.14.0
system: Darwin 22.6.0 arm64

tjex, Jan 31 '24 18:01

@mcDevnagh @julio-lopez @jurica

Would it be a case of manually sanitizing the note titles before the JSON output, e.g. in the internal/core/note_format.go file?

I had a look at the Go json.Marshal documentation. It mentions that a struct tag like the one below is possible:

	Filename     string                 `json:"filename,string"`

That should format the value for wrapping in double quotes, but it didn't work on my first try.
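For reference, here is a minimal sketch of what the ,string tag option does in encoding/json. The Note type and the encode helper are hypothetical and are not zk's actual code. On a string field, the option encodes the value as JSON and then stores that result inside another JSON string, so it double-encodes the value rather than sanitizing it, which may be why it did not help here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Note is a hypothetical struct for illustration; it is not zk's real type.
type Note struct {
	Plain  string `json:"plain"`         // escaped once, as usual
	Quoted string `json:"quoted,string"` // JSON-encoded, then wrapped in a second JSON string
}

// encode marshals a Note whose two fields hold the same title.
func encode(title string) (string, error) {
	out, err := json.Marshal(Note{Plain: title, Quoted: title})
	return string(out), err
}

func main() {
	out, err := encode(`This is a "note" title`)
	if err != nil {
		panic(err)
	}
	// Compare the two fields: the second carries an extra layer of quoting.
	fmt.Println(out)
}
```

The plain tag already escapes inner quotes correctly, so the extra option is not needed for titles.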

tjex, Jan 31 '24 18:01

We shouldn't have to sanitize. encoding/json does the sanitization for us. For example,

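package main

import (
	"encoding/json"
	"fmt"
)

type Test struct {
	Title string
}

func main() {
	// Marshal escapes the inner double quote for us.
	// The result is named out so it does not shadow the json package.
	out, err := json.Marshal(Test{
		Title: `"`,
	})
	if err != nil {
		panic(err)
	}
	// Print to stdout (the builtin print writes to stderr).
	fmt.Println(string(out))
}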

outputs

{"Title":"\""}

Is json.Marshal producing all of the output of zk graph --json? If so, this would be an upstream issue, which I'd find very surprising. More likely we're calling json.Marshal for each note and ignoring errors (or simply logging them). If that is the case, the best place to start fixing this would be to look at that error.

mcDevnagh, Feb 10 '24 20:02

Ok great, thanks for this. As a heads up, I won't be able to look at this (or anything else) until March 17. I'll be on holiday 🏝️

tjex, Feb 14 '24 07:02

Note: I narrowed this down a little, and it looks like the culprit is the osEnv() call in newNoteFormatter.

When it's called in graph.go, line 70, something about the double-quoted title breaks it. This is why it spits out all of the shell environment variables to stdout. Double quotes in the file name are not the culprit, as I'm using IDs as file names.

Changing the file title from: what does "this" mean? to what does 'this' mean? executes without error.

tjex, Apr 01 '24 01:04

It's actually due to the quotes not being escaped in link_format.go.

These functions are also used to render the links within documents. So escaping the double quotes here would render links within notes with backslashes: [[a linked \"note\"]]. This of course breaks the links.

tjex, Apr 01 '24 03:04

Does that mean that the true culprit is links to titles with quotes and not titles with quotes themselves?

mcDevnagh, Apr 05 '24 23:04

In other words, quotes are being escaped properly everywhere in json output except the string for the filename's own link.

If I hardcode the problem field output to foo, the call to zk graph --format json returns json without error.

{
  "filename": "another \"test\" note.md",
  "filenameStem": "another \"test\" note",
  "path": "another \"test\" note.md",
  "absPath": "/Users/tjex/.local/src/zk-org/workbench/test-vault/another \"test\" note.md",
  "title": "another \"test\" note",
  "link": "foo",
  "lead": "",
  "body": "",
  "snippets": [],
  "rawContent": "# another \"test\" note\n\n\n",
  "wordCount": 4,
  "tags": [],
  "metadata": {},
  "created": "2024-04-01T03:11:03.540446352Z",
  "modified": "2024-04-01T03:11:05.217630323Z",
  "checksum": "a121f8d06da1bff78c5e5eb4816d0417ab2ec22317bad43baedb51a20c60df68"
}

Otherwise it would in effect return this invalid json:

{
  "filename": "another \"test\" note.md",
  "filenameStem": "another \"test\" note",
  "path": "another \"test\" note.md",
  "absPath": "/Users/tjex/.local/src/zk-org/workbench/test-vault/another \"test\" note.md",
  "title": "another \"test\" note",
+  "link": "[[another "test" note]]", <- the quotes around test are not escaped
  "lead": "",
  "body": "",
  "snippets": [],
  "rawContent": "# another \"test\" note\n\n\n",
  "wordCount": 4,
  "tags": [],
  "metadata": {},
  "created": "2024-04-01T03:11:03.540446352Z",
  "modified": "2024-04-01T03:11:05.217630323Z",
  "checksum": "a121f8d06da1bff78c5e5eb4816d0417ab2ec22317bad43baedb51a20c60df68"
}

With the initial idea for the fix in #400, the output looks like this:

{
  "filename": "another \"test\" note.md",
  "filenameStem": "another \"test\" note",
  "path": "another \"test\" note.md",
  "absPath": "/Users/tjex/.local/src/zk-org/workbench/test-vault/another \"test\" note.md",
  "title": "another \"test\" note",
  "link": "[[another \"test\" note]]",
  "lead": "",
  "body": "",
  "snippets": [],
  "rawContent": "# another \"test\" note\n\n\n",
  "wordCount": 4,
  "tags": [],
  "metadata": {},
  "created": "2024-04-01T03:11:03.540446352Z",
  "modified": "2024-04-01T03:11:05.217630323Z",
  "checksum": "a121f8d06da1bff78c5e5eb4816d0417ab2ec22317bad43baedb51a20c60df68"
}

Links are still rendered correctly within documents:

# note title
[another "test" note](another%20%22test%22%20note)
[[another "test" note]]
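The percent-encoded target in the Markdown link above is what standard path escaping produces for a title with spaces and double quotes. A minimal sketch, assuming a hypothetical markdownLink helper (zk's own link formatter may work differently):

```go
package main

import (
	"fmt"
	"net/url"
)

// markdownLink builds a Markdown link and percent-encodes only the target,
// leaving the visible title untouched. Hypothetical helper, not zk's code.
func markdownLink(title, target string) string {
	return "[" + title + "](" + url.PathEscape(target) + ")"
}

func main() {
	fmt.Println(markdownLink(`another "test" note`, `another "test" note`))
	// prints: [another "test" note](another%20%22test%22%20note)
}
```

Because the quotes are handled by percent-encoding in the link target, no backslash escaping is needed in the note body itself.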

tjex, Apr 06 '24 03:04

I have a lot of notes with titles that contain quotes, and I recently noticed this issue when trying to export notes in JSON format:

zk list -Pqf json | jq  

I believe I have identified the source of this problem: #440.

flytaly, Aug 07 '24 16:08

Fixed with #440.

tjex, Aug 17 '24 15:08