zotero-markdb-connect

Citekey match fails with escape characters in filename

Open • gdinh opened this issue 1 year ago • 19 comments

Currently, you can specify a custom file filter, and the first capturing group will be taken as the BBT citekey. This fails when the filename and the title in the YAML header do not match, e.g. when the filename contains escape characters.

For example, let's say I want to group all of my references under logseq namespaces, so I tell logseq to prefix imported citekeys with the string refs/. Because the slash is escaped, these files are stored on disk as refs%2FDoe23-UnderwaterBasketWeaving.md, but the title:: field inside the file's YAML header is title:: refs/Doe23-UnderwaterBasketWeaving.
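A minimal Python sketch of that title/filename mismatch. It assumes logseq percent-encodes only "/" and ":" in filenames (the two cases reported in this thread); logseq's real escaping rules may cover more characters or differ between versions. The helper names page_to_filename and filename_to_page are hypothetical, not part of logseq or MarkDB.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote

# Assumption: only these characters are escaped in logseq filenames
# ("/" -> %2F and ":" -> %3A, as observed in this issue).
ESCAPED_CHARS = {"/": "%2F", ":": "%3A"}


def page_to_filename(title: str) -> str:
    """Map a logseq page title to its (assumed) on-disk Markdown filename."""
    stem = "".join(ESCAPED_CHARS.get(ch, ch) for ch in title)
    return stem + ".md"


def filename_to_page(path: str) -> str:
    """Recover the page title by undoing the percent-encoding in the filename stem."""
    return unquote(PurePosixPath(path).stem)
```

Round-tripping through unquote recovers the title:: value from the on-disk name. Note that unquote would also decode any other %XX sequence that happens to appear in a filename.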

This mismatch appears to make MarkDB choke. Using the file filter ^refs%2F.+\.md$ produces the following errors on tag sync:

  • There were 1 Markdown notes that could not be parsed.
  • There were 1 citekeys in your Markdown notes that could not be matched to items in your Zotero library.
  • There was an issue matching some of your Markdown notes (0 notes were matched successfully).

The error JSON produced by all of these is:

[
 {
  "citekey": null,
  "citekey_metadata": null,
  "citekey_title": null,
  "zotkeys": [],
  "zotids": [],
  "name": "refs%2FDoe23-UnderwaterBasketWeaving",
  "path": "/Users/grace/Library/Mobile Documents/iCloud~com~logseq~logseq/Documents/pages/refs%2FDoe23-UnderwaterBasketWeaving.md"
 }
]

Using the file filter ^refs/.+\.md$ will simply not match files.
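A quick way to see why the two filters behave differently: the pattern is tested against the on-disk filename, which contains %2F rather than /. This sketch uses Python's re module as a stand-in for the plugin's regex handling (an assumption). It also shows that ^refs%2F.+\.md$ has no capturing group, which is consistent with the null citekey in the error JSON above.

```python
import re

# Filename as stored on disk (the "name" field of the error JSON, plus .md).
ON_DISK = "refs%2FDoe23-UnderwaterBasketWeaving.md"


def check_filter(pattern: str, filename: str) -> tuple:
    """Return (matches, number_of_capturing_groups) for a file-filter regex.

    Assumption: the filter is applied to the bare filename, as Python's re would.
    """
    compiled = re.compile(pattern)
    return compiled.search(filename) is not None, compiled.groups


for pattern in (r"^refs%2F.+\.md$", r"^refs/.+\.md$"):
    print(pattern, check_filter(pattern, ON_DISK))
```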

gdinh • Jun 08 '23 00:06

If you're using characters in the BBT citekey that are incompatible with the filesystem, the citekey will need to come from the yaml header. The filename and yaml header do not have to match.

Currently the yaml keyword is only configured for metadata following the https://help.obsidian.md/Editing+and+formatting/Metadata specification:

---
citekey: bbtcitekeyval
field2: val2
---
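For reference, a minimal sketch of reading the citekey from that kind of front matter. It is a hand-rolled, line-based parser written for illustration (an assumption, not MarkDB's implementation) that only understands flat key: value pairs between the opening and closing --- lines; real code would use a proper YAML parser. A logseq page like the one in this issue has no --- block, so the lookup returns None.

```python
from typing import Optional


def citekey_from_frontmatter(text: str, keyword: str = "citekey") -> Optional[str]:
    """Return the value of `keyword` from a leading --- ... --- block, or None.

    Assumption: flat `key: value` lines only; nested YAML is not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter at the top of the note
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, sep, value = line.partition(":")
        if sep and key.strip() == keyword:
            return value.strip() or None
    return None
```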

Do your notes have only one title:: property or multiple? I could look at updating the metadata syntax based on whether a logseq graph name is specified in the prefs.

daeh • Jun 08 '23 06:06

If you're using characters in the BBT citekey that are incompatible with the filesystem, the citekey will need to come from the yaml header.

To clarify: the BBT citekey is not the one that contains incompatible characters. The incompatible characters come from the prefix added to the citekey to form the page title, which must contain a slash for logseq namespaces to work.

The note only has a single title:: property. This is what the header looks like:

tags:: [[Cardinality Estimation]], [[Information Theory]], [[Polymatroid Bound]], [[Worst-case Optimal Join]], [[ref]]
  date:: 2022
  publisher:: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  place:: "Dagstuhl, Germany"
  series:: Leibniz International Proceedings in Informatics (LIPIcs)
  proceedings-title:: 25th International Conference on Database Theory (ICDT 2022)
  isbn:: 978-3-95977-223-5
  doi:: 10.4230/LIPIcs.ICDT.2022.1
  title:: refs/Ngo22-Information
  pages:: 1:1–1:21
  volume:: 220
  item-type:: [[conferencePaper]]
  access-date:: 2023-06-07T20:52:26Z
  original-title:: On an Information Theoretic Approach to Cardinality Estimation
  url:: https://drops.dagstuhl.de/opus/volltexte/2022/15875
  authors:: [[Hung Q. Ngo]]
  library-catalog:: Dagstuhl Research Online Publication Server
  links:: [Local library](zotero://select/library/items/BQYSFG3T), [Web library](https://www.zotero.org/users/USERID/items/BQYSFG3T)
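Logseq stores these as key:: value properties rather than YAML front matter, which is why a YAML keyword lookup finds nothing. A minimal sketch of collecting them, assuming the page-level properties are the first lines of the file (indented or not) and end at the first line that is not a property. The regex and function name are illustrative assumptions, not logseq's actual grammar.

```python
import re

# Assumption: "key:: value", with optional leading indentation as in the header above.
PROPERTY_LINE = re.compile(r"^\s*([\w-]+)::\s?(.*)$")


def parse_logseq_properties(text: str) -> dict:
    """Collect the leading key:: value properties of a logseq page into a dict."""
    props = {}
    for line in text.splitlines():
        match = PROPERTY_LINE.match(line)
        if match is None:
            break  # the first non-property line ends the page-level block
        props[match.group(1)] = match.group(2).strip()
    return props
```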

Unfortunately, the built-in Zotero integration in logseq does not seem to add citekeys to the YAML, and there appears to be no setting to do so:

[Screenshot: settings of logseq's built-in Zotero integration]

gdinh • Jun 08 '23 16:06

Thanks for the info. I'll look at revamping the plugin's support for logseq's zotero integration.

Is the title:: always {prefix}/{BBTkey} (in general, not just in your case)?

Is it always structured like

tags:: ...
  ...
  title:: ...

?

If so, I could add a user setting where you can specify the "prefix" and configure the metadata parsing to handle logseq's data structure.

daeh • Jun 09 '23 18:06

The title is actually {prefix}{BBTkey}; in this case the prefix itself includes the slash (see the third text field from the bottom in the screenshot above) to enable namespace support.
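Given that structure, the citekey can be recovered by stripping the prefix from title::. A sketch, where prefix stands in for the user setting proposed above; the parameter and its behavior are hypothetical assumptions:

```python
from typing import Optional


def citekey_from_title(title: str, prefix: str = "") -> Optional[str]:
    """Return the BBT citekey if title has the form {prefix}{BBTkey}, else None.

    Assumption: prefix models a hypothetical user setting; it may itself
    contain "/" (e.g. "refs/") to produce logseq namespaces.
    """
    if not title.startswith(prefix):
        return None  # page was not created with this prefix
    citekey = title[len(prefix):]
    return citekey or None  # an empty remainder is not a citekey
```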

I believe the header is always structured in this way; here's the header for another paper:

tags:: [[ref]]
date:: [[Feb 1st, 2018]]
issn:: 1420-8970
issue:: 1
extra:: Citation Key: GGOW18-Algorithmic
doi:: 10.1007/s00039-018-0434-2
title:: refs/GGOW18-Algorithmic
pages:: 100–145
volume:: 28
item-type:: [[journalArticle]]
original-title:: "Algorithmic and optimization aspects of Brascamp-Lieb inequalities, via Operator Scaling"
url:: https://doi.org/10.1007/s00039-018-0434-2
publication-title:: Geometric and Functional Analysis
authors:: [[Ankit Garg]], [[Leonid Gurvits]], [[Rafael Oliveira]], [[Avi Wigderson]]
links:: [Local library](zotero://select/library/items/99TZM9GU), [Web library](https://www.zotero.org/users/xxxxxxxx/items/99TZM9GU)
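This second header also carries the key in its extra:: property (Citation Key: GGOW18-Algorithmic), while the first header has no extra:: line, so it could serve at most as an optional fallback. A sketch of pulling it out, assuming the property keeps that single-line "Citation Key: ..." form; the pattern is an illustrative assumption:

```python
import re
from typing import Optional

# Assumption: "extra:: ... Citation Key: <key>" appears on a single line.
EXTRA_CITEKEY = re.compile(r"^\s*extra::.*?Citation Key:\s*(\S+)", re.MULTILINE)


def citekey_from_extra(text: str) -> Optional[str]:
    """Return the citekey recorded in an extra:: property, or None if absent."""
    match = EXTRA_CITEKEY.search(text)
    return match.group(1) if match else None
```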

gdinh • Jun 11 '23 03:06

Hi @gdinh, I was just looking at this again and trying to remember what all the issues were. I'm revamping the logseq support, but while that's in progress, I wanted to check whether this workaround solves the issue in the short term.

Does it work for you to just specify a capturing group in the file filter regex?

Like you said,

Using the file filter ^refs/.+\.md$ will simply not match files.

If you use ^refs%2F(.+)\.md$ instead, then the first capturing group will be used as the BBT citekey. In that case, you could just leave the yaml keyword blank.
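A sketch of that workaround, with Python's re standing in for the plugin's regex handling (an assumption): the filter matches the escaped on-disk name, and group 1 is everything between refs%2F and .md.

```python
import re
from typing import Optional

# The file filter suggested above; group 1 is taken as the BBT citekey.
FILE_FILTER = re.compile(r"^refs%2F(.+)\.md$")


def citekey_from_filename(filename: str) -> Optional[str]:
    """Return the first capturing group of the file filter, or None if it doesn't match."""
    match = FILE_FILTER.match(filename)
    return match.group(1) if match else None
```

Because the prefix is matched literally, the captured citekey needs no decoding as long as the citekey itself contains no characters that logseq escapes.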

Let me know if that works as a workaround.

daeh • Jul 17 '23 17:07

Sorry about totally missing this message!

It almost works! All the papers linked in Logseq are properly detected by using this filter.

Unfortunately, the "open markdown note" function doesn't work. Instead of opening the page "refs/KZA+23-AuRORA", it opens the page "refs%2FKZA+23-AuRORA". It seems the / character cannot be encoded in the logseq URL; the URL needed to make this work is "logseq://graph/documents?page=refs/KZA%2B23-AuRORA".
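The difference comes down to which string gets URL-encoded and which characters are left alone. A sketch with Python's urllib.parse.quote, starting from the decoded page title (refs/KZA+23-AuRORA) rather than the filename stem. encode_slash is a hypothetical switch, included because which form logseq accepts appears to vary between setups:

```python
from urllib.parse import quote


def logseq_page_uri(graph: str, page_title: str, encode_slash: bool = False) -> str:
    """Build logseq://graph/{graph}?page=... from a decoded page title.

    Assumption (hypothetical flag): encode_slash=False leaves "/" literal, the
    form reported to work above; encode_slash=True percent-encodes it as %2F.
    Other reserved characters, such as "+", are always encoded.
    """
    safe = "" if encode_slash else "/"
    return f"logseq://graph/{quote(graph, safe='')}?page={quote(page_title, safe=safe)}"
```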

gdinh • Oct 09 '23 21:10

@gdinh I added some more logseq config to the zot7 version:

https://github.com/daeh/zotero-markdb-connect/releases/tag/v0.1.0-rc.2

I had a hard time figuring out how to resolve this because on my system, logseq requires refs/ to be URL-encoded in the URI. So I'm not sure whether logseq changed how it handles URIs (it should definitely be escaping /, so I'm guessing it was an update on their end).

Whatever the case, there are more config options in this version. So if this is still relevant, please give it another go with any version >= 0.1.0-rc.1.

daeh • Dec 09 '23 23:12

Closing as resolved. Feel free to reopen if it's not.

daeh • Mar 29 '24 01:03

Although I have much less technical knowledge than the original poster, I think I'm encountering a similar problem. In Logseq, I'll have a notes page with this title: '@Missing out: in praise of the unlived life'. But in the Zotero 7 beta, when I choose for this reference "Open Note in Logseq", the generated title to be searched is '@Missing out%3A in praise of the unlived life' and thus the file isn't found.

What can I do?

odysseus90210 • Jun 22 '24 02:06

This isn't a hard tweak to make, but I'll have to sort out some dependency conflicts. Will try to get to it this week.

daeh • Jun 22 '24 04:06

Actually, it turns out that Zotero is putting the "%3A" in the file name, so I don't understand why "Open Note in Logseq" isn't working.

odysseus90210 • Jun 22 '24 04:06

What's the URI that should open the file? The URI takes the form logseq://graph/{{graph}}?page=...

You can get it from logseq using the Copy page URL command for the page.

(And then tell me what URI "Open Note in Logseq" is generating?)

daeh • Jun 22 '24 04:06

Sorry for the delay. The actual URL of the page in Logseq is:

logseq://graph/chremata?page=%40The%20Social%20Function%20of%20Attic%20Tragedy%3A%20A%20Response%20to%20Jasper%20Griffin

The URL that "Open Note in Logseq" is generating is:

logseq://graph/chremata?page=%40The%20Social%20Function%20of%20Attic%20Tragedy%253A%20A%20Response%20to%20Jasper%20Griffin

odysseus90210 • Jun 25 '24 02:06

Thanks. The : is getting double URL-encoded (%3A becomes %253A). I have a fix that I'll push tomorrow.
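The two URIs above can be reproduced by assuming the generator percent-encoded the filename stem, which already contains logseq's %3A escape, so its % became %25. Decoding the stem first and encoding once yields the working URI. This is a Python sketch of the failure mode under that assumption, not the plugin's actual code:

```python
from urllib.parse import quote, unquote


def page_uri(graph: str, page: str) -> str:
    """Percent-encode page (including "/", "@", ":" and spaces) into a logseq URI."""
    return f"logseq://graph/{graph}?page={quote(page, safe='')}"


# Assumption: the generator started from the filename stem, where logseq
# has already escaped ":" as %3A.
stem = "@The Social Function of Attic Tragedy%3A A Response to Jasper Griffin"

buggy = page_uri("chremata", stem)           # "%" -> "%25", so %3A becomes %253A
fixed = page_uri("chremata", unquote(stem))  # undo the filename escape, then encode once
```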

daeh • Jun 25 '24 04:06

Thank you!

odysseus90210 • Jun 25 '24 04:06

Let me know if https://github.com/daeh/zotero-markdb-connect/releases/tag/v0.1.1-beta.1 works.

daeh • Jun 25 '24 14:06

Sorry -- how do I install this if not through the update mechanism? I could first uninstall version 0.1, but I'm afraid of losing settings and wreaking havoc!

odysseus90210 • Jun 26 '24 07:06

Your settings should be preserved. Just remove the installed version and add the beta XPI.

daeh • Jun 26 '24 20:06

Did you check whether the beta works for you, @odysseus90210? Again, all the MDBC settings will persist when you uninstall v0.1.0 and install the beta.

daeh • Jun 30 '24 18:06

I went ahead and released v0.1.1 with these changes. You can get it by triggering plugin updates from Zotero's Tools > Plugins, or by downloading the XPI from the latest release. Please open a new issue if you run into problems.

daeh • Jul 12 '24 17:07