
Chunks syntax: characters allowed for types, names, and ids

Open tidoust opened this issue 4 years ago • 2 comments

The chunk.js implementation suggests that names are composed of letters and digits, as well as a restricted set of punctuation characters.

However, the description of @rdfmap suggests that chunk property values could be IRIs:

@rdfmap {
  dog http://example.com/ns/dog
  cat http://example.com/ns/cat
}
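The example implies that @rdfmap lets a short chunk name stand in for a full IRI. Below is a minimal sketch of that idea. It assumes @rdfmap acts as a plain lookup table from names to IRIs. The `rdfmap` table and the `expandName` helper are hypothetical and are not taken from chunk.js.

```javascript
// Hypothetical: the @rdfmap block above modelled as a name -> IRI table.
const rdfmap = new Map([
  ["dog", "http://example.com/ns/dog"],
  ["cat", "http://example.com/ns/cat"],
]);

// Expand a chunk name to an IRI: use the mapping when there is one,
// otherwise return the name unchanged (it may already be an IRI).
function expandName(name, map) {
  return map.has(name) ? map.get(name) : name;
}

console.log(expandName("dog", rdfmap));                  // http://example.com/ns/dog
console.log(expandName("https://example.org/", rdfmap)); // https://example.org/
```

If names could be IRIs directly, the second case would need no mapping at all. That is the motivation for the question below.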

In practice, I wonder which characters are allowed for types, names, and ids. It seems to me that allowing IRIs (as JSON-LD does) would help with mapping to the Semantic Web, and would allow reasoning about things. For instance, I could have:

website https://example.org/ {
  name "An example page"
}

One problem is that commas are allowed in IRIs, which makes IRIs awkward to use in a comma-separated list of property values. One solution is to use whitespace as the separator between values. Another is to mandate escaping of commas in IRIs.
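To make the conflict concrete, here is a small sketch using a made-up IRI that contains a comma. It compares naive comma splitting with the whitespace-separator alternative. The value strings are illustrative only, and the whitespace variant assumes that values never contain spaces themselves.

```javascript
// Illustrative value list; the first IRI contains a comma of its own.
const values = "https://example.org/a,b, https://example.org/c";

// Naive comma splitting cuts the first IRI in two.
const byComma = values.split(",").map((s) => s.trim());
console.log(byComma); // [ 'https://example.org/a', 'b', 'https://example.org/c' ]

// With whitespace as the separator, each IRI stays intact
// (assuming no value contains whitespace).
const bySpace = "https://example.org/a,b https://example.org/c".trim().split(/\s+/);
console.log(bySpace); // [ 'https://example.org/a,b', 'https://example.org/c' ]
```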

tidoust, Jun 04 '20 07:06

The JavaScript implementation currently uses the following regular expressions:

number: /^[-+]?[0-9]+.?[0-9]*([eE][-+]?[0-9]+)?$/
name: /^(*|(@)?[\w|\d|\.|_|-|\/|:]+)$/
iso8061: /^\d{4}(-\d\d(-\d\d(T\d\d:\d\d(:\d\d)?(.\d+)?(([+-]\d\d:\d\d)|Z)?)?)?)?$/
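A quick way to check what these patterns accept is to run them against a few sample strings. One assumption applies here. As quoted, the name pattern starts a group with a bare `*`, and JavaScript rejects that with a "nothing to repeat" syntax error. The sketch therefore treats it as a literal asterisk and escapes it as `\*`. Every other character is copied unchanged, and the constant names are just labels for this sketch.

```javascript
// Patterns as quoted above; only the leading `*` in NAME is escaped
// (assumed to mean a literal asterisk).
const NUMBER = /^[-+]?[0-9]+.?[0-9]*([eE][-+]?[0-9]+)?$/;
const NAME = /^(\*|(@)?[\w|\d|\.|_|-|\/|:]+)$/;
const ISO8061 = /^\d{4}(-\d\d(-\d\d(T\d\d:\d\d(:\d\d)?(.\d+)?(([+-]\d\d:\d\d)|Z)?)?)?)?$/;

console.log(NUMBER.test("3.14"));                  // true
console.log(NUMBER.test("3x14"));                  // true: the unescaped "." matches any character
console.log(NAME.test("https://example.org/"));    // true: letters, ".", "/" and ":" are all in the class
console.log(NAME.test("@rdfmap"));                 // true
console.log(ISO8061.test("2020-06-04T07:06:00Z")); // true
```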

Chunk identifiers are names, so your example with a URL for a chunk ID is fine.
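This is a sketch of how a chunk header such as `website https://example.org/ {` could be split into a type and an id, with both checked against the name pattern. The `parseChunkHeader` helper is hypothetical. It assumes the header is whitespace-separated, that the id is optional, and that the pattern's leading `*` is a literal asterisk.

```javascript
// Name pattern from the thread, with the leading `*` escaped.
const NAME = /^(\*|(@)?[\w|\d|\.|_|-|\/|:]+)$/;

// Hypothetical helper: parse "<type> [<id>] {" and validate both parts.
function parseChunkHeader(line) {
  const m = /^\s*(\S+)(?:\s+(\S+))?\s*\{\s*$/.exec(line);
  if (!m) throw new Error(`not a chunk header: ${line}`);
  const [, type, id] = m;
  if (!NAME.test(type)) throw new Error(`invalid type: ${type}`);
  if (id !== undefined && !NAME.test(id)) throw new Error(`invalid id: ${id}`);
  return { type, id };
}

console.log(parseChunkHeader("website https://example.org/ {"));
// { type: 'website', id: 'https://example.org/' }
```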

Commas are really convenient for list item separators, so to allow IRIs, any commas within them should be escaped.
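The thread does not fix an escape syntax. The sketch below assumes a backslash escape, so a comma that belongs to a value is written `\,` and a literal backslash is written `\\`. The `splitValues` function is hypothetical. It splits only on unescaped commas and drops the escape characters from the result.

```javascript
// Hypothetical escaping rule: "\," is a comma inside a value and
// "\\" is a literal backslash; only unescaped commas separate items.
function splitValues(text) {
  const items = [];
  let current = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "\\" && i + 1 < text.length) {
      // Keep the escaped character and drop the backslash.
      current += text[i + 1];
      i++;
    } else if (ch === ",") {
      items.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  items.push(current.trim());
  return items;
}

// The source text here is: https://example.org/a\,b, https://example.org/c
console.log(splitValues("https://example.org/a\\,b, https://example.org/c"));
// [ 'https://example.org/a,b', 'https://example.org/c' ]
```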

draggett, Jun 04 '20 07:06

I guess we can start with a restricted set of characters and open things up later on.

FWIW, \w is equivalent to [A-Za-z0-9_] and thus already includes \d and _.
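Following that observation, here is a sketch comparing the quoted name pattern with a simplified class. Inside `[...]`, `|` is a literal character, not alternation. The sequence `|-|` also parses as a range from `|` to `|`, so the quoted class accepts `|` but not `-`. `NAME_SIMPLE` is a hypothetical rewrite, not the chunk.js pattern. As in the other sketches, the leading `*` is escaped on the assumption that it means a literal asterisk.

```javascript
// Name pattern from the thread (leading `*` escaped).
const NAME = /^(\*|(@)?[\w|\d|\.|_|-|\/|:]+)$/;
// Hypothetical simplification: \w already covers letters, digits and "_";
// a "-" placed last in the class is a literal hyphen.
const NAME_SIMPLE = /^(\*|(@)?[\w.\/:-]+)$/;

console.log(NAME.test("foo-bar"), NAME_SIMPLE.test("foo-bar"));   // false true
console.log(NAME.test("a|b"), NAME_SIMPLE.test("a|b"));           // true false
console.log(NAME.test("ex:dog_1"), NAME_SIMPLE.test("ex:dog_1")); // true true
```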

tidoust, Jun 04 '20 10:06