Expanded -short, original-, reviewed- variables

Open bwiernik opened this issue 6 years ago • 107 comments

Recently, in response to user needs for the SBL style, citeproc-js added support for providing -short versions of all CSL variables, to be rendered with form="short". (https://forums.zotero.org/discussion/comment/324592/#Comment_324592)

In my work on apa.csl, I'm finding that it wants a lot more detailed information for reviews (e.g., medium of item being reviewed, date of item being reviewed), as well as for original publication information (e.g., original medium, original container title, original pages, original editor) than is currently possible with existing CSL variables. As far as I am aware, MLA and Chicago have similar requirements.

I suggest that -short, original-, reviewed- should be expanded so that they can be applied to any CSL variable. This would allow maximum flexibility without having to individually specify each possible variable of this kind.

bwiernik avatar Feb 19 '19 23:02 bwiernik

Sounds good to me. Any potential drawbacks when this is implemented?

denismaier avatar Apr 20 '20 14:04 denismaier

These could be implemented in CSL-JSON as arrays for short, original, reviewed. cf. https://github.com/citation-style-language/schema/issues/169

bwiernik avatar May 24 '20 22:05 bwiernik

Hmm. @dstillman Looks like Zotero devs are not big fans of arrays and objects. Suggestions concerning the data structure here?

What about special affixes that can be used with any other variable? E.g. for -short

In the long run, we were talking about a hierarchical data model. At least for reviewed- and original-, that would probably be the most flexible solution.

denismaier avatar Jun 04 '20 07:06 denismaier

These could be implemented in CSL-JSON as arrays for short, original, reviewed.

I'm not clear what the suggestion is here. Can you give an example?

dstillman avatar Jun 04 '20 09:06 dstillman

@dstillman I should have said objects, not arrays. Example:

"reviewed": {
  "type": "motion_picture",
  "medium": "DVD",
  "title": "Title of reviewed movie"
}

vs listing these as individual reviewed- variables.

reviewed-type: motion_picture
reviewed-medium: DVD
reviewed-title: Title of reviewed movie
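To make the two shapes concrete, here is a minimal sketch that converts the nested form into the flat one. The function name `flatten_related` and the sample item are made up for illustration; nothing here is part of citeproc-js or the CSL schema.

```python
def flatten_related(item, prefixes=("reviewed", "original")):
    """Return a copy of item with nested objects expanded into prefixed keys."""
    flat = {}
    for key, value in item.items():
        if key in prefixes and isinstance(value, dict):
            # {"reviewed": {"title": ...}} becomes {"reviewed-title": ...}
            for sub_key, sub_value in value.items():
                flat[f"{key}-{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Hypothetical item using the nested "reviewed" object from the example above.
nested = {
    "type": "review",
    "title": "Review title",
    "reviewed": {
        "type": "motion_picture",
        "medium": "DVD",
        "title": "Title of reviewed movie",
    },
}

flat = flatten_related(nested)
```

Both shapes carry the same information; the difference is only in where the grouping is expressed (in the structure or in the key name).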

bwiernik avatar Jun 04 '20 09:06 bwiernik

So you're wanting to change a mostly flat (aside from contributors and dates) data model to a more structured one.

bdarcus avatar Jun 04 '20 10:06 bdarcus

Ultimately, this could lead to a data model as outlined here.

denismaier avatar Jun 04 '20 10:06 denismaier

I think this is more a discussion for the CSL list, but in general I would strongly advocate for key-value pairs over objects, except where the fields don't make sense independently and the app would need special handling of all associated variables for proper processing anyway. If it's something where there could be a direct mapping between a field and a variable, it's vastly simpler to stick to key-value pairs, and it also allows for hacks like Extra. Reducing implementation complexity is much more important in my view than reducing verbosity in CSL-JSON.
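As a rough illustration of why flat key-value pairs work well with hacks like Extra, the sketch below reads `key: value` lines out of a free-text field. The function name `parse_extra` and the rule that a key may not contain spaces are assumptions made for this example; this is not Zotero's actual parser.

```python
def parse_extra(extra):
    """Split free text into (fields, remainder) using simple 'key: value' lines."""
    fields = {}
    remainder = []
    for line in extra.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        # Assumed rule: a field line has a single-token key and a non-empty value.
        if sep and key and " " not in key and value:
            fields[key.lower()] = value
        else:
            remainder.append(line)
    return fields, "\n".join(remainder)


extra_text = (
    "reviewed-title: Title of reviewed movie\n"
    "reviewed-medium: DVD\n"
    "A free-form note"
)
fields, remainder = parse_extra(extra_text)
```

With a nested object, the same trick would need a nested syntax inside the text field, which is the kind of implementation complexity the comment argues against.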

dstillman avatar Jun 04 '20 10:06 dstillman

Yeah, it's easy to add these variable strings ("reviewed-title" and such), so let's just do that. We could have defined "container" as an object, for example, but we didn't.

bdarcus avatar Jun 04 '20 10:06 bdarcus

I understand. But what about special handling of prefixes and suffixes on variables? Is there a way to define affixes that could be used on other variables? Like allowing -short as a general modifying suffix and reviewed- as a general modifying prefix? Would that be somehow possible?

denismaier avatar Jun 04 '20 10:06 denismaier

I think that would affect the processor more than the app. If we support a given -short or reviewed- field, the mapping would be hard-coded. It's the processor that would need to know how to handle those.

(I don't totally get it, though. Wouldn't there be nonsensical possibilities? What does issued-short mean?)

dstillman avatar Jun 04 '20 11:06 dstillman

Okay, so let's stick with key-value pairs.

Dan makes a good point on -short. It should apply only to standard variables (string, number, title), not name or date variables.

bwiernik avatar Jun 04 '20 12:06 bwiernik

So 2 questions:

  1. Is it possible to have such prefix/suffix rules in JSON to prevent unnecessary verbosity? (and yes, we will need to restrict -short to certain variables)

  2. If yes, should we do this?

Or should we simply add possible variables to reviewed-, original-, container-, collection- ?

denismaier avatar Jun 04 '20 12:06 denismaier

The three relevant affixes are -short, original- and reviewed-. Could we define valid combinations of these in the style and data schemas using string concatenation?

So, something like variables.short = variables.standard + '-short' and variables.original = 'original-' + variables.all?
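In Python terms, the concatenation idea might look like the sketch below. The variable lists are hypothetical and heavily abbreviated; the real schemas contain many more entries.

```python
# Hypothetical, abbreviated variable lists, separated by category.
STANDARD = ["title", "container-title", "publisher", "medium"]
NAMES = ["author", "editor"]
DATES = ["issued"]

ALL = STANDARD + NAMES + DATES

# -short applies only to standard variables, not to names or dates.
short = [f"{name}-short" for name in STANDARD]

# original- and reviewed- apply to every variable.
original = [f"original-{name}" for name in ALL]
reviewed = [f"reviewed-{name}" for name in ALL]
```

Adding a variable to one of the base lists would then extend all derived lists at once.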

bwiernik avatar Jun 04 '20 12:06 bwiernik

The three relevant affixes are -short, original- and reviewed-.

Addendum: With container-, collection- I was not suggesting we should add that now. But perhaps in the medium run?

So, something like variables.short = variables.standard + '-short' and variables.original = 'original-' + variables.all?

That looks good. It would make schema updates easier, wouldn't it? (But I'm a bit pessimistic that it will work so easily: https://stackoverflow.com/questions/9708192/use-a-concatenated-dynamic-string-as-javascript-object-key Yes, that's old, but perhaps still relevant? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer says it's possible with recent JS, but not with JSON.)

Edit: looks like I misrepresented the problem here. You were not concatenating the key, so it might be easier. (But I don't know.)
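One possible angle on the "rules in JSON" question: JSON Schema has a `patternProperties` keyword that matches property names against regular expressions, so affix rules could be stated once instead of listing every combination. The sketch below mimics that matching with Python's `re` module. The schema fragment, the patterns, and the `unknown_keys` helper are assumptions for illustration, not a proposal for the actual csl-data.json.

```python
import re

# Hypothetical schema fragment using the JSON Schema "patternProperties" keyword.
SCHEMA_FRAGMENT = {
    "patternProperties": {
        r"^(original|reviewed)-[a-z-]+$": {"type": "string"},
        r"^[a-z-]+-short$": {"type": "string"},
    },
    "additionalProperties": False,
}


def unknown_keys(item, schema=SCHEMA_FRAGMENT):
    """Return the keys of item that match none of the schema's patterns."""
    patterns = [re.compile(p) for p in schema["patternProperties"]]
    # JSON Schema patterns are not implicitly anchored, hence search() plus ^ and $.
    return [key for key in item if not any(p.search(key) for p in patterns)]


leftover = unknown_keys({"original-title": "x", "title-short": "y", "foo": "z"})
```

The trade-off is that a loose pattern like this would also accept nonsensical names such as issued-short, so restricting -short to certain variables would still require spelling out the allowed base names inside the pattern.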

denismaier avatar Jun 04 '20 12:06 denismaier

I think container and collection are a much bigger can of worms than the others, so let's set those aside for now.

bwiernik avatar Jun 04 '20 12:06 bwiernik

For the data schema, an option might be to split out the schemas into separate files that match the RNC type and variable structure and use a build script to compile them at commit time.

bwiernik avatar Jun 04 '20 13:06 bwiernik

What would that look like, and how would it solve the problem of schema verbosity? Would it?

denismaier avatar Jun 04 '20 13:06 denismaier

Idea: if we have all variables available with original- or reviewed-, what about a mechanism like alternative in csl-m where you can render all variables prefixed with alt- with a single alternative variable? Could make style coding easier.

https://citeproc-js.readthedocs.io/en/latest/csl-m/#id15
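As a data-level illustration only, gathering all variables that share a prefix into one sub-item could look like the sketch below. The helper name `collect_prefixed` and the sample item are hypothetical, and this is not how citeproc-js implements `alternative`.

```python
def collect_prefixed(item, prefix):
    """Return the variables of item that start with prefix, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in item.items()
        if key.startswith(prefix)
    }


# Hypothetical flat item mixing regular, original- and reviewed- variables.
item = {
    "title": "A review",
    "original-title": "Old title",
    "original-medium": "VHS",
    "reviewed-title": "Title of reviewed movie",
}

original_item = collect_prefixed(item, "original-")
```

A processor could then hand such a sub-item to the style's regular layout to render it as a full reference.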

denismaier avatar Jun 04 '20 17:06 denismaier

In addition to or instead of making them available as regular variables? It would need to be in addition to if anything. Most styles only want a portion of such information or have different formatting requirements (e.g., APA wants original medium, original type, original title, and original author, not a full reference).

bwiernik avatar Jun 04 '20 18:06 bwiernik

What would that look like, and how would it solve the problem of schema verbosity? Would it?

A Python script in GitHub Actions could compile the csl-data.json at commit time. It would have a list of all types and variables (separated by category) and dynamically construct the JSON. The main benefit would be ease of maintenance and updating: there would be no need to manually keep four nearly identical lists aligned.
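A minimal sketch of such a compile step, assuming abbreviated category lists and placeholder subschemas (the real csl-data.json definitions are more detailed, and the names `CATEGORIES`, `SUBSCHEMAS`, `build_properties`, and `render` are invented here):

```python
import json

# Hypothetical, abbreviated source lists, separated by category.
CATEGORIES = {
    "standard": ["title", "medium"],
    "name": ["author", "editor"],
    "date": ["issued"],
}

# Placeholder subschemas standing in for the real definitions.
SUBSCHEMAS = {
    "standard": {"type": "string"},
    "name": {"type": "array"},
    "date": {"type": "object"},
}


def build_properties(categories):
    """Expand each base variable into its original-, reviewed- and -short forms."""
    properties = {}
    for category, names in categories.items():
        for name in names:
            properties[name] = SUBSCHEMAS[category]
            properties[f"original-{name}"] = SUBSCHEMAS[category]
            properties[f"reviewed-{name}"] = SUBSCHEMAS[category]
            if category == "standard":
                # -short is restricted to standard variables.
                properties[f"{name}-short"] = SUBSCHEMAS[category]
    return properties


def render(categories):
    """Serialize deterministically so regenerated files produce clean diffs."""
    schema = {"type": "object", "properties": build_properties(categories)}
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


generated = render(CATEGORIES)
```

A CI job could write `generated` to the schema file, or compare it with the committed file and fail when the two differ.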

bwiernik avatar Jun 04 '20 18:06 bwiernik

For the data schema, an option might be to split out the schemas into separate files that match the RNC type and variable structure and use a build script to compile them at commit time.

Yes, I was wondering about something like this.

bdarcus avatar Jun 04 '20 18:06 bdarcus

In addition to or instead of making them available as regular variables? It would need to be in addition to if anything.

Sure, in addition to the regular variables. For reviews you will most likely want a full reference, right? (And that reference should also be rendered according to the current style---so giving these details in the regular title is actually not ideal.)

Edit: well, at least Chicago does not request this.

denismaier avatar Jun 04 '20 18:06 denismaier

So, what shall we do about this now? Should I draft a PR for original-, reviewed-, and -short? Or should we go the automated route instead?

denismaier avatar Jun 15 '20 07:06 denismaier

So, what shall we do about this now? Should I draft a PR for original-, reviewed-, and -short? Or should we go the automated route instead?

Depends on who's going to write the Python script and when.

I have basic Python skills, but am not knowledgeable about parsing text as we need (see comment).

bdarcus avatar Jun 15 '20 10:06 bdarcus

I have basic Python skills, but am not knowledgeable about parsing text as we need

The question is: What will our input look like? Will we just use the JSON? Or could we even work with native Python structures? If so, we don't have to parse anything.

denismaier avatar Jun 15 '20 11:06 denismaier

I don't understand. I was assuming input is the rnc file(s), output is csl-data.json.

What were you thinking? A single, say python, file, whose contents is the data representation, output to both rnc and json?

bdarcus avatar Jun 15 '20 11:06 bdarcus

I was thinking we could use a common source for both rnc and json.


# Common source: one entry per variable, with its datatype and allowed affixes.
variables = [
    {
        "name": "title",
        "type": "string",
        "variants": ["original-", "reviewed-", "-short"],
    },
    {
        "name": "author",
        "type": "name",
        "variants": ["original-", "reviewed-", "container-"],
    },
]


def create_rnc(variables):
    # Placeholder: this would create the rnc schema variable list.
    rnc_schema = ""
    return rnc_schema


def create_json(variables):
    # Placeholder: this would create the json schema.
    json_schema = {}
    return json_schema


rnc_schema = create_rnc(variables)
json_schema = create_json(variables)
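One way the two stub functions above might be filled in is sketched below. The `expand` helper, the `variables.<type>` pattern names, and the per-type subschemas are assumptions for illustration; the real csl.rnc and csl-data.json are organized differently and in more detail.

```python
import json

variables = [
    {"name": "title", "type": "string", "variants": ["original-", "reviewed-", "-short"]},
    {"name": "author", "type": "name", "variants": ["original-", "reviewed-", "container-"]},
]


def expand(variable):
    """Return the base name followed by each prefixed or suffixed variant."""
    name = variable["name"]
    names = [name]
    for affix in variable["variants"]:
        # A trailing hyphen marks a prefix ("original-"); otherwise it is a suffix ("-short").
        names.append(affix + name if affix.endswith("-") else name + affix)
    return names


def create_rnc(variables):
    """Build one RELAX NG Compact style choice pattern per datatype."""
    by_type = {}
    for variable in variables:
        by_type.setdefault(variable["type"], []).extend(expand(variable))
    lines = []
    for datatype, names in by_type.items():
        choices = " | ".join(f'"{n}"' for n in names)
        lines.append(f"variables.{datatype} = {choices}")
    return "\n".join(lines)


def create_json(variables):
    """Build a JSON Schema 'properties' object with placeholder subschemas."""
    subschemas = {"string": {"type": "string"}, "name": {"type": "array"}}
    properties = {}
    for variable in variables:
        for name in expand(variable):
            properties[name] = subschemas[variable["type"]]
    return json.dumps({"properties": properties}, indent=2, sort_keys=True)


rnc_schema = create_rnc(variables)
json_schema = create_json(variables)
```

Because both outputs come from the same `variables` list, the rnc and json schemas cannot drift apart.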

denismaier avatar Jun 15 '20 11:06 denismaier

A single, say python, file, whose contents is the data representation, output to both rnc and json?

Exactly, see above. (Or instead of Python, we could also use some other common source that is easy to write and parse, say YAML or TOML.)

denismaier avatar Jun 15 '20 11:06 denismaier

IC.

I'm agnostic; whatever gets us to the best and easiest result, which is consistent schemas, and clean git histories, including diffs.

I'm not sure about the details of CI on GitHub or how it would work.

On your example, though, maybe better to have separate dicts for datatypes; like "variables-string."
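A sketch of what that regrouping could look like, assuming the list-of-dicts layout from the example and a hypothetical `group_by_type` helper:

```python
from collections import defaultdict

# The per-variable list from the example above.
variables = [
    {"name": "title", "type": "string", "variants": ["original-", "reviewed-", "-short"]},
    {"name": "author", "type": "name", "variants": ["original-", "reviewed-", "container-"]},
]


def group_by_type(variables):
    """Regroup the flat list into one dict per datatype, keyed 'variables-<type>'."""
    grouped = defaultdict(dict)
    for variable in variables:
        grouped[f"variables-{variable['type']}"][variable["name"]] = variable["variants"]
    return dict(grouped)


grouped = group_by_type(variables)
```

With this layout, the datatype is stated once per group rather than once per variable.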

bdarcus avatar Jun 15 '20 11:06 bdarcus