Style inheritance and dependent styles.

Open JohnLukeBentley opened this issue 7 years ago • 44 comments

I'm new to CSL and, as I understand it, there is no style inheritance mechanism: that is, no way to have a smaller CSL file containing only a few modifications to the citation and bibliography styles defined in a base CSL file.

There are "dependent styles", http://docs.citationstyles.org/en/stable/specification.html#dependent-styles ..

A dependent style is an alias for an independent style. Its contents are limited to style metadata, and doesn’t include any formatting instructions (the sole exception is that dependent styles can specify an overriding style locale).

... but that just aliases the parent style, and there's no room for providing overriding styles.

From https://github.com/citation-style-language/styles/wiki/Style-Requirements

If you started from another CSL style, delete the original style authors and contributors, and point to the original style with a "template" link:

<info>
  <link href="http://www.zotero.org/styles/original-style" rel="template"/>
</info>

... that suggests rel="template" serves as a mere informational reference.

If I'm right that there's no style inheritance mechanism, would there be appetite for this as a feature request?

JohnLukeBentley avatar Mar 08 '18 00:03 JohnLukeBentley

You're correct.

As for appetite, my own worry is that it could be an awfully complicated feature to implement.

bdarcus avatar Mar 08 '18 01:03 bdarcus

Thanks for confirming the lack of an inheritance mechanism. Yes, I suspect it would be laborious to implement.

A few (perhaps obvious) comments on why it might be worth the labour ...

Inheritance does seem like a common pattern that mature languages arrive at: think of subclassing in, say, Java, or cascading in Cascading Style Sheets.

In the CSL context I find myself taking existing, well developed, styles (e.g. Chicago Manual of Style 17th edition (author-date)) and making small tweaks (e.g. including the URL in a citation). Ideally I'd like to have my small-tweaked-style automatically inherit all the benefits of further improvements to the base style - as they are made and without my having to touch the code (excepting if the base update breaks my tweaks).

JohnLukeBentley avatar Mar 08 '18 02:03 JohnLukeBentley

Do you have some idea of how it would work in terms of the CSL syntax, etc.?

bdarcus avatar Mar 10 '18 00:03 bdarcus

Good question.

I suppose two possible approaches could be taken in terms of how a child style's nodes override those of its parent style.

The first approach is just to override the citation or bibliography wholesale if a citation or bibliography exists in the child style.

For example, if child-style.csl is something like the following ...

<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text" version="1.0" 
    demote-non-dropping-particle="display-and-sort" page-range-format="chicago">
  <info>
    <title>Child Style</title>
    <id>http://www.zotero.org/styles/child-style</id>
    <link href="http://www.zotero.org/styles/child-style" rel="self"/>
    <link href="chicago-author-date.csl" rel="parent"/>
    <!-- Other info nodes -->
  </info>
  <citation>
      <!-- Child style's citation code -->
  </citation>
</style>

... then this child style would inherit macros and the bibliography from its parent (chicago-author-date.csl) but redefine the citation. That is, whatever is between <citation> and </citation> in child-style.csl overrides whatever is between <citation> and </citation> in the parent.

That model, though, would be of limited benefit. You wouldn't get, in our example, whatever subsequent improvements occurred in the parent style to citations.
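
A minimal sketch of this first approach as a pre-processing step, using only Python's standard library. The function name merge_wholesale and the rule that the child's <info>, <citation> and <bibliography> replace the parent's nodes of the same name are assumptions made for illustration; nothing like this exists in CSL today.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"
ET.register_namespace("", CSL_NS)  # serialize without an "ns0:" prefix


def merge_wholesale(parent_xml: str, child_xml: str) -> str:
    """Return the parent style with whole top-level nodes taken from the child.

    Hypothetical semantics: an <info>, <citation> or <bibliography> present in
    the child replaces the parent's node of the same name; everything else
    (macros, locales, ...) is inherited from the parent unchanged.
    """
    parent = ET.fromstring(parent_xml)
    child = ET.fromstring(child_xml)
    for name in ("info", "citation", "bibliography"):
        tag = f"{{{CSL_NS}}}{name}"
        override = child.find(tag)
        if override is None:
            continue  # the child does not redefine this node: inherit it
        existing = parent.find(tag)
        if existing is None:
            parent.append(override)
        else:
            # Swap in place so the document order of the parent is kept.
            index = list(parent).index(existing)
            parent.remove(existing)
            parent.insert(index, override)
    return ET.tostring(parent, encoding="unicode")
```

The result is an ordinary self-contained style, so existing processors would not need to know that inheritance was involved.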

The more ambitious model would enable the child style to insert and delete nodes in specific parts of the parent's citation or bibliography.

For example, the citation node from chicago-author-date.csl looks like this ...

<citation et-al-min="4" et-al-use-first="1" disambiguate-add-year-suffix="true" 
   disambiguate-add-names="true" disambiguate-add-givenname="true" 
   givenname-disambiguation-rule="primary-name" collapse="year">
  <layout prefix="(" suffix=")" delimiter="; ">
    <group delimiter=", ">
      <choose>
        <if variable="issued accessed" match="any">
          <group delimiter=" ">
            <text macro="contributors-short"/>
            <text macro="date-in-text"/>
          </group>
        </if>
        <!---comma before forthcoming and n.d.-->
        <else>
          <group delimiter=", ">
            <text macro="contributors-short"/>
            <text macro="date-in-text"/>
          </group>
        </else>
      </choose>
      <text macro="point-locators"/>
    </group>
  </layout>
</citation>

Let's say I want my child style to insert the following ...

    <text macro="titleshort-with-title-fallback" font-style="italic" />
    <text variable="URL" />

... just before the <text macro="point-locators"/> node in chicago-author-date.csl then the challenge is to come up with a syntax and mechanism to do this. Perhaps XPath or CSS selectors could be used. E.g. In our child-style.csl, using XPath ...

<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text" version="1.0" 
  demote-non-dropping-particle="display-and-sort" page-range-format="chicago">
  <info>
    <title>Child Style</title>
    <id>http://www.zotero.org/styles/child-style</id>
    <link href="http://www.zotero.org/styles/child-style" rel="self"/>
    <link href="chicago-author-date.csl" rel="parent"/>
    <!-- Other info children -->
  </info>
  <citation action="insert-before" xpath="citation/layout/group/text[@macro='point-locators']">
      <!-- Nodes to insert -->
     <text macro="titleshort-with-title-fallback" font-style="italic" />
     <text variable="URL" />
  </citation>
</style>
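
A rough sketch of how a pre-processor might carry out such an "insert-before" instruction, using Python's standard library. The function insert_before and the "csl" namespace prefix are hypothetical names chosen for this example; ElementTree only supports a limited subset of XPath, and elements in the CSL namespace have to be addressed with a prefix.

```python
import copy
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"
NS = {"csl": CSL_NS}  # assumed prefix for use in the path expressions
ET.register_namespace("", CSL_NS)


def insert_before(parent_root, xpath, new_nodes):
    """Insert copies of new_nodes before every element matched by xpath.

    xpath is evaluated relative to the parent's <style> element.
    """
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: node for node in parent_root.iter() for child in node}
    targets = parent_root.findall(xpath, NS)
    if not targets:
        # A changed parent style can make the path stale; fail loudly.
        raise ValueError(f"no node matches {xpath!r}; the parent style may have changed")
    for target in targets:
        container = parent_of[target]
        index = list(container).index(target)
        for offset, node in enumerate(new_nodes):
            container.insert(index + offset, copy.deepcopy(node))
```

Raising an error when nothing matches is a deliberate choice in this sketch: a silent no-op would hide exactly the kind of break that an update to the parent style can cause.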

JohnLukeBentley avatar Mar 11 '18 11:03 JohnLukeBentley

Marking this as deferred for now. I think the idea is interesting and ambitious, but I don't really know what a good solution could look like.

denismaier avatar Jun 16 '20 20:06 denismaier

I will add that part of the impetus for this request is to reduce labor for a style author or editor (something I fully support), at the expense of styles no longer being self-contained.

But another way to solve the labor problem is what I've long advocated: #244.

bdarcus avatar Jun 16 '20 20:06 bdarcus

Apologies for only now replying to this!

... then this child style would inherit macros and the bibliography from its parent (chicago-author-date.csl) but redefine the citation. That is, whatever is between <citation> and </citation> in child-style.csl overrides whatever is between <citation> and </citation> in the parent.

This might have promise, but what about a first step, which is simply to load the macros for reuse?

The only complication there would be potentially clashing macro names. But maybe that's not an issue really? And I suggest a solution below.

But notwithstanding the other question, which is how clients would deal with this, it's a simple solution (would be a trivial change to schema and spec), that would likely get 90% of the benefit?

To use your example:

  <info>
    <title>Child Style</title>
    <id>http://www.zotero.org/styles/child-style</id>
    <link href="http://www.zotero.org/styles/child-style" rel="self"/>
    <!-- 
       The below would load the macros from chicago; perhaps we could add a prefix 
       attribute to avoid clashes? 
    -->
    <link href="chicago-author-date.csl" prefix="ch" rel="macro-source"/>
  </info>

.... and then you could call the macro like so:

<text macro="ch::title"/>
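
A sketch of what loading macros under a prefix could look like if done by a pre-processor, in Python with the standard library only. The function import_macros and the "prefix::name" naming are assumptions for illustration, not part of the CSL schema or of any processor. Because macros call one another, the calls inside each copied macro are renamed along with the macro itself.

```python
import copy
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"
ET.register_namespace("", CSL_NS)
MACRO = f"{{{CSL_NS}}}macro"


def import_macros(child_root, source_root, prefix):
    """Copy every <macro> of source_root into child_root as 'prefix::name'."""
    source_names = {m.get("name") for m in source_root.iter(MACRO)}
    existing = {m.get("name") for m in child_root.iter(MACRO)}
    imported = []
    for macro in source_root.findall(MACRO):
        clone = copy.deepcopy(macro)  # leave the source style untouched
        clone.set("name", f"{prefix}::{macro.get('name')}")
        # Rename calls to other imported macros so they still resolve.
        for node in clone.iter():
            if node.get("macro") in source_names:
                node.set("macro", f"{prefix}::{node.get('macro')}")
        if clone.get("name") in existing:
            raise ValueError(f"macro name clash: {clone.get('name')}")
        imported.append(clone)
    # Place the macros before <citation>/<bibliography>, or at the end.
    body = {f"{{{CSL_NS}}}citation", f"{{{CSL_NS}}}bibliography"}
    index = next((i for i, el in enumerate(child_root) if el.tag in body), len(child_root))
    for offset, clone in enumerate(imported):
        child_root.insert(index + offset, clone)
    return [clone.get("name") for clone in imported]
```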

bdarcus avatar Jun 16 '20 20:06 bdarcus

But another way to solve the labor problem is what I've long advocated: #244.

MakeCSL is certainly useful, but I think it solves a different problem, i.e. quickly building styles. But those will probably still need some tweaking. It's my impression that this request here deals with the tweaking part.

denismaier avatar Jun 16 '20 21:06 denismaier

This might have promise, but what about a first step, which is simply to load the macros for reuse?

Maybe a start, but does that really help? You'd still have to assemble all the logic yourself.

denismaier avatar Jun 16 '20 21:06 denismaier

It's just based on the observation that, with most of the canonical styles, like maybe 70-80% of the code is macros.

I could be wrong though in terms of how this would actually work. I have not written a style in a long time ;-)

bdarcus avatar Jun 16 '20 21:06 bdarcus

It's just based on the observation that, with most of the canonical styles, like maybe 70-80% of the code is macros.

That's right. But if you, say, just want to remove the quotes from titles, you will not want to reassemble the bibliography but just override one specific macro.

denismaier avatar Jun 17 '20 11:06 denismaier

I understand. But that seems a much bigger task than what I floated above, and would be dependent on it anyway.

I agree with your labeling it for a future decision, though, since it's complicated, particularly if the consensus is that the override behavior is necessary.

Curious what @JohnLukeBentley thinks as well, if he's still around.

bdarcus avatar Jun 17 '20 11:06 bdarcus

Perhaps another option: one could use some sort of preprocessor or template engine for this instead of dealing with inheritance inside CSL. Running a script or the preprocessor would create a new style based on some settings. Maybe XSLT?

Like:

create-new-style --from chicago-note-bibliography --to my-tweaked-chicago --include my-tweaks

If the parent style changes, you'd just re-create your style, and pull in the changes.

denismaier avatar Jun 17 '20 21:06 denismaier

Looks like tools like that already exist: xmldiff and xmlpatch (see https://tools.ietf.org/html/rfc5261#section-4.4). I don't know what the status of that is.

Python version/implementation, or just a tool with the same name: https://pypi.org/project/xmldiff/

denismaier avatar Jun 17 '20 21:06 denismaier

Ok, did a test with https://pypi.org/project/xmldiff/ Looks quite promising.

Scenario: You like the style modern-language-association.csl except you want citations in notes, rather than in-text.

  1. Make a copy of modern-language-association.csl, rename it (I used dm-modern-language-association.csl)
  2. Make the necessary changes.
  3. Run xmldiff, e.g. with xmldiff modern-language-association.csl dm-modern-language-association.csl > dm-modern-language-association.diff

This gives you something like this:

[update-attribute, /*[1], class, "note"]
[update-text, /*/*[1]/*[1], "DM Modern Language Association 8th edition"]
[update-text, /*/*[1]/*[2], "DM-MLA"]
[update-text, /*/*[1]/*[3], "http://www.zotero.org/styles/dm-modern-language-association"]
[update-attribute, /*/*[1]/*[4], href, "http://www.zotero.org/styles/dm-modern-language-association"]
[update-attribute, /*/*[19]/*[1], suffix, "."]
[delete-attribute, /*/*[19]/*[1], prefix]

Now, whenever modern-language-association.csl is updated, just do xmlpatch dm-modern-language-association.diff modern-language-association.csl > dm-modern-language-association.csl.

Of course, this does not guarantee the new patched version will continue to work properly.
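
To make it concrete what such a diff script does when replayed, here is a toy Python re-implementation of three of the action types shown above. It is not xmldiff's own code or API; resolve and apply_actions are hypothetical helpers, and only paths of the positional form /*/*[n]/*[m] are understood.

```python
import re
import xml.etree.ElementTree as ET


def resolve(root, xpath):
    """Follow a positional path such as '/*/*[19]/*[1]' down from root."""
    node = root
    # The first step ("*" or "*[1]") names the root element itself.
    for step in xpath.strip("/").split("/")[1:]:
        match = re.fullmatch(r"\*(?:\[(\d+)\])?", step)
        if match is None:
            raise ValueError(f"unsupported path step: {step}")
        position = int(match.group(1) or 1)
        children = list(node)
        if position > len(children):
            raise LookupError(f"{xpath} no longer exists in the document")
        node = children[position - 1]
    return node


def apply_actions(root, actions):
    """Apply (action, xpath, *args) tuples in order, modifying root in place."""
    for action, xpath, *args in actions:
        node = resolve(root, xpath)
        if action == "update-attribute":
            name, value = args
            node.set(name, value)
        elif action == "delete-attribute":
            del node.attrib[args[0]]
        elif action == "update-text":
            node.text = args[0]
        else:
            raise ValueError(f"unsupported action: {action}")
```

The sketch also shows why the result can break: the paths are purely positional, so if the updated parent gains a new top-level node before position 19, /*/*[19] silently points at a different element.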

denismaier avatar Jun 18 '20 12:06 denismaier

@denismaier, and @bdarcus those are helpful comments. Thanks.

@bdarcus wrote ...

I have [not] written a style in a long time ;-)

Yeah me too. And a long time since I was immersed in the issues that motivated this thread. So as my head is being roused to the topic, my first impressions are ...

On MakeCsl @denismaier seems spot on with

MakeCSL is certainly useful, but I think it solves a different problem, i.e. quickly building styles. But those will probably still need some tweaking. It's my impression that this request here deals with the tweaking part.

My request here does deal with the tweaking part.

On merely overriding macros that won't help with the kind of detailed tweaks I have exemplified (e.g. where you want to add a url to a citation) or as with @denismaier's example "you, say, just want to remove the quotes from titles". I note you offered it as "a first step" but I think this would just end up as a half-hearted aim that doesn't properly match the goal of the (my) suggestion. But I hasten to add your suggestion, merely overriding macros, is nevertheless worthy as something to have kicked around in this creative phase.

@denismaier I might have judged your pre-processing idea similarly. That is, because what would ultimately be useful is to have the inheritance mechanism incorporated into the CSL engine, for time-saving reasons.

However, I think the pre-processing suggestion is fruitful for the following reason. We (leaving to the side who "we" is) can build a pre-processor as a proof of concept. If a CSL inheritance mechanism is a worthy idea, and can be made to work well, then the logic and syntax should be identical whether it is pre-processed or incorporated into the CSL engine.

Another benefit of building a pre-processor as a proof of concept is that that effort can be done without corrupting the CSL engine's code base. I mean although git, as with any source code control, allows one to code up ideas without corrupting the master branch ... there's something less intrusive about coding an entirely independent pre-processor (or perhaps because I can't be more specific about what this "something less intrusive" is, I'm just wrong on this point).

A third benefit of a pre-processor is that if some objection emerged to folding the inheritance mechanism into the CSL engine, then the pre-processor would keep whatever independent utility it had acquired.

By "pre-processor" I take it you initially had in mind one that satisfies the initial suggestion. That is, from a parent style and a child style a result style is produced, after having been run through the pre-processor, that encapsulates the kinds of "tweaks" we have exemplified. We could call that "The Pre-Processor Proper".

The xmldiff tool you most lately pointed to, https://pypi.org/project/xmldiff/, is, of course, a pre-processor. I think this is also quite a clever idea as a sort of poor-man's pre-processor for our purposes, while we don't have a Pre-Processor Proper.

I mean the suggestion seems obvious once said. But like many clever ideas its obviousness occurs only in retrospect.

One could even use one's favourite line-level text based diff tool. I like WinMerge. Especially given CSL files are generally authored so that tags appear on one line each (or can be readily formatted to do so).

Of course, this wouldn't be as fine grained as an xmldiff tool (which you've shown to be able to target attributes and text). So the xmldiff tool could well be the one to choose over any mere line-level text based diff tool. On the other hand xmldiff, unlike something like WinMerge, doesn't have a UI. And one might benefit from having a UI to manually verify the merge, change by change.
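
For comparison, a line-level diff of two styles can be produced with Python's standard difflib module. This is only a sketch of the line-based alternative; the helper name line_diff and the default file names are made up for the example.

```python
import difflib


def line_diff(parent_text, child_text, parent_name="parent.csl", child_name="child.csl"):
    """Return a unified diff (as one string) turning parent_text into child_text."""
    return "".join(
        difflib.unified_diff(
            parent_text.splitlines(keepends=True),
            child_text.splitlines(keepends=True),
            fromfile=parent_name,
            tofile=child_name,
        )
    )
```

difflib only generates the diff. Re-applying it to an updated parent would need a separate tool such as patch, or a merge UI like the one mentioned above.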

this does not guarantee the new patched version will continue to work properly.

Yes I think this is going to be the case not only for a diff tool pre-processor but any CSL inheritance mechanism (whether "The Pre-Processor Proper" or folded into the engine). There's always a chance a changed parent style will break the child style.

This is a general problem, wherever inheritance occurs. In principle, with object oriented programming languages, a developer of a parent Mammal class is meant to "contract" with developers who might create child Dog or Cat classes to keep the parent interface stable. In practice developers of the child classes need to constantly test for breaks when the parent is updated.

In our case it might be that we don't even want to impose some theoretical "contract" on the developer of parent/base CSL styles. We might want to preserve the freedom of developers of parent/base CSL styles to change as much as they want. In which case even if we implement a properly functioning CSL inheritance mechanism (and wherever it is located) it might well be that child authors will always have to maintain a vigil for breaks to their style.

In any case, the current shape of my suggestion is that:

  • The functional goal is to have a CSL inheritance mechanism that is able to perform fine grained tweaks. For example, add a URL to a citation or remove quotes from a title. (As opposed to something less ambitious like merely override macros).
  • The ultimate location for the operation is to have the CSL inheritance mechanism folded into the CSL engine itself, rather than be ~~served~~ taken care of by a pre-processor.
  • Anyone having immediate needs for crude CSL inheritance could avail themselves right now of a diff tool: either https://pypi.org/project/xmldiff/ or a line-based text diff tool (with a UI) like WinMerge.
  • For the moment the most fruitful course of action will be to create a proof of concept "Pre-Processor Proper" that meets the functional goal. This can be used to test whether "The ultimate location for the operation" is wisely to be "folded into the CSL engine itself".

On the issue of who is to do this. I'm not yet putting my hand up to do this given current time constraints. Nor would I be expecting anyone to do this. As for all open source it's just a matter of who has the time and enthusiasm for it. So in the absence of any one or more putting their hand up to implement it perhaps the "deferred" tag remains right.

But putting to the side the willingness of anyone to implement the suggestion, does that all move the baton forward? Are there any further lateral ideas?

Edit 01: It doesn't directly make a ~~different~~ difference to our purposes but I was wrong to characterize WinMerge operating merely at the line level. It operates at the character level.

Edit 02: (With irony) edit to Edit01, so marked.

Edit03: "it's" -> "its".

Edit04: "served" to "taken care of". To remove the confusing impression that I might have been referencing a network server.

JohnLukeBentley avatar Jun 21 '20 02:06 JohnLukeBentley

Thanks for this well thought-out comment. Just one small comment: the advantage of pre-processors is that you have to actively choose which changes get incorporated, resolve conflicts and so on. That might be difficult if that is done internally by the CSL processor.

denismaier avatar Jun 21 '20 08:06 denismaier

Cheers.

Yes, being forced to actively choose changes might count as an advantage. Among the two diff pre-processors certainly WinMerge allows for this active choice through the UI. And your example output, dm-modern-language-association.diff, from xmldiff in effect shows the author every change that is applied (so it would be a matter of reading that output).

But, on the other hand, having to actively choose changes might count as a disadvantage. Certainly, and for example, when I'm coding up my child CSS I want the browser to produce, with the parent CSS, a merged result without my having to step through where the child is overriding the parent.

Although, mind you, the browser dev tools help me analyse exactly which style rules, from which sheet, are being applied to an HTML element.

So I envision CSL inheritance, whether implemented as our Proper Pre-Processor or internally to the CSL engine, working analogously to CSS in the browser. That is, as something where you wouldn't want to have to accept each change.

Although perhaps the analogy does throw up this idea that a CSL dev tool, that allows one to see how the inheritance is applied, might be useful. Perhaps necessary.

All of that is just a bit of a swirl of thought. That is, there's no conclusion intended.

JohnLukeBentley avatar Jun 21 '20 09:06 JohnLukeBentley

Just to update on this, we tagged this "deferred," which I think in the end means we would only implement this if some large project (say citeproc-js or citeproc-rs) were to champion this, and figure out the details of how it would work.

cc @cormacrelf, who has previously discussed similar goals here, with strong objections to his proposal from key constituents.

bdarcus avatar Aug 09 '20 14:08 bdarcus

@JohnLukeBentley - I really like this idea - to me it seems perfectly logical, and the current data structure of thousands of 'dependent' styles that are either just an alias or completely independent, with no in-between, is not viable long term. This is definitely a necessary feature, for example, to implement an institution or journal's specific referencing rules, which are a modified version of say Harvard, Chicago, APA, or IEEE.

I think any implementation should be focused purely on modification, rather than trying to copy the structure + formatting approach of HTML + CSS.

The way I see it, there are 3 distinct aspects to this that need to be considered/user input provided:

  1. The action to be taken: substitution/addition, or subtraction.
  2. Where in the document it should occur.
  3. Data being modified: variable, document structure or formatting.

Changes would then be applied in a linear fashion such that the last dependent file has highest priority and overrides everything above it (somewhat similar to localization). @denismaier By simply making the lowest dependent style override the next one above it, conflicts are minimized. If there's a variable clash the style validator should throw an error - just like a compiler would if identical variables are declared both locally and globally in C (and hence the common use of #ifndef). I don't think it's too much to ask to leave it up to the style creator to ensure there are no variable clashes if they're assisted by a validator.

Implementing the above:

  1. Could be defined with special tags, e.g. <override> or <delete>
  2. Could be defined by reproducing the DOM structure and performing a match - multiple matches mean multiple actions, and the modification would occur on all matching sub-nodes.
  3. Could be defined by partially reproducing the tag to be modified.

By way of example, let's take a partial reproduction of IEEE.csl:

<macro name="status">
  <choose>
    <if variable="page issue volume" match="none">
      <text variable="status" text-case="capitalize-first" suffix="" font-weight="bold"/>
    </if>
  </choose>
</macro>

<bibliography entry-spacing="0" second-field-align="flush">
  <layout>
    <!-- Citation Number -->
    <text variable="citation-number" prefix="[" suffix="]"/>

    <!-- Author(s) -->
    <text macro="author" suffix=", "/>

    <!-- Rest of Citation -->
    <choose>
      <!-- Specific Formats -->
      <if type="article-journal">
        <group delimiter=", " suffix=".">
          <text macro="title"/>
          <text variable="container-title" font-style="italic" form="short"/>
          <text macro="locators"/>
          <text macro="page"/>
          <text macro="issued"/>
          <text macro="status"/>
          <text macro="access"/>
        </group>
      </if>
      <else-if type="paper-conference speech" match="any">...
      <else-if type="report">...
      <else-if type="thesis">...
      <else-if type="webpage post-weblog post" match="any">...
      <else-if type="patent">...
      <!-- Online Video -->
      <else-if type="motion_picture">...
      <!-- Generic/Fallback Formats -->
      <else-if type="bill book graphic legal_case legislation motion_picture report song" match="any">...
      <else-if type="article-magazine article-newspaper broadcast interview manuscript map patent personal_communication song speech thesis webpage" match="any">...
      <else-if type="chapter paper-conference" match="any">...
      <else>...
    </choose>
  </layout>
</bibliography>

To perform the following modifications:

  1. Change status to be italic and not bold
  2. Add status to the article-journal type
  3. Add a . suffix to citations
  4. Delete the 'thesis' type
  5. Change the prefix and suffix for the citation number

You could add the following to your dependent style:

<!-- Overrides and additions -->
<override>
  <macro name="status">
    <text font-style="italic"/> <!-- Add italic styling to status -->
  </macro>
  <bibliography>
    <if type="article-journal"> <!-- Note the intentional lack of the <layout> and <choose> tags -->
      <group>
        <text macro="status"/> <!-- Add status field to journal articles -->
      </group>
    </if>
  </bibliography>
  <bibliography>
    <layout suffix="."/> <!-- Add . suffix to citations -->
  </bibliography>
</override>
<override what="tag">
  <bibliography>
    <layout>
      <text variable="citation-number" prefix="(" suffix=")"/> <!-- Change citation number brackets to parentheses by replacing the entire tag -->
    </layout>
  </bibliography>
</override>

<!-- Deletions -->
<delete what="node">
  <else-if type="thesis"/> <!-- Delete all else-if thesis nodes and all sub nodes -->
</delete>
<delete>
  <macro name="status">
    <text font-weight="bold"/> <!-- Remove bold formatting from status -->
  </macro>
</delete>

Such an approach would be fully backwards compatible with existing dependent styles, relatively straightforward to implement, use, and be maintainable by passing through updates to parent styles.
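
One possible reading of the "partially reproduce the tag" matching rule, sketched in Python with the standard library. It covers only attribute overrides and attribute deletions, not node insertion or whole-node deletion. The split between identifying attributes (KEY_ATTRS) and payload attributes is an assumption this sketch adds, since the proposal leaves open how the two are told apart; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: these attributes identify a node; all others are formatting payload.
KEY_ATTRS = ("name", "type", "macro", "variable")


def find_matches(scope, pattern):
    """Yield descendants of scope with pattern's tag and identifying attributes."""
    keys = {k: v for k, v in pattern.attrib.items() if k in KEY_ATTRS}
    for node in scope.iter(pattern.tag):
        if node is not scope and all(node.get(k) == v for k, v in keys.items()):
            yield node


def apply_pattern(scope, pattern, edit):
    """Walk down the pattern; call edit(node, payload) on every leaf match.

    Matching is by descendant, so intermediate tags such as <layout> or
    <choose> may be left out of the pattern.
    """
    for node in list(find_matches(scope, pattern)):
        if len(pattern):  # the pattern has children: keep descending
            for sub_pattern in pattern:
                apply_pattern(node, sub_pattern, edit)
        else:
            payload = {k: v for k, v in pattern.attrib.items() if k not in KEY_ATTRS}
            edit(node, payload)


def override(node, payload):
    """Set or replace the payload attributes on a matched node."""
    node.attrib.update(payload)


def delete(node, payload):
    """Remove payload attributes, but only where the value also matches."""
    for key, value in payload.items():
        if node.get(key) == value:
            del node.attrib[key]
```

With these helpers, the status example corresponds to one apply_pattern call with override (adding font-style="italic") and one with delete (removing font-weight="bold"), both scoped to the macro named status.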

This first idea has the drawback of not being able to differentiate multiple identical tags at the same level in the same node, but it's a resolvable problem.

I think a diff tool is an excellent choice as the basis for an end user dependent style creation tool - the user would have a WYSIWYG GUI to make their changes, then run a diff tool to bundle up the changes into a dependent style file.

ocouch avatar Apr 25 '21 10:04 ocouch

Thanks @ocouch .

But per my last comment, I don't see this happening anytime soon. Not a single implementer has asked for this, and when in the past it has come up, they've been against it.

I will take this opportunity to again mention #244, which I still think is a good idea, and which would, if done right, likely be much more user friendly than the approach advocated here. Or at least, the UX could conceivably be similar, but without requiring any of the implementation changes required to implement this.

bdarcus avatar Apr 25 '21 10:04 bdarcus

@ocouch on a skim of your post I'm gladdened you should: see the benefits so clearly; and provide an example of a plausible inheritance semantics.

And noting @bdarcus comments about the general lack of enthusiasm for it I'll emphasize what I've previously written off the back of @denismaier's pre-processing idea ...

I think the pre-processing suggestion is fruitful for the following reason. We (leaving to the side who "we" is) can build a pre-processor as a proof of concept. If a CSL inheritance mechanism is a worthy idea, and can be made to work well, then the logic and syntax should be identical whether it is pre-processed or incorporated into the CSL engine.

I mean it seems perfectly open to us as individuals - e.g. you @ocouch, me, or anyone else visiting this thread, to create a separate project that implements CSL inheritance using a pre-processor (using whatever language/tool set they like).

(However, for myself, I haven't the time for this at the moment).

JohnLukeBentley avatar Apr 25 '21 11:04 JohnLukeBentley

@bdarcus #244 looks great for generating a new style from scratch, but this is about having a way to make minor modifications to styles - for which the differences may be small, and sufficient, high quality training data is difficult to obtain or impractical to produce without something like this suggestion already existing. It's also about maintainability - how do you go about updating every APA related style if they're all generated independently through AI?

@JohnLukeBentley My struggle is both time and the learning curve - of learning RELAX NG, Javascript, and then all the project specific details about how to implement changes. I'm no software engineer, and I know little about JS and DOM tbh - being context dependent, I find the code very difficult to understand as it requires knowing when and where the code will be run as to what the values will be but there's not an obvious link as there is when passing arguments/pointers.

ocouch avatar Apr 25 '21 12:04 ocouch

... this is about having a way to make minor modifications to styles - for which the differences may be small ...

I am aware, but given how many styles are now available, a related issue it could address is finding out whether a style that does what one wants is already available.

E.g. just because you think you only need to edit one thing in an existing style doesn't mean there isn't one that already implements that change. We have no easy way to know that ATM.

My struggle is both time and the learning curve ...

Indeed, that's the issue all around :-)

bdarcus avatar Apr 25 '21 12:04 bdarcus

I don't think it matters if there are functionally identical dependent styles if they are for different use cases. I think it's appropriate to have a unique dependent or independent style for each institution/journal article - even when that is just an alias. While it may be tempting to reuse a style that's the same, it would be bad practice, as there's no actual link between them other than coincidence.

The issue this suggestion solves is that we currently need to break the technical link between styles, when the policy/philosophical link still exists.

The files should match the policy/philosophy of the journals/institutions. If there's a well defined parent style, e.g. the journal/institution says "We use Harvard referencing style, but here's our list of custom types", then it definitely makes sense to have their style as a dependent of the Harvard style, with necessary modifications.

ocouch avatar Apr 25 '21 12:04 ocouch

@ocouch ...

"I'm no software engineer". Well it sounds like you are software engineer enough to understand this deeply abstract software engineering problem and suggest highly plausible solutions.

Earlier you wrote "relatively straightforward to implement". And in the sense you intended I agree with you: that it appears relatively straightforward as a conceptual matter.

If I was doing it, I might use Java (just because I've lately got my head in that language) using whatever relevant Java XML library; or XSL (XSLT and XPath). Although, having spent some time playing with XSL, I generally don't find XSL intuitive to work with (that is, given the option I'd generally recommend some traditionally procedural language, like Java, C#, C++, JavaScript, or whatever the cool kids are using these days).

Moreover I don't think a schema language, whether XML Schema or Relax NG, is directly relevant. A schema language just helps you determine whether a source XML document conforms to whatever custom semantics you stipulate. I mean you might want to code a schema to stipulate the validity of the inheritance semantics but it's not going to be doing the work of transforming two XML documents into a final XML (and CSL) document.

But, yes, if you aren't across any set of plausible tools/languages then it's going to be a bit of a struggle. And certainly there'd be no pressure from me, or anyone else, for you to do anything here, beyond your already valuable contribution.

@bdarcus correctly observes that time and skill is "the issue all around :-)".

If there's a well defined parent style, eg. the journal/institution says "We use Harvard referencing style, but here's our list of custom types". Then it definitely makes sense to have their style as a dependent of the Harvard style, with necessary modifications.

Yeah, all that (and your prior post overall) gets at my initial motivations for creating this issue.

JohnLukeBentley avatar Apr 25 '21 12:04 JohnLukeBentley

Journals frequently say things like "use AMA 9th edition, but with these changes". More often than not, the style they say they are based on isn't accurate (eg, https://forums.zotero.org/discussion/88892/ama-9th-edition). Especially with "Harvard" style, there is no uniform "Harvard" style (that just means author-date citations), so what any particular publisher means by that is highly heterogeneous.

I absolutely see the potential benefit of this for style writing. But there are four significant downsides:

  1. None of the existing citeprocs are written to handle inheritance of macros or whole styles with any amount of substitution. Although we could write detailed rules for inheritance to guide how it should work, it would be a massive undertaking on the part of the citeproc authors to actually implement. Especially for the most widely used one, citeproc-js, I don't see this happening.
  2. It introduces a new set of difficulties for style authoring and debugging because the style is no longer self-contained. If I want to change how something appears, do I need to change that in my style or the independent parent style? This would make styles more difficult to read and would likely lead to lots of fragile conditionals as people try to tweak output.
  3. This would be a huge burden on the folks maintaining the style repository. It is already time consuming to review new style submissions or updates. If such review also needs to consider upstream and downstream dependencies, that would become exponentially harder. As an example, if the main Chicago style is updated, should that update also apply to all of its dependents or not? That would require individual review of each dependent.
  4. Implementing programs like Zotero or Mendeley would need to build a dependency management system. They already have something like this for managing current dependent styles, but that system only needs to import the whole style and change default locales. It would be much more complex and fragile to have to substitute in other bits that are scattered around the whole style file. The developers of these programs have said they are not interested in doing so, and they have also raised concerns about difficulties with user support with such a system.

I absolutely agree with @bdarcus that the feasible way to aid with writing new styles is an easier to use MakeCSL or CSL visual editor program.

bwiernik avatar Apr 25 '21 13:04 bwiernik

Yep. The pushback to any kind of inheritance and dependency management has been convincing. The malleable nature of XML text means that preprocessing instead is certainly an option, but ideally not with the newly suggested diff-based paradigm, which I sense was designed without awareness of existing XML manipulation tools; diffing is not really something for writing by hand as XML, it is for automating with Git etc. (see below), and if you are writing by hand yes XPath is much better. Either way, if you can spit out valid CSL and use it anywhere, go nuts.

But you can (and we should) also address the problem in other ways. There are at least a few more lessons from programmer tooling (other than dependency management) that we can draw from.

  1. Bring git merge to the styles repo. It is difficult for "maintaining a fork" that all the styles live in the same repo. You miss out on all the features of git (mostly automatic merging in of upstream changes) because git treats the forks as different files and can't automatically merge their histories. I think maybe you could fix this with a bit of fancy git scripting, to pretend that the forked style is actually a git branch's version of the original, produce a merge based on their divergent history, and then write it back to the fork's file path with some metadata about the git commit in which they were last considered "merged". That's something for the styles repo maintainers, but I am pretty good at writing that sort of thing so I could build something for it.

  2. Micro test suites inside styles to serve as documentation of forked styles and enable treating them more automatically. I'm not sure but I imagine that the forked styles that do exist don't have particularly good documentation as to how they differ from the original. I also imagine that the forks are maintained slowly by folks in the field who publish to a journal, ie very few people, who can't easily operate git scripts. So the merging has to be more automatic so they can be kept up to date. I think it would be extremely useful to have a micro test suite built into styles, which serves as both documentation and a machine checked test that enables more automation like in point (1). Remember how both Frank and I wrote test runners at the same time, for style development? Just standardise a version of that basically, make a simple CSL-JSON inside XML interface. This is probably something I have to drive by implementing it. I'll see if I can fit that in.

  3. Editing styles is hard. Projects like GUI editors and using machine learning are cool... but they are not exactly low hanging fruit. They are very high development and maintenance effort. There are other reasons than being XML why it is hard to write very easy-to-understand/edit CSL. The language just doesn't give you much in terms of refactoring power. You get macros, just a raw text expansion mechanism and nothing else. There are lots of ways you can make XML styles more approachable. (Not by using JSON! Lol.)

    • Macros could take CSL input, kinda like "React render props". Classic Cormac idea.

    • Deprecate choose, as has been maybe accepted already (?).

    • Allow macros and locale overrides to be after the citation/bibliography so that people are presented with the style's entry point straight up when they open the file.

    • Find a way to give people instant feedback when they're writing it that doesn't require installing VSCode, JRE or JDK, Jing, and some XML schema checker editor plugin, which is something not even I have as yet managed to do. I might have something for this in the attic, I have a gif somewhere.
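As a rough illustration of point 2 (a micro test suite with "CSL-JSON inside XML"), the sketch below pulls test cases out of a style file and runs them through a renderer. Everything about the embedding is an assumption made up for this example: the `urn:example:csl-micro-tests` namespace, the `test`, `input` and `expected` element names, and the `render` callback that stands in for a real citeproc. No such format has been standardised.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, NamedTuple

TEST_NS = "urn:example:csl-micro-tests"  # hypothetical namespace, not part of CSL


class MicroTest(NamedTuple):
    name: str
    items: list      # CSL-JSON items parsed from the <input> element
    expected: str    # expected rendered output


def extract_tests(style_xml: str) -> list[MicroTest]:
    """Collect every embedded (hypothetical) <test> element from a style."""
    root = ET.fromstring(style_xml)
    tests = []
    for node in root.iter(f"{{{TEST_NS}}}test"):
        items = json.loads(node.findtext(f"{{{TEST_NS}}}input", default="[]"))
        expected = node.findtext(f"{{{TEST_NS}}}expected", default="").strip()
        tests.append(MicroTest(node.get("name", ""), items, expected))
    return tests


def run_tests(style_xml: str, render: Callable[[str, list], str]) -> list[str]:
    """Return the names of tests whose rendered output differs from <expected>.

    `render(style_xml, items)` is a stand-in for whichever citeproc is used.
    """
    return [
        t.name
        for t in extract_tests(style_xml)
        if render(style_xml, t.items).strip() != t.expected
    ]
```

A fork could then carry one small test per intentional difference from its parent, which doubles as documentation of how the fork differs and gives an automated merge something to check against. JSON that contains `<` or `&` would need XML escaping or a CDATA section.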

cormacrelf avatar Apr 25 '21 15:04 cormacrelf

  1. It is difficult for "maintaining a fork" that all the styles live in the same repo. You miss out on all the features of git (mostly automatic merging in of upstream changes) because git treats the forks as different files and can't automatically merge their histories.

This seems particularly important low-hanging fruit. I always imagined when giving style URL ids that development of styles would happen in a distributed manner.

Given everything learned about distributed package management and such since, across a ton of different development communities (including Rust), surely we can do much better, and simultaneously greatly ease the maintenance burden of the styles repo?

bdarcus avatar Apr 25 '21 15:04 bdarcus

For example, Emacs MELPA has a single, typically one-line, "recipe" file for each package, which is simply a redirect to some repo, with optional files specified.

Here's a package I wrote, for example:

https://melpa.org/#/bibtex-actions

That public facing site is built with scripts that pull metadata from the actual source files.

So for sake of comparison, what if going forward the style repo didn't actually hold styles, but instead recipe-like things like this?

source: github
repo: foo/style
file: foo.csl

And major styles could get their own CSL org repos?
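For what it's worth, resolving such a recipe would be cheap. A minimal sketch in Python, assuming the three-key format above and GitHub's raw-content URL layout; the function names and the default `ref` of `main` are hypothetical choices for this example, not an existing convention:

```python
def parse_recipe(text: str) -> dict[str, str]:
    """Parse simple "key: value" lines into a dict, skipping blank lines."""
    recipe = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        recipe[key.strip()] = value.strip()
    return recipe


def style_url(recipe: dict[str, str], ref: str = "main") -> str:
    """Build the raw-file URL for a recipe; `ref` (branch or tag) is assumed."""
    if recipe.get("source") != "github":
        raise ValueError(f"unsupported source: {recipe.get('source')!r}")
    return (
        "https://raw.githubusercontent.com/"
        f"{recipe['repo']}/{ref}/{recipe['file']}"
    )


recipe = parse_recipe("source: github\nrepo: foo/style\nfile: foo.csl\n")
print(style_url(recipe))
```

A build script for the central repo could loop over the recipe files, fetch each URL, validate the result, and publish it, much as the MELPA site is generated from package sources.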

bdarcus avatar Apr 25 '21 16:04 bdarcus