Spelling

Open jsoref opened this issue 4 years ago • 20 comments

This PR corrects misspellings identified by the check-spelling action.

The misspellings have been reported at https://github.com/jsoref/specref/commit/011ce84b8fa4b7a0847fc9d9b5c2183e3c9032d7#commitcomment-50050420

The action reports that the changes in this PR would make it happy: https://github.com/jsoref/specref/commit/57d13d6ddabda58a0321ca388186ac8972ef070b

Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.

jsoref avatar Apr 27 '21 09:04 jsoref

Obviously, the easiest fixes are the ones in your code (as opposed to the dataset).

I have no idea how your pipeline works, and I'd like to treat it as a black box. If someone can push fixes upstream so that they don't come back, that's best.

I'm still trying to release the version of my tool that I'm using (it's close, but I have two more things I want to fix, one hit in production overnight, and one which is just polish).

jsoref avatar Apr 27 '21 11:04 jsoref

Hi there, any chance of breaking this down into a few different pull requests, where we tackle what's not external references first?

tobie avatar May 04 '21 13:05 tobie

Sure. (Sorry, I've been trying to release 0.0.18, which I half did, and then ran into bugs in Dependabot, which is so much fun.)

jsoref avatar May 04 '21 13:05 jsoref

For reference, the above is just rebasing spelling onto spelling-code -- so, temporarily there may be more commits, but once spelling-code merges, there will be fewer.

jsoref avatar May 04 '21 13:05 jsoref

This Perl script split the authors:

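#!/usr/bin/env perl -pi
# -p runs this body for every input line and prints the result; -i edits files in place.
# On Linux, env receives "perl -pi" as one argument, so the shebang may need `env -S`.
# $a is 1 while we are inside an "authors" array.
if ($a == 1) {
  # A closing bracket ends the authors array.
  if (/^\s*\]/) {
    $a = 0; next;
  }
  # Hide the comma in ", Ed." so it is not treated as a separator.
  s/, Ed\./__ED/;
  # A quoted entry that still contains a comma holds several authors.
  if (/^(\s+").*,.*"/) {
    $lead = $1;  # indentation plus the opening quote
    # Put each author on its own line as a separate string.
    s/,\s*(.)/",\n$lead$1/g;
  }
  # Restore the editor suffix.
  s/__ED/, Ed./;
} elsif (!$a && /"authors": \[/) {
  $a = 1;
}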
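
For readers who don't read Perl, here is a rough Python sketch of the same line-based transformation as the script above. It is an illustration only, not part of the PR: the function name `split_authors` and the sample entry are made up, and it assumes the JSON is pretty-printed with one author string per line, which is what the Perl script also relies on.

```python
import json
import re


def split_authors(text):
    """Split comma-joined author strings inside "authors" arrays (hypothetical helper)."""
    out = []
    in_authors = False
    for line in text.split("\n"):
        if in_authors:
            if re.match(r'^\s*\]', line):
                # A closing bracket ends the authors array.
                in_authors = False
            else:
                # Hide the comma in ", Ed." so it is not treated as a separator.
                line = line.replace(", Ed.", "__ED", 1)
                m = re.match(r'^(\s+").*,.*"', line)
                if m:
                    lead = m.group(1)  # indentation plus the opening quote
                    # Close the string at each comma and start a new one on the next line.
                    line = re.sub(
                        r',\s*(.)',
                        lambda mm: '",\n' + lead + mm.group(1),
                        line,
                    )
                line = line.replace("__ED", ", Ed.", 1)
        elif '"authors": [' in line:
            in_authors = True
        out.append(line)
    return "\n".join(out)


# Made-up sample entry, not taken from the specref data.
sample = """{
  "X": {
    "authors": [
      "Alice Smith, Bob Jones",
      "Jane Doe, Ed."
    ],
    "title": "T, with comma"
  }
}"""

result = split_authors(sample)
print(json.loads(result)["X"]["authors"])
```

Only lines inside an "authors" array are touched, so commas elsewhere (such as in a title) are left alone.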

n.b. GitHub hates this commit.

jsoref avatar May 04 '21 16:05 jsoref

@tobie / @marcoscaceres: would you want the split authors change standalone? (It comes with Eric Shepherd, because otherwise Shepherd would be split from Eric...)

or the internet archive: identityproject.lse.ac.uk/mary.pdf commit...?

Those are the only two that are easily splittable from this set. I mean, everything else could be split, but I don't think there's any particularly useful split beyond by file, and I doubt that helps much.

jsoref avatar May 05 '21 01:05 jsoref

I'm ok with the size of this change. The changes are pretty straightforward.

@tobie?

marcoscaceres avatar May 05 '21 05:05 marcoscaceres

> I'm ok with the size of this change. The changes are pretty straightforward.

Most of these changes will get overwritten in the next 60 minutes by the auto update.

We have to be more intentional if this is to be useful, which is why I suggested starting with just spelling mistakes outside of the data itself.

tobie avatar May 05 '21 05:05 tobie

> Most of these changes will get overwritten in the next 60 minutes by the auto update.

Hehe, that was my next question... "wait! isn't all this automated?"

> We have to be more intentional if this is to be useful, which is why I suggested starting with just spelling mistakes outside of the data itself.

Agreed.

marcoscaceres avatar May 05 '21 05:05 marcoscaceres

Yeah, I was worried that most of the content was not primary source, which is why I wasn't particularly eager to do the author split.

jsoref avatar May 05 '21 11:05 jsoref

Ok, so I guess we can salvage the changes to biblio.json and to legacy.json.

marcoscaceres avatar May 10 '21 00:05 marcoscaceres

@jsoref so I'm still interested in merging changes to refs/biblio.json, refs/legacy.json, the readme, the docs, and other non-automated parts of the data set.

Is this something you would be willing to look into?

tobie avatar Jun 06 '21 09:06 tobie

Sure. What do you want me to do?

jsoref avatar Jun 06 '21 09:06 jsoref

thanks for extracting the PRs, @jsoref!

marcoscaceres avatar Jun 07 '21 01:06 marcoscaceres

It's trivial for me to do so, I just need direction :-)

Is there a strategy for the other things? Are there upstreams we can poke? Are there tools we can improve?

jsoref avatar Jun 07 '21 01:06 jsoref

> Is there a strategy for the other things?

I guess if you can see if those documents are on GitHub, then filing issues on the misspelled specs will get them fixed there.

> Are there upstreams we can poke?

As above... for some it's going to be difficult :( like the IETF ones.

> Are there tools we can improve?

Yes... this could run exclusively when a PR is made to refs/biblio.json, for instance. At least, we could check those.

marcoscaceres avatar Jun 07 '21 06:06 marcoscaceres

> Yes... this could run exclusively when a PR is made to refs/biblio.json, for instance. At least, we could check those.

I'd be quite happy to offer the action. It's trivial to exclude files (e.g. the three files remaining in this PR) using excludes.txt, or to only check files using only.txt.
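
As a rough illustration of the difference between the two lists, here is a minimal Python sketch of exclude/only filtering. It is not check-spelling's implementation: it assumes each line of excludes.txt or only.txt is a regular expression matched against the file path, and the function name `files_to_check`, the path refs/generated.json, and README.md are made up for the example.

```python
import re


def files_to_check(paths, excludes=(), only=()):
    """Return the paths a checker would scan (hypothetical helper, not the action's code)."""
    exclude_res = [re.compile(p) for p in excludes]
    only_res = [re.compile(p) for p in only]
    selected = []
    for path in paths:
        # With an "only" list, a path must match at least one of its patterns.
        if only_res and not any(r.search(path) for r in only_res):
            continue
        # Any match in the "excludes" list drops the path.
        if any(r.search(path) for r in exclude_res):
            continue
        selected.append(path)
    return selected


# refs/generated.json and README.md are placeholder names for this sketch.
paths = ["refs/biblio.json", "refs/legacy.json", "refs/generated.json", "README.md"]

# Exclude approach: skip the auto-updated file, check everything else.
print(files_to_check(paths, excludes=[r"^refs/generated\.json$"]))

# Only approach: check just the hand-maintained reference files.
print(files_to_check(paths, only=[r"^refs/(biblio|legacy)\.json$"]))
```

The exclude list suits a repository where most files are hand-maintained; the only list suits one where most of the data is regenerated automatically.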

jsoref avatar Jun 07 '21 06:06 jsoref

I think (though please verify!) you can also exclude via the action syntax itself: https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#excluding-paths

marcoscaceres avatar Jun 07 '21 06:06 marcoscaceres

That doesn't do what you imagine. It controls when a workflow runs, not what it does. And for more fun, it doesn't provide the list of matching files to the workflow, so you can't actually use it to do work; you have to re-engineer it :-o

jsoref avatar Jun 07 '21 06:06 jsoref

Fun! thanks @jsoref. I'll take a look at the corresponding PR soon.

marcoscaceres avatar Jun 08 '21 10:06 marcoscaceres