clus-data
Diffs between dpANS3 and CLUS
10:44 <phoe> There is a lot of work to be done that I do not really know how to automate.
10:44 <phoe> The biggest mistake I have made is - I have corrected various minor mistakes in the specification without noting what I have corrected and where.
10:44 <phoe> There is no diff done between the text of dpANS3 and CLUS.
10:44 <phoe> And this is something that needs to be fixed.
10:45 <phoe> The task is to produce, by any means, a list of all differences between the glossaries and dictionary pages of CLUS and dpANS3.
10:46 <phoe> Which is a sizeable and somewhat boring task that requires a lot of concentration or a sufficiently smart approach that can compare the two texts despite their different markup.
10:47 <phoe> At least we do not need to take Examples and Notes into account as they are not a normative part of the specification. They're there purely for illustration and can be changed as we see fit.
10:49 <phoe> The approach is either to do it manually or to somehow automate it.
10:50 <phoe> In theory, we could simply copypaste the text from the original specification and compare it to CLUS.
10:51 <phoe> Since Ctrl+C has the fascinating trait of stripping all formatting and only preserving text.
10:53 <phoe> My idea is.
10:54 <phoe> Open up a text editor with two buffers.
10:54 <phoe> Copypaste a page from the original dpANS3 into one buffer.
10:54 <phoe> Copypaste a page from CLUS into the other buffer.
10:54 <phoe> Run diff.
10:55 <phoe> Inspect all differences.
10:55 <phoe> There will be garbage that comes from differences in formatting and such, but we will also be able to see the differences this way.
10:55 <phoe> But the diffing process can be automated through unix diff or any other emacslike diff tool.
10:56 <phoe> So the task that I'd say would be first is - find the proper way of dealing with this, create the method.
10:56 <phoe> I'll create a github issue about this and link it to you in a few hours.
10:56 <phoe> Hours, huh, minutes.
10:56 <phoe> Once you have any kind of workable method, please post it there - and let's start rocking.
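The "two buffers, run diff" idea from the log can be sketched in a few lines of Python. This is only an illustration, not the method the project settled on: the function names `normalize` and `text_diff` are made up here, and the whitespace normalization is an assumption about what counts as formatting garbage.

```python
import difflib
import re


def normalize(text):
    """Collapse runs of whitespace and drop blank lines (assumed to be formatting noise)."""
    lines = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            lines.append(line)
    return lines


def text_diff(dpans_text, clus_text):
    """Return unified-diff lines between the two plain-text pages."""
    return list(
        difflib.unified_diff(
            normalize(dpans_text),
            normalize(clus_text),
            fromfile="dpANS3",
            tofile="CLUS",
            lineterm="",
        )
    )
```

Calling `text_diff` on the copied text of one dpANS3 page and the matching CLUS page returns an empty list when the pages differ only in spacing and blank lines.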
This is the current state of what I have found. Emacs has ediff-trees (https://www.emacswiki.org/emacs/ediff-trees.el), which can take a regexp and compare two directories. I still don't fully understand it, but in the end it could automate some things: just run it once and feed it a regexp.
I will try to find out this week what ediff-trees can do.
@KZiemian it looks like we can scrape dpANS and CLUS - @rmhsilva is developing an effective way of doing this. Once we have the scraped material, we can use the emacs ediff.
@phoe So what precisely do we need to do? I am a little lost. Download the TeX files from their GitHub repositories and diff them?
@KZiemian Please contact @rmhsilva on how he does his scraping and diffing.
Hi @KZiemian. I will generate a set of diffs in the next couple of days, and post here when that's done. Then we can just review the diffs, knowing that they have all been generated somewhat repeatably/consistently.
@rmhsilva Okay, after Monday I should find time for that. You have done great work.
I've just added a ton of diff files to https://github.com/phoe/clus-data/tree/master/diffs.
I'm not sure how useful they are in their current form - diff has included quite a few blank lines, despite me trying quite hard to get it to ignore them. However, it's mostly clear where a blank line can be ignored.
The full page of data (including examples) has been included - we can ignore the examples for now, we just need to concentrate on the core text I believe (@phoe, is that right?)
Also, Github's diff viewer is pretty decent!
> diff has included quite a few blank lines, despite me trying quite hard to get it to ignore them.

You can remove all lines that contain only `+` or `-` using some basic text processing.
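A minimal sketch of that text processing in Python, assuming the diffs are in unified format and that a line consisting of a lone `+` or `-` (plus optional trailing whitespace) is a blank-line change. The function name `drop_blank_changes` is hypothetical.

```python
def drop_blank_changes(diff_text):
    """Remove diff lines that add or delete nothing but an empty line."""
    kept = []
    for line in diff_text.splitlines():
        # rstrip (not strip) so that a context line such as " -" is preserved.
        if line.rstrip() in ("+", "-"):
            continue
        kept.append(line)
    return "\n".join(kept)
```

File headers (`---`, `+++`) and hunk headers (`@@ ... @@`) are left untouched because they never consist of a single `+` or `-`.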
Yes, we completely ignore the examples - these will need to be manually rewritten and fixed.
Thank you - I'll start looking at these soon.
Hah yeah, of course, I'm not sure why I didn't do that in the first place.
Done now, no more blank lines!
@rmhsilva Thank you. I still have to check this GitHub diff, but I am making progress.
@phoe I read some of the files from @rmhsilva and learned some new things along the way, but I don't know what I should be watching for. There is a lot of rearranging between the two versions, which makes the true changes harder to find; that is the only thing I can tell right now.
How do I mark a file as checked? Clone it to my account and edit there? I am still trying to get my head around GitHub.
I forked the diffs from rmhsilva's repository and looked at some. I don't see any mistakes so far; a few times an unimportant word is missing.
Now I need some more information about what there is to do. And I may mess something up with GitHub.
Cool. I'm not quite sure exactly what needs to be looked for when reviewing the diffs, @phoe?
In order to track which diffs have been checked, I suggest the following:
- we create one more directory, "comments"
- when you have reviewed a diff (e.g. "foobar.diff"), create a file in the comments directory and put any review comments into it
- if there are no significant differences, leave the file empty (e.g. use the unix `touch` utility)
- the file name should be the same as the diff name, except with a .txt extension (e.g. "foobar.txt")
We know when we're done when there is a file in the comments directory for every diff. We can also find all non-empty files to check which pages are significantly different.
You can do this process in your own copy of this repository (use the Fork button), and when you are done, you can create a pull request for us to merge. Don't worry about messing up things on Github - we can review all changes before merging them in, and can always revert...
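The bookkeeping suggested above could be checked with a small script along these lines. It is a sketch under the stated convention only (a `diffs` directory of `*.diff` files and a `comments` directory of matching `*.txt` files); the function name `review_status` is invented for illustration.

```python
from pathlib import Path


def review_status(diffs_dir, comments_dir):
    """Return (unreviewed, flagged) diff names, without their extensions.

    unreviewed: diffs that have no comment file yet.
    flagged: diffs whose comment file is non-empty, i.e. significant differences.
    """
    diffs = {p.stem for p in Path(diffs_dir).glob("*.diff")}
    comments = {p.stem: p for p in Path(comments_dir).glob("*.txt")}
    unreviewed = sorted(diffs - set(comments))
    flagged = sorted(
        name
        for name, path in comments.items()
        if name in diffs and path.stat().st_size > 0
    )
    return unreviewed, flagged
```

The review is finished when the first list is empty; the second list names the pages that differ significantly.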
So far I have done some checking: https://github.com/KZiemian/clus-data/tree/master/diffs. Is this helpful?
@KZiemian Ah awesome, yeah that's helpful! My suggestions were nothing but suggestions, so as you've already started, carry on in that method 👍
I have a problem. In the diffs directory, the file "cl:functions:first_to_tenth.diff" is empty. Where can I find the original versions of the files?
Other empty files. "cl:functions:hash-table.p.diff" "cl:types:restart.diff" "cl:types:satisfies.diff" "cl:types:standard-class.p.diff" "cl:types:standard-object.diff" "cl:types:storage-condition.diff" "cl:functions:setf-table.p.diff" "cl:functions:setf-class-name.diff"
Other empty files. "cl:types:control-error.diff" "cl:types:division-by-zero.diff" "cl:types:floating-point-inexact.diff" "cl:types:floating-point-invalid-operation.diff" "cl:types:floating-point-overflow.diff" "cl:types:floating-point-underflow.diff" "cl:types:generic-function.diff" "cl:types:method-combination.diff" "cl:types:program-error.diff" "cl:types:restart.diff" "cl:types:satisfies.diff" "cl:types:standard-class.diff" "cl:types:standard-object.diff" "cl:types:storage-condition.diff"
Hi @KZiemian, thanks for pointing that out. A few things fell through the automated diff process, we'll have to check them manually (by comparing the text in clus with the standard text). I've checked the "First to Tenth" (http://phoe.tymoon.eu/clus/doku.php?id=cl:functions:first_to_tenth) text, and it looks fine.
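Empty diffs like the ones listed above can be found in one pass instead of by hand. This is a hedged sketch: it assumes the diffs are UTF-8 text files ending in `.diff` in a single directory, and `empty_diffs` is a hypothetical helper name.

```python
from pathlib import Path


def empty_diffs(diffs_dir):
    """List the names of .diff files that contain nothing but whitespace."""
    return sorted(
        p.name
        for p in Path(diffs_dir).glob("*.diff")
        if p.read_text(encoding="utf-8").strip() == ""
    )
```

An empty diff does not by itself say whether the two pages are identical or whether the comparison failed, so each file it reports still needs the manual check described above.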
@rmhsilva In the diffs there are often lines like this: @@ -14,23 +26,51 @@ What do they mean? Can they stand in for a large part of identical text? That would be good news.
@KZiemian these are line numbers. See https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html#Detailed-Unified
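To make the hunk header concrete, here is a small Python sketch that splits it into its four numbers. It follows the unified format described in the linked manual, where an omitted line count means 1; the name `parse_hunk_header` is made up for this example.

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count), or None if not a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        return None
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -14,23 +26,51 @@"))  # (14, 23, 26, 51)
```

So `@@ -14,23 +26,51 @@` means the hunk covers 23 lines starting at line 14 of the first file and 51 lines starting at line 26 of the second. Text between hunks is unchanged and simply not shown.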
@rmhsilva @phoe I am trying to find the number of diffs that rmhsilva generated a few months ago. In short, my OSs made a mess of the file names and I must check that nothing was lost. I think there should be 972 of them; @rmhsilva, can you check that number?
@rmhsilva @phoe I think I solved the problem; there should be 967 diffs.
At least these files don't have good diffs. Different files were compared.
cl:constant_variables:nil cl:constant_variables:t cl:functions:abort cl:functions:atom cl:functions:eql cl:functions:error cl:functions:bit cl:functions:bit-orc1 cl:functions:character cl:functions:complex cl:functions:cons cl:functions:continue cl:functions:eql cl:functions:error cl:functions:float cl:functions:list cl:functions:logical-pathname cl:functions:math-add cl:functions:math-divide cl:functions:math-greater cl:functions:math-less cl:functions:math-multiply cl:functions:math-not-equal cl:functions:math-not-greater cl:functions:math-not-less cl:functions:math-subtract cl:functions:mod cl:functions:muffle-warning cl:functions:not cl:functions:rational cl:functions:pathname cl:functions:rational cl:functions:string cl:functions:values cl:macros:and cl:macros:lambda cl:types:character cl_symbols_lambda cl:types:and cl:restarts:continue cl:restarts:muffle-warning cl:restarts:store-value cl:restarts:use-value cl:types:character cl:types:complex cl:types:cons cl:types:eql cl:types:error cl:types:list cl:types:logical-pathname cl:types:mod cl:types:nil cl:types:not cl:types:null cl:types:pathname cl:types:rational cl:types:values cl:types:vector cl:variables:repl-minus cl:variables:repl-plus cl:variables:repl-slash cl:special_operators:function cl:special_operators:labels (maybe we don't need diff of that) cl:special_operators:macrolet (maybe we don't need diff of that) cl:special_operators:function cl:special_operators:labels cl:special_operators:macrolet
I can hardly believe it, but of the 967 diffs, 965 are now checked. Now I must find which two are missing, and then most of my current work is done.
To the best of my knowledge all 967 diffs are done; I can't go further without help. This doesn't mean that all work on them is finished. I know more is needed, but I can't do it myself.
State of the diffs, 24 October 2017.
- On GitHub there are 967 checked diffs; I hope I didn't miss any of those generated by @rmhsilva.
- Every colon ":" was removed from the diffs before checking, which may cause problems. Especially in the "cl_macros_something" files there are a lot of examples of this, because that section uses BNF notation with its "::=" and a lot of keywords. @phoe and @rmhsilva should decide what we do about that.
- Regardless of how the problem with colons ":" is resolved, 5-10 diffs need to be checked again. One of them is "cl_macros_loop", a very complicated diff (hard to explain why, easier to just take a look) which lost many ":" characters, so I decided not to check it until the above issue is solved.
- Diff sometimes matched different files with similar names. These files must still be checked. Here is the list of all diffs that I know have this problem.
cl:constant_variables:nil cl:constant_variables:t cl:functions:abort cl:functions:atom cl:functions:eql cl:functions:error cl:functions:bit cl:functions:bit-orc1 cl:functions:character cl:functions:complex cl:functions:cons cl:functions:continue cl:functions:eql cl:functions:error cl:functions:float cl:functions:list cl:functions:logical-pathname cl:functions:math-add cl:functions:math-divide cl:functions:math-greater cl:functions:math-less cl:functions:math-multiply cl:functions:math-not-equal cl:functions:math-not-greater cl:functions:math-not-less cl:functions:math-subtract cl:functions:mod cl:functions:muffle-warning cl:functions:not cl:functions:rational cl:functions:pathname cl:functions:rational cl:functions:string cl:functions:values cl:macros:and cl:macros:lambda cl:types:character cl_symbols_lambda cl:types:and cl:restarts:continue cl:restarts:muffle-warning cl:restarts:store-value cl:restarts:use-value cl:types:character cl:types:complex cl:types:cons cl:types:eql cl:types:error cl:types:list cl:types:logical-pathname cl:types:mod cl:types:nil cl:types:not cl:types:null cl:types:pathname cl:types:rational cl:types:values cl:types:vector cl:variables:repl-minus cl:variables:repl-plus cl:variables:repl-slash cl:special_operators:function cl:special_operators:labels (maybe we don't need diff of that) cl:special_operators:macrolet (maybe we don't need diff of that) cl:special_operators:function cl:special_operators:labels cl:special_operators:macrolet
- Maybe we don't need a diff for topics such as cl:special_operators:labels. The reason is that such a topic is probably identical in content to another topic for which we already have a diff.
- The "See Also" section mostly changed in the following way (example from cl_functions_pathname-device).
CLHS:
- pathname, logical-pathname, Section 20.1 (File System Concepts), Section 19.1.2 (Pathnames as Filenames)
CLUS:
+* System Class PATHNAME
+* System Class LOGICAL-PATHNAME**
+ {\secref\FileSystemConcepts}
+ {\secref\PathnamesAsFilenames}
Adding capitalization, a better description, and a different style of reference was the only change in, I think, 300 diffs. The problem is that after 300 diffs with good changes in that section, I only glanced at "See Also" without enough care, so I most likely missed some problems. Today, while looking for an example for this point, I noticed that I had missed the "*" in the diff above.
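A slip like the one in the example could be searched for mechanically instead of by eye. The sketch below is only a guess at a useful check: it assumes the problem is an added bullet line (`+* ...`) that also ends with asterisks, as in `+* System Class LOGICAL-PATHNAME**`, and the name `stray_asterisk_lines` is hypothetical.

```python
def stray_asterisk_lines(diff_text):
    """Return (line_number, line) for added bullet lines that end with an asterisk."""
    flagged = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only look at added lines, skipping the "+++" file header.
        if line.startswith("+") and not line.startswith("+++"):
            body = line[1:].strip()
            if body.startswith("* ") and body.endswith("*"):
                flagged.append((number, line))
    return flagged
```

Running it over every diff would give a short list of "See Also" entries to look at again, which is cheaper than rereading all of them.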