voc icon indicating copy to clipboard operation
voc copied to clipboard

Optimization of module errors.Mod

Open Oleg-N-Cher opened this issue 9 years ago • 12 comments

Norayr commented out code to get a list of errors from file OfrontErrors.Text in OPM.LogErrMsg, but he forgot to comment out all declared variables, that only need to run this code.

I propose to make output of an error message without using of array "errors.errors: ARRAY 350 OF ARRAY 128 OF CHAR". These messages are only required to output into the console, they are not used in any way differently. accordingly, we can achieve economy of memory.

• https://github.com/Oleg-N-Cher/OfrontPlus/commit/8f7ee7d1738156ee831dd483d65db9738d1b82cd

Oleg-N-Cher avatar Jul 11 '16 00:07 Oleg-N-Cher

P.S. To translate my module OfrontErrors.Mod, with using a large number of case labels, I increased the constant OPM.MaxCases from 128 to 256 labels.

Oleg-N-Cher avatar Jul 11 '16 00:07 Oleg-N-Cher

How about an array of pointers to array of char? I believe it would generate (slightly) simpler and smaller code than a CASE statement. Assuming the pointers

dcwbrown avatar Jul 11 '16 14:07 dcwbrown

Oops, I meant to cancel my comment, looked for a cancel button, and misread 'close and comment' as what I wanted. Please ignore the above.

dcwbrown avatar Jul 11 '16 14:07 dcwbrown

I believe that multiple labels of CASE here produces a much more compact and efficient code than the allocating an array, especially array of pointers, which make the additional load on the garbage collector.

As you see, David, we have all the strings for the messages have different lengths, so reservations space in the array for them will require space for each line as for the longest line.

However, maybe I just do not understand the idea, tell me more, please.

Oleg-N-Cher avatar Jul 11 '16 18:07 Oleg-N-Cher

Well, this was interesting. I built the CASE version and had a look at the code. For a 32 bit build, each case costs the length of the string plus 29 bytes. This is the code for the first 2 cases (on top of this there is a ptr array with one entry per case.)

 961                    L7:
  14:errors.c      ****                 case 0:
  15:errors.c      ****                         Console_String((CHAR*)"undeclared identifier", (LONGINT)22);
 963                    LM10:
 964 0077 C7442404              movl    $22, 4(%esp)
 964      16000000
 965 007f C7042400              movl    $LC0, (%esp)
 965      000000
 966 0086 E8000000              call    _Console_String
 966      00
  16:errors.c      ****                         break;
 968                    LM11:
 969 008b E9120F00              jmp     L164
 969      00
 970                    L9:
  17:errors.c      ****                 case 1:
  18:errors.c      ****                         Console_String((CHAR*)"multiply defined identifier", (LONGINT)28);
 972                    LM12:
 973 0090 C7442404              movl    $28, 4(%esp)
 973      1C000000
 974 0098 C7042416              movl    $LC1, (%esp)
 974      000000
 975 009f E8000000              call    _Console_String
 975      00
  19:errors.c      ****                         break;
 977                    LM13:
 978 00a4 E9F90E00              jmp     L164
 978      00
 979                    L10:

This is considerably better than the pre-allocated 350x128 array. But it is possible to reduce that 29 byte overhead per message to 16 bytes if you are OK using SYSTEM.VAL:

MODULE errors;
IMPORT Console, SYSTEM;

VAR msg: ARRAY 350 OF LONGINT;  i: INTEGER;

PROCEDURE Write*(n: INTEGER);
TYPE CharPtr = POINTER TO ARRAY 1024 OF CHAR;
VAR p: CharPtr; i: INTEGER;
BEGIN
  IF (n >= 0) & (n < LEN(msg)) & (msg[n] # 0) THEN
    p := SYSTEM.VAL(CharPtr, msg[n]);
    i := 0;
    WHILE p[i] # 0X DO  Console.Char(p[i]);  INC(i);  END
  END
END Write;

BEGIN
  FOR i := 0 TO LEN(msg)-1 DO msg[i] := 0 END;

  (* Incorrect use of the language Oberon *)
  msg[0] := SYSTEM.ADR("undeclared identifier");
  msg[1] := SYSTEM.ADR("multiply defined identifier");
  msg[2] := SYSTEM.ADR("illegal character in number");
  msg[3] := SYSTEM.ADR("illegal character in string");
  msg[4] := SYSTEM.ADR("identifier does not match procedure name");
  msg[5] := SYSTEM.ADR("comment not closed");
...
etc.

Per error message that's 10 bytes of overhead for the initialisation assignments, and 4 bytes for the table of pointers itself. Saving nearly half the overhead of the CASE approach.

That's what I was beginning to think about when I sent my premature message above.

However I'm not convinced the ptr approach is worth the added complexity - what do you think?

dcwbrown avatar Jul 11 '16 20:07 dcwbrown

You're absolutely right, David. A little complicated for such a simple task. In addition, we have to use the SYSTEM module. It may make sense, where the solution is critical for memory usage. In any case, your code looks very inventive. :)

I would have corrected:

TYPE CharPtr = POINTER TO ARRAY 1024 OF CHAR;

to:

TYPE CharPtr = POINTER [1] TO ARRAY 1024 OF CHAR;

To quote you a letter from Josef Templ:

SYSTEM.PTR is a garbage collected pointer, i.e. it points to any Oberon object. It only relaxes the type checking rules very much like an abstract base type OBJECT. An untagged pointer [1] points anywhere into the external world, i.e. it is not garbage collected. Such a pointer can be considered as an unsigned integer value without any special meaning to the runtime system.

EnumPtrs is supposed to enumerate anything that needs to be seen by the garbage collector. Enumerating an untagged pointer will crash or at least corrupt the system, unless that pointer happens to point to an existing Oberon object. argv must not be enumerated.

  • Josef

Oleg-N-Cher avatar Jul 12 '16 02:07 Oleg-N-Cher

Many thanks for the details of POINTER [1] - I've noticed that in passing, I should have looked into it before, very useful to know. Weird syntax.

What were the reasons for moving the messages out of the text file into code? Arguably the text file has

  • the least overhead per string
  • the least usage of code space memory
  • ease of localisation

dcwbrown avatar Jul 12 '16 15:07 dcwbrown

Syntax of POINTER [1] is traditional for presence low-level features in Oberon (supported in the most mature and complete implementations - ETH Oberon, Active Oberon, BlackBox Component Builder). BlackBox supports POINTER [1] as a synonym of POINTER [untagged].

Moving the messages out of the text file into code can be viewed from many perspectives.

  1. Originally ETH Oberon has a list of error messages in a text file. Ofront also supports this tradition. In addition, Ofront is of the two variations - as environment, such ETH Oberon, and as a command-line tool. Both uses the same file for error messages.
  2. Almost all command-line compilers known to me do not have a separate file for a list of errors, everything is included inside. The presence of the file may have a problem with its integrity, matching error file and compiler versions, etc. Also, for Linux, you need to set permissions on this file. In the current implementation file search works only in the directories that are specified in a variable OBERON, by default excluding even the current directory and the directory where the executable file "ofront" is stored.
  3. The idea of localization and translation into the various languages of the world looks good. I would like to ask people, who read this message, to express their views. Which language would you like to see in the list of supported languages? For example, I can do the translation into Russian and Ukrainian. But I want to note that the audience of a compiler users are programmers. And any programmer understands English. Again, I do not know any command-line compiler, which have localized the error output in different languages.
  4. We can consider the idea of an additional language file localized at any desired language. And basic language, English, will be present inside the compiler, and will be used in the absence of the external file.
  5. Technical problems. For localization and support national languages, we need to make support for UTF8 in Console.Mod for Windows, and maybe even the type long CHAR.

P.S. As for my actions right now, I am satisfied with the module OfrontErrors.Mod, and I'm not worried about memory loss, it's very small.

Oleg-N-Cher avatar Jul 12 '16 20:07 Oleg-N-Cher

The other option would be to keep error messages in a file (like fpc does), find, and open the corresponding line number from that file, read it and output.

I was thinking to implement that instead, however, when considering speed vs memory, I have chosen memory. People have one life, time is priceless. (: I know it's too small amount of time though.

I doubt that changing case limit is a good idea. It's there for a reason. There are other limits, like variable length limit, it's also there for a reason, though is not defined by the report.

norayr avatar Jul 13 '16 12:07 norayr

All this mysterious reason called "2. Limitations of implementation", see OfrontErrors.Text

I'm sure that changing case limit is a good idea. There is constant OPM.MaxStruct = 255, and it was enough for the previous Ofront users. But when I adapted module WinApi.Mod from BlackBox, I was forced to increase this value, because there are a lot of RECORDs. And it's very good, because now we can build the modules with a large number of RECORDs, using only slightly more memory than before.

Let's count a little. What will cost MaxCases increase from 128 to 256.

TYPE
    CaseTable = ARRAY OPM.MaxCases OF
        RECORD
            low, high: INTEGER
        END ;

If INTEGER is 4 bytes, so we have 128 extra array elements of 4+4 bytes for each one. 128*8 = 1024 = 1 Kb. Very reasonable price. although in the good in such cases it should be rewritten getting rid of static arrays in favor of dynamic memory allocation.

P.S. I do not like excessive conservatism of Norayr in such matters, which is why I am not in the voc team.

Oleg-N-Cher avatar Jul 13 '16 13:07 Oleg-N-Cher

Er, looking at OfrontErrors.Text I don't see an obvious big issue?. Mind you I think my implementation is slightly simpler - take a look at https://github.com/vishaps/voc/blob/ErrorsExperiment/src/compiler/errors.Mod and https://github.com/vishaps/voc/blob/ErrorsExperiment/VishapOberonErrors.txt. I rather like this.

I'm 59 and retired. I learned and admired Pascal at University in 1976 on the original clean Wirth and Knudsen compiler on the rather wonderful CDC 6000. On graduating (barely) I joined the computer centre at Imperial college and one of my responsibilities was the Pascal compiler, to which I added lower case support (a double byte character set on a 6 bit character machine!) and automatic stack extension, among other things. Then I worked for 15 years as so-called development manager of a very small team at a small buisness where pretty much all my high level language work was in TopSpeed Modula (I wrote quite a lot in assembler too, particularly Arabisation layer for DOS.) Then 16 years at Microsoft in text rendering including complex scripts, in C and C++, always wishing for the simplicity of Modula.

As you get older you can't avoid getting more and yet more experienced about how programs work, and the less helpful are conventions like particular implementations of OO. At the start C's classes and Oberons type-bound procedures provide a nomenclature that genuinely helps one think at a higher level. As time go by it is surprising how often you find you didn't need your chosen language's syntactic sugar for objects at all and the program is simpler to wirte, more straightforward to understand, and easier to maintain written withiout them.

So I've finally reached the point where I suspect Wirths apparently pathological need for simplicity is quite correct. I love that he does so much with extensible types and a single handler procedure. And I see it as both simpler and more flexible than type-bound procedures. When I eventually stop fiddling with implementation and build details I want to experiment more with that.

So I agree that the extra memory usage arising from increasing the number of structures or cases that can be handled is not much to worry about. But I don't think that's the whole point, or even the most important point. Windows needs that number of record types not because the problem being solved inherently needs that many record types, but because records have been added with each each addition of extended but overlapping functionality over many many years. Compatibility means nothing can be removed, and new developments must be built on top of old ways of doing things. And the need to remain compatible also tends to block the division of functionality into separate files. Arguably if the tools didn't allow large numbers of records or cases, development would have taken longer. But ways would have been found to keep things simple. Which would have made later developments easier. The age old 'more effort up front, much less effort in the long term'. Which, sadly, is in conflict with 'first to ship, wins'.

I cherish Oberon's principle of keeping it simple.

So, perhaps support for an increased number of records or cases should be associated with some explicit admission of undesirable complexity parallel to the meaning of 'IMPORT SYSTEM'. A notation that means something like 'I'm terribly sorry, but external constraints prevent this module form being well designed'. Perhaps, with reference to the well known criticism of C, 'IMPORT GunPointedAtFoot' :smiley:.

dcwbrown avatar Jul 13 '16 16:07 dcwbrown

I want to note that it is very good if such a decision (with the external file for error messages) will allow to search for this file in the same directory that contains the executable file "ofront", even if the path is not specified in a variable PATH. I had some trouble finding the file in a way that I had to put it in each of my working directory.

I want to say that I appreciate the work on voc done by David. Apparently, my experience is not so great (I'm only 37), and I feel a lack of confidence in many aspects of Ofront development. But I really like Ofront and I want to develop it, remaining in spiritual harmony with myself, knowing that everything is done as good as possible. My views on the further development - make Ofront convenient for industrial usage.

About Oberon. Of course, we have the perfect simple well-defined language, but for the practical use, each implementation generally has own specific features due to the subject area. However, Wirth uses a specific variant of Oberon for each task. All for the sake of efficiency. Incidentally, the same Oberon-07 no one uses in the form as defined by Wirth, even for Minos they made their changes. For example, I read about a dialect of Oberon-07, which has no nested procedures and export of variables.

Igor Schagaev, Thomas Kaegi-Trachsel - Software Design for Resilient Computer Systems (2016):

First, nested procedures as it is difficult to ind the accessible state space of a nested procedure at compile time. Nested procedures do not add additional functionality to the language but simply provide a nonessential structuring mechanism. The second feature we remove is the export of read-only module variables. Again, this helps the compiler to limit the state space a procedure in a module can access. Read-only variables can be simulated by the wrapping them into exported procedures. Under these constraints, the only way of interaction between modules is by accessing exported procedures. Generating a recovery point at cross-module procedure calls seems to be a natural implementation approach. If this approach is not fain grained enough, one could consider generating recovery points at every procedure entrance.

On this way to simplify you can go very far. Excess simplicity you can bring to the absurd as well as excessive complexity. So my strategy - the clarification of Oberon in the context of industrial development. And I'd like to see in Ofront untagged pointers and extra features that I need.

We, the Ofront+ team, have implemented constant arrays for Ofront. And the error output may be written, for example, like this:

TYPE Messages = ARRAY 350 OF ARRAY 128 OF CHAR;
CONST Errors = Messages(
    "undeclared identifier",
    "multiply defined identifier",
    "illegal character in number",
    ...
);

This solution also wasting memory. But it's simple.

It may seem unnecessary complication of the Oberon, but we need the balance between complexity and simplicity. We must make a difficult choice in favor of what exactly should be in the language, and what is not. The basis of Oberon is already strong. but as we proceed with this - it is our choice. We must provide feeling this solid basis to mainstream programmers. In the context of solving our tasks with help of these instruments.

Oleg-N-Cher avatar Jul 13 '16 20:07 Oleg-N-Cher