A quick review of a some instructions reveals they are very poorly coded, performance-wise. Most all of them have not been modified since the repository was initially setup over 17 years ago and use vfetchb and vstoreb to process their operands one byte at a time (which is extremely inefficient!):

Note: Instructions which are checked have been recently modified and now no longer need to be improved. The ones that are unchecked are those whose performance still needs to be improved.

[x] B241 CKSM Checksum
[ ] A9 CLCLE Compare Logical Long Extended
[x] B25D CLST Compare Logical String (completed by Bob Polmanter)
[ ] B257 CUSE Compare Until Substring Equal
[ ] B2A7 CU12 Convert UTF-8 to Unicode
[ ] B9B0 CU14 Convert UTF-8 to UTF-32
[x] B255 MVST Move String (completed by Bob Polmanter)
[x] B25E SRST Search String (completed by Bob Polmanter)
[ ] B2A5 TRE Translate Extended
[ ] B9BF TRTE Translate and Test Extended
[ ] D0 TRTR Translate and Test Reverse
[ ] B9BD TRTRE Translate and Test Reverse Extended

I believe that by using the same or similar techniques as were used to improve the performance of the CLCL, MVCIN and TRT instructions (see issue #99: Poor performance of CLCL, MVCIN and TRT instructions), the performance of each of them could be significantly improved (very likely by at least one, and possibly two, orders of magnitude!).

I would like to invite any/all Hercules developers to please try their hand at doing so.

I could use some help here!

This is a low priority item, but really should eventually be looked into. While I have not performed any measurements in this area, I rather suspect some of these instructions(*) might well be used quite extensively.

Bottom line: there's lots of room for improvement here!

(*) The MVST (Move String) SRST (Search String) instructions appears to have been specifically written to handle the C/C++ strcpy and strchr library functions for example, and TRTE is a perfect candidate for the C/C++ memchr library function, etc.

Jun 19 '18 02:06 Fish-Git

Since @ivan-w has volunteered in Issue #108 to look into the TRTR, TRTE and TRTRE (and hopefully TRE too!) instructions, I am marking this issue as "In Progress...".

Apparently(*) however, I cannot assign this issue to him until he first posts a comment to this GitHub Issue. Once he does then I'll do that.

(*) The only choices GitHub appears to offer for assignment are individuals participating in a given issue's conversation (i.e. only those that choose to reply). Go figure.

Jul 09 '18 09:07 Fish-Git

Since @ivan-w has volunteered in Issue #108 to look into the TRTR, TRTE and TRTRE (and hopefully TRE too!) instructions, I am marking this issue as "In Progress...".

Apparently(*) however, I cannot assign this issue to him until he first posts a comment to this GitHub Issue. Once he does then I'll do that.

(*) The only choices GitHub appears to offer for assignment are individuals participating in a given issue's conversation (i.e. only those that choose to reply). Go figure.

The above is no longer true now that @ivan-w is now a member of our organization, so I've gone ahead and assigned this issue to Ivan.

Ivan? If this is unacceptable, then please feel free to remove yourself as the assignee!

Jul 28 '18 18:07 Fish-Git

@Fish-Git I think I will sign up to do the three string instructions on the list: CLST, MVST, and SRST. Unless you know that someone else is working on them?

All three of these could benefit from using techniques similar to those already used for some other instructions mentioned, making allowances for access exceptions in the right order and considerations for page boundaries.

Aug 19 '18 22:08 wably

I think I will sign up to do the three string instructions on the list: CLST, MVST, and SRST. Unless you know that someone else is working on them?

Go for it! PLEASE! :)

Ivan volunteered (or implied he volunteered) in another issue (#108), but I have no idea whether or not he is actually working on it. You should probably coordinate your effort with him off-list. He may just be busy with work or real life. (whatever the heck those things are!)

Thanks!

Aug 20 '18 04:08 Fish-Git

Since nothing seemed to be happening here I went ahead and completed improvements to the three string instructions, MVST, SRST, and CLST. All three had their execution times reduced by over 80%. Below summarizes the results:

inst	before	after	red%
MVST	9.62	1.21	87%
SRST	5.40	1.01	81%
CLST	10.05	1.67	83%

All three timings were moves, searches, or compares across three page frames, with 100,000 iterations each. All times are in seconds. The host computer was an Intel i7. SRST 'before' results are approximately one-half of the others because only one operand has to be incremented. The execution times were improved by implementing the page boundary dance for each operand thereby avoiding a vfetch( ) or vstore( ) for every byte.

I will commit this code in a day or two. I just want to double check everything and ensure that I have all of the test cases covering all condition codes and operand combinations (crossing pages, not crossing, etc).

Dec 15 '18 19:12 wably

Since nothing seemed to be happening here I went ahead and completed improvements to the three string instructions, MVST, SRST, and CLST. All three had their execution times reduced by over 80%.

Wow! Fantastic work, Bob! Thank you!

I will commit this code in a day or two. I just want to double check everything and ensure that I have all of the test cases covering all condition codes and operand combinations (crossing pages, not crossing, etc).

I look forward to seeing your commit! :)

Dec 16 '18 01:12 Fish-Git

String instructions committed by 8194b34

@Fish-Git if you are still seeing problems with tabs in the source, please inform.

Dec 16 '18 13:12 wably

String instructions committed by 8194b34

Thank you, Bob! Have you created a runtest test case for these instructions yet?

if you are still seeing problems with tabs in the source, please inform.

I checked and I didn't see any.

Dec 17 '18 01:12 Fish-Git

Fish,

My test cases are simple assembler programs that run under VM (they could also run under MVS with some JCL wrapped around them). There are three programs, one for each instruction improved. Each program does a number of tests, basically it sets up a situation and then executes the instruction. The results of the execution are displayed and it is compared to the known expectation (basically the results of these exact same tests before my modification). Then on to the next situation and test for the same instruction. Each possible condition code and combinations of page crossing operands (or not) were exercised and checked against known results.

So after all of that and to answer your question, no I do not have a ‘runtest’ test case; I don't even know what that is. I can make my programs available of course, if that is acceptable.

Dec 17 '18 02:12 wably

Hi Bob:

If you would post / PM the programs to me, I would be happy to create runtest cases.

Steve

Dec 17 '18 13:12 srorso

Steve,

Thanks for the offer, but you shouldn't have to do what I should have done in the first place.

After my response to Fish last night, I remembered that there was a 'tests' directory in the repo, and when I looked there I saw the kinds of tests that he was referring to. So, I can go ahead and create tests similar to those as appropriate for the string instructions.

It will definitely be more of a pain to put these together than the ALC programs I have now, but I recognize some reasons for having the tests be more or less standardized and not reliant on IPLing an OS prior to running a test. The learning curve to put these together effectively is the painful part; I can whip up stuff in ALC much faster and then move on.

Bob

Dec 17 '18 14:12 wably

Hi Bob,

I understand fully, but the offer stands. Teamwork and all that.

Harold Grovesteen's SATK package is a worthy tool, and I am happy to field questions, particularly about the BFP examples.

Steve

Dec 17 '18 14:12 srorso

This is definitely a case of "no good deed goes unpunished".

Dec 17 '18 15:12 wably

This is definitely a case of "no good deed goes unpunished".

(LOL!) I just asked a simple question! :)

Dec 17 '18 20:12 Fish-Git

@wably : I just committed some minor tweaks (*) to your code, mostly pertaining to rewording of comments to make them shorter and fit within 72-80 characters, but also some very minor coding tweaks too (but only a few!) in a few places too.

I hope you don't mind. :/

(*) I Just can't help myself sometimes! I'm convinced it's a form of OCD that just compels me to do sh^htuff like this. It's hard to control. :(

Dec 18 '18 22:12 Fish-Git

Fish, I reviewed your changes and I am fine with them. Now that I can see how you want things to look I can make a better effort to comply with it. Actually, if you published a 'coding standards' document for that sort of thing, then it would be a lot easier to follow the guideline - if the look of the code is important to you.

Dec 18 '18 22:12 wably

Actually, if you published a 'coding standards' document for that sort of thing, then it would be a lot easier to follow the guideline - if the look of the code is important to you.

The only real coding standard that exists doesn't really officially exist anywhere (i.e. it isn't documented anywhere). It's mostly just two things:

No tabs. Use spaces instead.
Braces should be on separate lines.

Everything else is just what I would call common sense programming:

Try to make your code simple and clear.
Use comments where needed. (what's obvious to you might not be to others!) (*)
Refrain from using explicit values. Use #define constants instead.
Don't repeat yourself. Create a callable function instead.
Don't abuse macros. Use inline functions instead. (**)
Don't be afraid to use blank lines to break up logical sequences of code. (***)
Etc...

There's probably some more I could probably add but that's good enough for now.

(*) This is especially important when there is anything that is even remotely "subtle". If something has to be done in a certain way because doing it differently would break something somewhere else, then that is definitely something that should be well commented!

(**) There's far too much of this in Hercules IMHO! Using macros that expands into a bunch of code not only makes it harder to maintain the code, but also harder to debug. Source-level debuggers can't step through macros nor allow you to set a breakpoint on a specific line of the macro, nor examine its variables, etc. Plus, macros that generate code aren't instruction cache friendly either. If you find yourself wanting to code a macro because you need to generate the same code in many different places, code an inline function instead! It's more host-processor instruction-cache friendly and is easier to debug, allowing you to set breakpoints and examine variables and step through the code, etc.

(***) Some might not appreciate that. Some programmers feel blank lines makes the code take up too many lines vertically on your screen, making it more difficult to see as much of the code as possible. Me, I'm the opposite: I feel it's easier for the human being to absorb/understand the code when it's in smaller easier to consume bite-size chunks than in larger chunks that your mind has to then break apart. Besides, if you use your monitor in portrait mode instead of landscape mode (which IMO all programmers should do as it allows them to see more code!), then a few extra blank lines here or there is no big deal.

Dec 18 '18 23:12 Fish-Git

Here are the rules I try to live by. I don't always adhere to many of them but I usually try to for most of them. (I think)

(Ref: https://www.goodreads.com/book/show/598624.Writing_Solid_Code)

Writing Solid Code

by Steve Maguire

Always ask, "Can this variable or expression over- or underflow?"
Always look for, and eliminate, flaws in your interfaces.
As you step through code, focus on data flow.
Create thorough subsystem checks, and use them often.
Define explicit function arguments. Eliminate ambiguity.
Design your tests carefully. Nothing should be arbitrary.
Document unclear assertions.
Don't allow unnecessary flexibility.
Don't clean up code unless the cleanup is critical to the product's success.
Don't fix bugs later; fix them now.
Don't hide bugs when you program defensively.
Don't implement nonstrategic features. There is no free lunch.
Don't keep 'trying' solutions until you find one that works. Take the time to find the correct solution.
Don't reference memory that you don't own.
Don't wait until you have a bug to step through your code.
Don't write multipurpose functions. Write separate functions to allow stronger argument validation.
Either remove implicit assumptions, or assert that they are valid.
Eliminate random behavior. Force bugs to be reproducible.
Enable all compiler warnings.
Fix the cause, not the symptom.
Handle special cases just once.
If something happens rarely, force it to happen often.
Implement your designs as accurately as possible.
Maintain both ship and debug versions of your code.
Make code intelligible at the point of call. Avoid Boolean arguments.
Make it hard to ignore error conditions. Don't bury error codes in return values.
Never allow the same bug to bite you twice.
Shred your garbage.
Step through every code path.
Strip undefined behavior from your code.
Strive to implement transparent integrity checks.
Throw away your bag of tricks. Be truly clever: Write boring code.
Tight C code does not guarantee efficient machine code.
Use a second algorithm to validate your results.
Use assertions to detect impossible conditions.
Use assertions to validate all function arguments.
Use lint to catch bugs your compiler may miss.
Write and test code in small chunks. Always test your code, even if that means your schedule will slip.
Write comments that emphasize potential hazards.
Write functions that, given valid inputs, cannot fail.

Dec 18 '18 23:12 Fish-Git

Runtest cases for the three improved string instructions added by commit 44e4d07

Many thanks to Steve Orso, who provided assistance, examples, tips and advice in getting started with runtest cases!

Dec 27 '18 18:12 wably

Runtest cases for the three improved string instructions added by commit 44e4d07

Many thanks to Steve Orso, who provided assistance, examples, tips and advice in getting started with runtest cases!

I appreciate the effort, guys! THANKS! :)

As new instructions are added (or existing ones enhanced) I feel it's important to add a runtest test case to our existing suite that verifies the instruction is [still] working properly. So I really appreciate the extra effort, guys. Thanks!

And just as an FYI, the proper way to "restore" an ostailor setting is to use ostailor default, not null.

That is to say, ~ostailor null is an ostailor setting unto itself that completely disables the displaying of any program interrupts (suppresses them all)~, but it is not the default setting when Hercules first comes up. That's what ostailor default is for: it's sets the default ostailor value.

It's no big deal. I've already fixed it for you. It's just an FYI.

Dec 29 '18 05:12 Fish-Git

ostailor null is an ostailor setting unto itself that completely disables the displaying of any program interrupts (suppresses them all)

(Ack!) WRONG!

"ostailor quiet" suppresses them all!

"ostailor null" displays them all!

HHC01603I help ostailor
HHC01603I
HHC01602I Command               Description
HHC01602I ----------------      -------------------------------------------------------
HHC01602I ostailor             *Tailor trace information for specific OS
HHC01603I
HHC01603I Format: "ostailor [quiet|os/390|z/os|vm|vse|z/vse|linux|opensolaris|null]".
HHC01603I Specifies the intended operating system. The effect is to reduce
HHC01603I control panel message traffic by selectively suppressing program
HHC01603I check trace messages which are considered normal in the specified
HHC01603I environment. The option 'quiet' suppresses all exception messages,
HHC01603I whereas 'null' suppresses none of them. The other options suppress
HHC01603I some messages and not others depending on the specified o/s. Prefix
HHC01603I values with '+' to combine them with existing values or '-' to exclude
HHC01603I them. SEE ALSO the 'pgmtrace' command which allows you to further fine
HHC01603I tune the tracing of program interrupt exceptions.
HHC01603I
HHC01603I ostailor quiet
HHC01603I pgmtrace
HHC02281I pgmtrace == none
HHC01603I ostailor null
HHC01603I pgmtrace
HHC02281I pgmtrace == all
HHC01603I ostailor default
HHC01603I pgmtrace
HHC02281I * = Tracing suppressed; otherwise tracing enabled
HHC02281I 0000000000000001111111111111111222222222222222233333333333333334
HHC02281I 123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0
HHC02281I                **    *     *                                   *

Sorry for the misinformation! :(

Dec 29 '18 05:12 Fish-Git

@Fish-Git or @ivan-w Do you know if the Translate instructions in the list at the beginning of this issue can be checked off as completed, by the changes made in issue #107 and #108?

Dec 29 '18 12:12 wably

The Translate and Test (and various variations) are complicated to test for compliance, but it's certainly feasible. The specific issue with Translate and Test instructions is that any access exception only occurs 'as needed' - so if the 2nd operand crosses a page boundary (and the 2nd page isn't accessible) but the data in the 1st operand doesn't mandate access the second page of the second operand, then everything should work without any exception. A complete unit test to test all conditions is tricky!

Dec 29 '18 13:12 ivan-w

@ivan-w Right, that I understand. But back to the question... do those changes you made also address the performance issue or is that still to be tackled?

Dec 29 '18 13:12 wably

The performance issues have been dealt with. TRT only required a few adjustment to deal with boundary issues just to be architecture compliant! The original version went through translation for every step... The optimized version did translation as a block transaction - but would issue an unnecessary exception in some borderline case. The current version has the optimized version and deals with the borderline case. The current issue is the test case which should test all the conditions for possible exception (or lack thereof).

Dec 29 '18 13:12 ivan-w

Improved performance for the CKSM instruction as well as runtest cases added by commit 9f27856

The execution time for CKSM was reduced by 82%, from 10.98 seconds to 1.89 seconds (500,000 iterations of CKSM on a 14,000 byte buffer of mixed data).

Dec 30 '18 13:12 wably

Thank you, Bob! Well done! :)

Dec 31 '18 00:12 Fish-Git

Thank you all for this observation. I've been running some benchmarks on the IBM MVS 3.8J with latest SDL-Hercules/Hyperion source code from GitHub as of 2022-05-24 compiled with gcc version 11.3.1-2 with default ./configure optimization options of gcc -O3 -march=native, etc on an Intel i7 4900 MQ.

The COBOL program shown below takes about 50 seconds to run on Hercules. This is compared with 6 seconds when run on a DEC PDP-10 36 bit Tops-10 using the klh10 emulator and DEC's COBOL compiler of the same exact COBOL benchmark program.

I know this "GIT Hercules instruction slowness issue" is listed with low priority, but would really be a huge improvement to Hercules if the source code of it's instructions could be optimized in a safe way with an abundance of unit tests to ensure existing functionality isn't broken.

The MVS 3.8J COBOL benchmark program I've been using is shown below. Since the variable Y is being displayed, it forces the compiler to do all the intermediate calculations. The use of the PMAP option shows the assembler object code generated.

benchcob.txt

May 24 '22 18:05 wgs777

The COBOL program shown below takes about 50 seconds to run on Hercules. This is compared with 6 seconds when run on a DEC PDP-10 36 bit Tops-10 using the klh10 emulator and DEC's COBOL compiler of the same exact COBOL benchmark program.

Immaterial. IBM mainframes use a completely different machine architecture than the DEC PDP-10. They are not the same (nor even remotely similar). The two machine architectures are completely different from one another, and as a result, so are the hardware emulators themselves too. You are comparing apples to oranges.

The MVS 3.8J COBOL benchmark program I've been using is shown below. Since the variable Y is being displayed, it forces the compiler to do all the intermediate calculations. The use of the PMAP option shows the assembler object code generated.

benchcob.txt

(emphasis added)

You are presuming that we too are also familiar with, and run, the IBM MVS 3.8J operating system on Hercules. Such a presumption is erroneous. I know nothing about IBM MVS 3.8J, so your providing the COBOL source code is personally unhelpful to me.

What would be more helpful would be seeing the actual object code (i.e. the actual hardware instructions) that the IBM MVS 3.8J COBOL compiler generates (i.e. this presumed "PMAP" output you mention) so that we can review each of the instructions involved to see whether there is maybe anything we can do to improve their respective performance on Hercules.

Thanks.

May 25 '22 00:05 Fish-Git

Thank you for your time on this. I've uploaded the file below which contains the Cobol + Assembler code + additional info.

assembler_and_cobol_code.txt

May 25 '22 13:05 wgs777

Performance of some instructions could be significantly improved

Writing Solid Code

by Steve Maguire