stdlib icon indicating copy to clipboard operation
stdlib copied to clipboard

Style guide

Open milancurcic opened this issue 5 years ago • 71 comments

Use this issue to discuss the code style for the stdlib.

The most widely supported elements of style will eventually be merged into the Style Guide for contributors.

milancurcic avatar Dec 14 '19 23:12 milancurcic

@ivan-pi wrote in https://github.com/fortran-lang/stdlib/issues/11#issuecomment-566296346:

For example for the function that checks whether a character is a letter some of the possible names are:

  • is_alpha
  • isalpha (C/C++)
  • isAlpha (D language, but in Fortran it's the same as the previous suggestion)

I have discussed this particular issue with several people many years ago and we eventually converged towards the following compromise:

https://www.fortran90.org/src/best-practices.html#naming-convention

So in the above case, isAlpha would not be used, but both is_alpha and isalpha would be allowed, and in this case isalpha would be preferred (only two syllables). If the name was is_char_alpha vs ischaralpha, then the underscore version would be preferred because it is three syllables. However, if the API had lots of methods like is_alpha, is_char_alpha, is_not_alpha, then for consistency it should be named is_* with underscore.

Let me know what you think.

P.S. Some examples from current Fortran. No underscore: matmul, maxloc, minloc, iachar, adjustl, adjustr, maxexponent, minexponent, rrspacing, ishftc. Underscore: set_exponent, dot_product, bit_size, cpu_time. The standard is not consistent (e.g., maxexponent vs set_exponent) but it seems to be very close to the rule of 2 syllables -> no underscore, otherwise use underscore (e.g. matmul vs dot_product).

certik avatar Dec 16 '19 23:12 certik

isalpha contains three syllables.

In general I agree with your suggested naming conventions and also the ones in Fortran Best Practices. To stay close to the standard, I think it's best if we avoid CamelCase.

Specifically, for the ascii functions there are 11 functions starting with is, some are short (is_alpha), but some are long (is_hex_digit, is_octal_digit). I suppose it's easiest to just follow the C library and get rid of the underscores and shorten the names to 7-8 characters (isalpha, iszdigit - z is used in hexadecimal boz constants, isodigit - o is used in octal boz constants).

ivan-pi avatar Dec 17 '19 00:12 ivan-pi

You are right, isalpha is 3 syllables.

I think the rules can be broken where it makes sense and we have a good reason. Otherwise if we have no reason then the rules provide a good consistent default.

On Mon, Dec 16, 2019, at 5:18 PM, Ivan wrote:

isalpha contains three syllables.

In general I agree with your suggested naming conventions and also the ones in Fortran Best Practices https://github.com/Fortran-FOSS-Programmers/Best_Practices. To stay close to the standard, I think it's best if we avoid CamelCase.

Specifically, for the ascii functions there are 11 functions starting with is, some are short (is_alpha), but some are long (is_hex_digit, is_octal_digit). I suppose it's easiest to just follow the C library https://en.cppreference.com/w/c/string/byte and get rid of the underscores and shorten the names to 7-8 characters (isalpha, iszdigit - z is used in hexadecimal boz constants, isodigit - o is used in octal boz constants).

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/3?email_source=notifications&email_token=AAAFAWE3XEAVNGIXF5BNKJLQZALGHA5CNFSM4J25FZ62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHAT2DY#issuecomment-566312207, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAFAWCAIOCRTWMOKJEVFX3QZALGHANCNFSM4J25FZ6Q.

certik avatar Dec 17 '19 00:12 certik

I'm on the same page here. I like all lowercase with underscores universally. For short variable and procedure names, dropping the underscore is OK I think. It's nice to have a universal rule to follow (always use underscores to separate words), but for some short names I think it's OK to make an exception.

I don't think the Standard being inconsistent (maxexponent and set_exponent) should be an excuse for us to be "consistently inconsistent". There's also a difference between names that are made of whole words (e.g. dot_product is nicer than dotproduct), vs. portmanteaus like matmul (instead of mat_mul which seems awkward).

In this specific case, even though iszdigit and isodigit are short and nice, there's a risk of them being less clear what they mean. To my eye, is_hex_digit and is_octal_digit were obvious. For iszdigit and isodigit, I had to read Ivan's explanation to understand. So I slightly prefer the underscore variants here, but I'm not hell-bent on it.

milancurcic avatar Dec 17 '19 03:12 milancurcic

Yes, I generally agree but for assertion-y stuff I think separating the verb with an underscore makes sense is_alpha. The legibility is much improved and client code can always consume it as a different name through use association.

use fortran_stdlib, only: is_alpha => isalpha

zbeekman avatar Dec 17 '19 15:12 zbeekman

Automation of pedantry

I think that we should also adopt solutions like Editorconfig, possibly findent cmake-format (if we use CMake) and clang-format (for any C code). Tools like these that integrate with IDEs and editors will help get everyone on the same page allowing them to focus on semantics and code rather than style and formatting. In addition, I've really been appreciating and enjoying pre-commit to codify conventions on a per-project basis, and help catch silly issues before developers even commit/push code.

zbeekman avatar Dec 17 '19 16:12 zbeekman

Yes, I agree with @zbeekman we should try to automate as much as possible (formatting, release notes, etc.).

And rather use our time and effort to discuss the actual API, not how it is formatted.

I agree that is_alpha looks better in this case. I would caution against recommending use fortran_stdlib, only: is_alpha => isalpha, because that is a pain to have to do that --- rather, let's do our best to choose the name well, and recommend people to use it. That way all the codes will use the same name and it will be easy for people reading the code to understand.

certik avatar Dec 17 '19 17:12 certik

In this specific case, even though iszdigit and isodigit are short and nice, there's a risk of them being less clear what they mean. To my eye, is_hex_digit and is_octal_digit were obvious. For iszdigit and isodigit, I had to read Ivan's explanation to understand. So I slightly prefer the underscore variants here, but I'm not hell-bent on it.

Should I get a vote, I also prefer more verbose but hopefully clearer names, and prefer lowercase and underscores universally: is_hex_digit > iszdigit is_octal_digit > isodigit

Regarding names of string functions/subroutines, I'd definitely prefer more verbose and explicit over the short and cryptic. Thus: as_integer > atoi as_double > atof

smwesten-usgs avatar Dec 17 '19 18:12 smwesten-usgs

@smwesten-usgs thanks for the feedback. Yes, I agree on the examples you gave that underscores look better.

Here are some examples where no underscores I think look better (from NumPy and Matlab):

  • linspace vs lin_space
  • meshgrid vs mesh_grid
  • argsort vs arg_sort

certik avatar Dec 17 '19 18:12 certik

Just speculation, but I suspect the difference between the underscore usage between maxexponent and set_exponent is that max is already the name of a different intrinsic function.

longb avatar Dec 17 '19 21:12 longb

I agree, and for the record, I wasn't suggesting we pick whatever and let people rename through use association; Good names from the get-go are important.

zbeekman avatar Dec 17 '19 23:12 zbeekman

Can we make some choices about indentation and tabs vs spaces? My preference is indent all indent-able things by two spaces; never use tabs. We can codify this in an editor-config file.

zbeekman avatar Dec 22 '19 17:12 zbeekman

My preference is 4 spaces, no tabs.

certik avatar Dec 22 '19 17:12 certik

A quick note: I'm happy with 2 or 4 but with the 132 character line limit, you can start to run out of real estate pretty quickly with 4 spaces.

zbeekman avatar Dec 22 '19 17:12 zbeekman

I also propose to stick to 80 characters per line.

More importantly though we should figure out automatic formatting checks at the CI and provide instructions how developers can format the code before submitting a PR.

On Sun, Dec 22, 2019, at 10:45 AM, zbeekman wrote:

A quick note: I'm happy with 2 or 4 but with the 132 character line limit, you can start to run out of real estate pretty quickly with 4 spaces.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/3?email_source=notifications&email_token=AAAFAWCALED2B7ENIAUQTPLQZ6RUFA5CNFSM4J25FZ62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHPV5RY#issuecomment-568286919, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAFAWDCAOJSSZM3QLGLKE3QZ6RUFANCNFSM4J25FZ6Q.

certik avatar Dec 22 '19 18:12 certik

80 characters per line and 4 spaces?! That seems like there will be an awful lot of line continuations and leading blanks... findent defaults to 2 spaces. The standard caps line length at 132 chars. I think we should shorten the line length allowed XOR pick 4 spaces of indentation, but I'd be weary to use both at once.

zbeekman avatar Dec 22 '19 20:12 zbeekman

Yes, I've been using 80 characters and 4 spaces in all my projects.

But whatever we decide is fine with me. Here are my priorities for stdlib:

  • The Fortran sources themselves: simple, well organized modules, super well designed API, building with "any" compiler on "any" platform, with no warnings, and excellent performance in Release mode.

  • A build system that works on all platforms (including Windows) and is "simple"

  • Good tests

  • Good CI infrastructure that supports the above

How we format the files are low on my priorities list, so whatever makes us work together works for me.

certik avatar Dec 22 '19 20:12 certik

I generally use 4 spaces (same as Python and most C++ codes I've seen), but I also don't mind if we agree to use 2 and I agree with the points of @certik. Generally, I try to stick to an 80 character limit, but I am happy to break the rule if it makes the indentation nicer.

ivan-pi avatar Dec 22 '19 21:12 ivan-pi

OK, let's go with 4 then, and we could admonish users to be within 80 characters (or 100, or whatever) but enforce a hard limit of 132 during CI. I'm working on getting a style document updated with comments from here and an .editorconfig file as well as CI testing and enforcement.

zbeekman avatar Dec 22 '19 21:12 zbeekman

I generally use whatever the number of spaces created by emacs when I hit "tab". Note that the line length limit will probably get really long (10,000 characters) in the next standard, so debates over 80 versus 132 are a bit dated. Personally I prefer to be able to read a line of code in a terminal window without horizontal scrolling. But I think that most "style guides" include a lot of unnecessary "nanny" rules that don't matter that much. I think a rule like "a sequence of lines that form a block in Fortran should be indented compared to the code lines that delineate the block" is sufficient. Much more relevant (in my view) would be code style rules such as "do not use include lines", or "never use preprocessing directives", or "never use tools like configure, cmake, or spack" would be more useful and relevant to Fortran.

longb avatar Dec 22 '19 21:12 longb

@longb haha "never use tools like ... cmake"

Sometimes there are necessary evils.

I agree about reading things in the terminal. Thats the spirit of creating a limit.

Concerning emacs, it's easy to setup to respect .editorconfig.

Here is what I have in my .emacs (I use use-package)

(use-package editorconfig
  :ensure t
  :after ws-butler
  :init
  (setq editorconfig-trim-whitespaces-mode
         'ws-butler-mode)
  :config
  (editorconfig-mode 1)
  ;; Always use tabs for Makefiles
  (add-hook 'editorconfig-hack-properties-functions
	    '(lambda (props)
	       (when (derived-mode-p 'makefile-mode)
		 (puthash 'indent_style "tab" props)))))

zbeekman avatar Dec 22 '19 22:12 zbeekman

I've started trying to capture this discussion in PR #42

zbeekman avatar Dec 22 '19 22:12 zbeekman

I prefer 2 spaces to indent, and try (but sometimes fail) to stay within 80-character line length. This is not for compatibility reasons, but for the same reason as Python recommendation -- I work in terminals of similar width and don't like the lines to wrap around.

I think consistency is more important than any specific style "rule". I also don't think we need style enforcement (just yet). A brief style guide with recommendations and common sense go a long way.

milancurcic avatar Dec 22 '19 23:12 milancurcic

Some style suggestions that are probably obvious, but seem to be violated in some user code:

  1. Do not use language features that have been deleted (PAUSE, Arithmetic IF, ...).
  2. Avoid using language features that are obsolescent (such as COMMON).
  3. Avoid vendor syntax extensions (such as REAL*8), especially when there are standard alternatives.
  4. Avoid vendor-specific, nonstandard intrinsic procedures (such as ETIME).

Violating these leads to portability problems.

longb avatar Dec 23 '19 23:12 longb

  1. Do not use language features that have been deleted (PAUSE, Arithmetic IF, ...).
  2. Avoid using language features that are obsolescent (such as COMMON).
  3. Avoid vendor syntax extensions (such as REAL*8), especially when there are standard alternatives.
  4. Avoid vendor-specific, nonstandard intrinsic procedures (such as ETIME).

I agree. We can express these in the style guide as simply as:

  • Use only standard features and intrinsic procedures that have not been deleted or maked obsolescent.

milancurcic avatar Dec 24 '19 17:12 milancurcic

@certik I'm looking at some merged code and I now understand why you're okay with 4 spaces to indent and 80 character line limit: You don't indent module, program, and procedure bodies. I think we should.

I like 2 space-indents but can work with 4 spaces as well. But I think we should indent all program unit bodies consistently. The recommendation can then be applied to indenting bodies of all Fortran constructs:

  • Use 4 spaces to indent the body of any Fortran construct

milancurcic avatar Dec 24 '19 18:12 milancurcic

Yes, indeed, I don't indent subroutines. The reason for that is that if you do, then every single subroutine is indented by 4 spaces. That's unfortunate, as that is lost space. Just like in C++, if you use namespaces, you do not indent the functions inside namespaces for the same reason. The indentation does not buy you anything. You know that every subroutine is inside a module. And module is like a C++ namespace with this respect.

Inside the functions bodies, I also don't indent (I don't think it buys you any readability and you lose horizontal space), but if you feel strongly about it, then we can.

Doing 2 spaces for subroutines and 4 spaces for everything else is pretty complicated --- I would need to figure out how to setup my Vim editor to support that. Doing 4 spaces everywhere then risks of running out of the 80 character limit more often, precisely as @zbeekman was worried.

But since we relaxed the 80 char limit rule, I think it's fine to use 4 spaces everywhere.

Overall, this is relatively low priority for me, so I am happy to do whatever you feel strongly about. Let's setup tools to do the formatting automatically though --- that I feel strongly about. :)

certik avatar Dec 24 '19 19:12 certik

I like indenting everything, but it'd be a stretch to say I feel strongly about it. It does help readability for me, but only because my eyes are attuned to it.

Let's hear from others and hopefully the decision will be easy.

@marshallward @jacobwilliams @zbeekman @ivan-pi @jvdp1 @longb @smwesten-usgs Where are you on indenting bodies of program units?

milancurcic avatar Dec 24 '19 19:12 milancurcic

What we can do is to choose something reasonable (anything we discussed above is reasonable for me) and setup automatic formatting. Then as we have more code, we can compare different formatting options side by side and we can change the default formatting by changing some configuration option in the formatting tool if we like one option better as a community.

certik avatar Dec 24 '19 19:12 certik

MOM6 does two-space indents.  My personal preference is 4 spaces since 2-space indent "errors" are harder to detect visually and can slip in more easily.  But I haven't noticed much of an issue using 2-space in MOM6.

We do not indent a module's contents, as long as we stick to one module per file.  In this interpretation, the model statement/ending, the contains, etc are more like declarations than indications of scope.  But these rules might not make sense if there were multiple modules per file.

Otherwise, I'd say everything that moves into a more narrow scope is indented: subroutine contents, type definitions, control blocks, etc.

I wouldn't want to deal with mixed rules, I'd say either 2 or 4 spaces for everything.

I am personally militant about 80-character limits, since I usually resize my fonts to fit about ~190 characters, or ~85 on two parallel terminals. MOM6 "encourages" 100, with a hard limit on 120. This works well overall (for everyone except me and my font sizes, that is!)

For all the various edge cases, like how to split binary operators, I tend to follow PEP8. But that mostly reflects my undying love for Python.

As an acquaintance of mine says: "If your code only looks good under a particular style, then I probably won't like reading your code anyway".  As long as things can be broken up into modular pieces, then it ought to look good under 2-, 4- or even 8-space indents.

PS if anyone wants to see the (incomplete) MOM6 style guide.

(And for the record I do not necessary agree with all of these rules!)

marshallward avatar Dec 24 '19 22:12 marshallward

Where are you on indenting bodies of program units?

I use 1-space for indenting subroutines/functions, but also interfaces, derived types,... I am fine with 2-space indents, and 80 (soft)-/132(hard)-character limits. 4-space indents seems a bit too much to me, but it would not be a problem neither.

jvdp1 avatar Dec 24 '19 22:12 jvdp1

Where are you on indenting bodies of program units?

I tend to indent everything (besides for contain statements which stay parallel to the enclosing module/type).

Since we plan to have a single file per module I can agree with @certik and @marshallward that indentation at module level would save some horizontal space. However I would prefer to indent subroutine/function bodies (like in Python). I think it helps to visually separate the functions/subroutines bodies from each other without the need for comment lines such as

!-------------------------------------------

which I find are just a nuisance to create and maintain.

ivan-pi avatar Dec 25 '19 15:12 ivan-pi

If I can have an annoying remark, I don't feel about indentations as strongly as I do about using spaces. For example,

if(s%value==3)error stop

Is nightmare to read compared to

if (s % value == 3) error stop

And since someone will likely disagree with this too, I would like suggest that people will be always unhappy with how things are indented and spaced out. Even here everyone has expressed a different opinion. I personally would only require sane indentation and not using tabs.

śr., 25 gru 2019, 16:58 użytkownik Ivan [email protected] napisał:

Where are you on indenting bodies of program units?

I tend to indent everything (besides for contain statements which stay parallel to the enclosing module/type).

Since we plan to have a single file per module I can agree with@certik and @marshallward https://github.com/marshallward that indentation at module level would save some horizontal space. However I would prefer to indent subroutine/function bodies (like in Python). I think it helps to visually separate the functions/subroutines bodies from each other without the need for comment lines such as

!-------------------------------------------

which I find are just a nuisance to create and maintain.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/3?email_source=notifications&email_token=AC4NA3I6PPIGWTRVTVICHGTQ2N7LLA5CNFSM4J25FZ62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHUOLAI#issuecomment-568911233, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC4NA3KC3D2X74KZ26SJBALQ2N7LLANCNFSM4J25FZ6Q .

gronki avatar Dec 27 '19 16:12 gronki

Distilling the above comments down, and adding my own opinion it appears that:

  1. Most people want procedure bodies indented (I do too)
  2. Most people are OK with a "shoot for 80 chars, but hard limit of 132" line length
  3. With only one program or module per file, indenting these scopes does eat up blank space. With an indentation level of 4 this seems to be a bigger issue. While I agree that it isn't really necessary to indent these scopes if we adopt the convention of 1-per-file, from a tooling/enforcement perspective most tools, editors, etc. that I've seen want to indent these scopes by default.

Takin points 1-3 into consideration, as well as the comments expressed thus far, I would be inclined to indent two (2) spaces for all procedures and module and program scopes too. I'll go ahead and implement this in the EditorConfig file and style guide I've started in #42.

I would also recommend that we take this discussion over to the PR (#42) and that people make suggestions of how and where they would like to see the ideas discussed in this issue expressed. I've started working on a style guide document, and if you have expressed an opinion in this issue, you should look at #42 to ensure that the document is acceptable and that your opinion or suggestion has been sufficiently addressed.

zbeekman avatar Dec 29 '19 22:12 zbeekman

For all documentation I try to always include a complete working example program. I always call it "program demo_NNNNN" where NNNN is the name of the routine being tested, so it is easy to automate extracting and running the sample code, and to extract the code into a directory of small programs for use by users of the procedure. This lets the developer easily test code supplied in the documentation, and prevents simple syntax errors and problems like I found in the documentation for the experimental open(3f) routine:

program test
    use stdlib_experimental_stats, only: mean
    implicit none
    integer :: io, u
    u = open('example.dat', 'rt', iostat = io)
end program

So as to facilitate something like that in the long term, this should start with "program demo_open" and end with "end program demo_open". If all the steps were implemented the typo in the above would be automatically caught. In the mean time when writing documentation you could use something like

man -s $SECTION $TOPIC| col -b| expand| sed -n -e '%^[ !]*program *demo_%,%^[ !]*end *program *demo_%{p}' >$FILE

which is a line extracted from an automatic tester of syntax and/or whether a demo routine can be compiled and loaded in manpages. If would be easy to change this to handle multiple demos in a single markdown file in the long term.

PS: In the sample program in the document it is OPEN(3f) that should be added to the local scope, not the MEAN(3f) function in stdlib_experimental_io.md.

urbanjost avatar Jan 25 '20 18:01 urbanjost

@urbanjost Thank you for spotting the errors and for the suggestions. I support this idea of an automati tester. Maybe the automatic tester could be also added to CMake and the CI? @certik would it be possible for the CI? I also think it would be good to add a template of a spec. The spec for open was a first attempt. and suggestions as @urbanjost ones can be useful.

jvdp1 avatar Jan 25 '20 19:01 jvdp1

Yes, I would recommend to run documentation tests. We can call them doctests, just like in Python.

LFortran could eventually be used to run those naturally. Until then, I suggest we append end and compile + run manually using gfortran.

certik avatar Jan 25 '20 21:01 certik

In order to have a consistent code style, could we advice to use a formatting tool with the specified standard? For instance, currently I am using fprettify (https://github.com/pseewald/fprettify) with some options (in my case fprettify -w 3 -i 2) wich gives two spaces indentation, but any tool would be useful.

fiolj avatar Jan 28 '20 14:01 fiolj

Yes, the plan is to use a tool and enforce it by the CI. @zbeekman do you still plan to do work on that?

certik avatar Jan 28 '20 16:01 certik

Line length should be chosen IMHO by considering, what is the longest line length you can easily still scan with your eyes. I made good experience with 80 and 100 (preferred) chars per line. For me, everything beyond 100 is hard to read.

For the indentation: I think, it is most consistent to require, that whenever you open a scope / block construct, you increase indentation. That means, all subroutines within the module-construct would be indented. I think, most editors handle that this way anyway, so that would be easiest for the contributors.

Finally, 2 or 4 chars for indentation. If we are really going to use a preprocessor for generating specifics from a template, the according constructs should be also indented to enhance the readability of the generic code (the one most programmers would have to deal with):

#:include "common.fypp"
#:set RANKS = range(1, MAXRANK + 1)

submodule (stdlib_experimental_stats) stdlib_experimental_stats_mean
  use stdlib_experimental_error, only: error_stop
  implicit none

contains

  #:for k1, t1 in REAL_KINDS_TYPES
    #:for rank in RANKS
      module function mean_${rank}$_all_${k1}$_${k1}$(x) result(res)
        ${t1}$, intent(in) :: x${ranksuffix(rank)}$
        ${t1}$ :: res

        res = sum(x) / real(size(x, kind = int64), ${k1}$)

      end function mean_${rank}$_all_${k1}$_${k1}$
    #:endfor
  #:endfor

 [...]

end submodule

versus

#:include "common.fypp"
#:set RANKS = range(1, MAXRANK + 1)


submodule (stdlib_experimental_stats) stdlib_experimental_stats_mean
    use stdlib_experimental_error, only: error_stop
    implicit none

contains

    #:for k1, t1 in REAL_KINDS_TYPES
        #:for rank in RANKS
            module function mean_${rank}$_all_${k1}$_${k1}$(x) result(res)
                ${t1}$, intent(in) :: x${ranksuffix(rank)}$
                ${t1}$ :: res

                res = sum(x) / real(size(x, kind = int64), ${k1}$)

            end function mean_${rank}$_all_${k1}$_${k1}$
        #:endfor
    #:endfor

 [...]

end submodule

So, my 2 cents would be:

  • Indentation by 2 chars, consistently whenever a block construct is opened.
  • Max. line length: 100 chars.

aradi avatar Jan 29 '20 09:01 aradi

I think @zbeekman is busy.

@fiolj or @aradi would you be willing to setup automatic enforcing of the formatting at our CI? That would be super helpful.

certik avatar Jan 29 '20 16:01 certik

I find the proposed 2 space indentation and 100 character line limit very reasonable.

The foreign preprocessing notation in the example is horrifying. There is a template proposal before WG5 that would make this sort of awkwardness unnecessary. In general, a coding style document for Fortran should include a hard prohibition on any sort of preprocessing.

Cheers, Bill

On Jan 29, 2020, at 3:28 AM, Bálint Aradi [email protected] wrote:

Line length should be chosen IMHO by considering, what is the longest line length you can easily still scan with your eyes. I made good experience with 80 and 100 (preferred) chars per line. For me, everything beyond 100 is hard to read.

For the indentation: I think, it is most consistent to require, that whenever you open a scope / block construct, you increase indentation. That means, all subroutines within the module-construct would be indented. I think, most editors handle that this way anyway, so that would be easiest for the contributors.

Finally, 2 or 4 chars for indentation. If we are really going to use a preprocessor for generating specifics from a template, the according constructs should be also indented to enhance the readability of the generic code (the one most programmers would have to deal with):

#:include "common.fypp" #:set RANKS = range(1, MAXRANK + 1)

submodule (stdlib_experimental_stats) stdlib_experimental_stats_mean use stdlib_experimental_error, only: error_stop implicit none

contains

#:for k1, t1 in REAL_KINDS_TYPES #:for rank in RANKS module function mean_${rank}$all${k1}$_${k1}$(x) result(res) ${t1}$, intent(in) :: x${ranksuffix(rank)}$ ${t1}$ :: res

    res = sum(x) / real(size(x, kind = int64), ${k1}$)

  end function mean_${rank}$_all_${k1}$_${k1}$
#:endfor

#:endfor

[...]

end submodule

versus

#:include "common.fypp" #:set RANKS = range(1, MAXRANK + 1)

submodule (stdlib_experimental_stats) stdlib_experimental_stats_mean use stdlib_experimental_error, only: error_stop implicit none

contains

#:for k1, t1 in REAL_KINDS_TYPES
    #:for rank in RANKS
        module function mean_${rank}$_all_${k1}$_${k1}$(x) result(res)
            ${t1}$, intent(in) :: x${ranksuffix(rank)}$
            ${t1}$ :: res

            res = sum(x) / real(size(x, kind = int64), ${k1}$)

        end function mean_${rank}$_all_${k1}$_${k1}$
    #:endfor
#:endfor

[...]

end submodule

So, my 2 cents would be:

• Indentation by 2 chars, consistently whenever a block construct is opened. • Max. line length: 100 chars. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

Bill Long [email protected] Principal Engineer, Fortran Technical Support & voice: 651-605-9024 Bioinformatics Software Development fax: 651-605-9143 Cray, a Hewlett Packard Enterprise company/ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425

longb avatar Jan 29 '20 20:01 longb

@longb thanks for the feedback.

Indeed, the pre-processing is not super readable. However, the other alternatives that we considered are even worse. We should strive to include enough features into Fortran so that this kind of pre-processing will become unnecessary, but that's for the long term.

certik avatar Jan 29 '20 22:01 certik

@fiolj or @aradi would you be willing to setup automatic enforcing of the formatting at our CI? That would be super helpful.

I never used nor modified any CI in my projects, and do not know how to work with EditorConfig yet, so if somebody else take this task would be better. Otherwise, I'll start learning...

fiolj avatar Jan 29 '20 22:01 fiolj

The foreign preprocessing notation in the example is horrifying. There is a template proposal before WG5 that would make this sort of awkwardness unnecessary. In general, a coding style document for Fortran should include a hard prohibition on any sort of preprocessing.

@longb I absolutely agree, pre-processing is evil! But it is IMHO even more evil, that we are forced to use pre-processing due to the lack of an appropriate feature in the language.

Would you mind to take a look, as an example, on the implementation of the mean() function in this library, which allows the reduction of the rank of an arbitrary multi-dimensional array by one. Do you see any better solution without pre-processing which avoids duplicating basically the same code 100-times and which is available now (and not in 10-20 years)?

aradi avatar Jan 30 '20 07:01 aradi

@certik I could try to have a look at it (although I am not experienced with the CI in GitHub), but not earlier than end of next week...

aradi avatar Jan 30 '20 08:01 aradi

For cases where the preprocessor is not being used for conditional selection of a programming environment but to expand a template or allow for single-file source with documentation and unit tests and so on the distribution should include the expanded source. Requiring a specific development environment to be available for someone to just build and use the code is very unattractive. I keep virtually all my code in a preprocessor file format and use a very heavily customized and automated build environment but if I put something out for public consumption I put out standard *.f90 *.c ... files and an (automatically generated) makefile with some cpp(1) directives (if conditionals are absolutely necessary). It's one thing to expect a developer to have to have fypp, cmake, doxygen, ... installed, but very disturbing that someone that wants to use the results is required to install a full development environment. Ideally one should have an expanded directory in the distribution that you need nothing but a Fortran compiler to build from.

On another note, since for the integer versions of mean(3f) everything is promoted to a REAL type you could write a single routine with a CLASS(*) argument and a short select type instead of a template which might not be quite as efficient but would be much simpler to maintain. so there are alternatives in some cases to a template.

urbanjost avatar Jan 30 '20 14:01 urbanjost

For cases where the preprocessor is not being used for conditional selection of a programming environment but to expand a template or allow for single-file source with documentation and unit tests and so on the distribution should include the expanded source. Requiring a specific development environment to be available for someone to just build and use the code is very unattractive. I keep virtually all my code in a preprocessor file format and use a very heavily customized and automated build environment but if I put something out for public consumption I put out standard *.f90 *.c ... files and an (automatically generated) makefile with some cpp(1) directives (if conditionals are absolutely necessary). It's one thing to expect a developer to have to have fypp, cmake, doxygen, ... installed, but very disturbing that someone that wants to use the results is required to install a full development environment. Ideally one should have an expanded directory in the distribution that you need nothing but a Fortran compiler to build from.

In addition to have the templated code on Github, it has already been suggested by @certik in #35 comment to provide release tarballs that contain all necessaty generated files to the users. I think it is the way to go (i.e., providing templated codes, and generated tarballs). We just need someone to implement it ;)

On another note, since for the integer versions of mean(3f) everything is promoted to a REAL type you could write a single routine with a CLASS(*) argument and a short select type instead of a template which might not be quite as efficient but would be much simpler to maintain. so there are alternatives in some cases to a template

I think it is easier to maintain if the mean function for integer is (almost) the same as the mean function for real. You only need to think about one structure, but it is maybe only my way of thinking.

jvdp1 avatar Jan 30 '20 15:01 jvdp1

@urbanjost The plan is to "ship" the library with the preprocessed proper Fortran standard files. The repository, however, should contain the sources with the un-preprocessed files. Otherwise, you get crazy with diffs, if you have any changes in those routines. What we could consider is to add the preprocessor (1 single file) to the repository, so having Python (which is quite common) would be enough to build the library. (I have created #133 on this.)

As for class(*): It deserves maybe a separe issue as well. But what you do then, is making the code repetition on a lower lever (inside the subroutine instead of repeating the subroutine). But you still have to make a case for each possible data type there, and do something different (but almost identical) for each rank to be considered as well. So, currently I don't see a way to avoid code repetition, and if we have to repeat code ca. 50-times with minimal changes, I still prefer the ugly preprocessor-trick.

aradi avatar Jan 30 '20 15:01 aradi

I find I have a number of routines where it is desirable or tolerable to always promote to doubleprecision. An elemental routine that takes any scalar and returns a double is called. There are pros and cons but it is very useful in practice for me. For example if you had something like

pure elemental function anyscalar_to_double(valuein) result(d_out)
use, intrinsic :: iso_fortran_env, only : error_unit !! ,input_unit,output_unit
implicit none
!anyscalar_to_double(3f): convert integer or real parameter of any kind to doubleprecision
class(*),intent(in)       :: valuein
doubleprecision           :: d_out
doubleprecision,parameter :: big=huge(0.0d0)
   select type(valuein)
   type is (integer(kind=int8));   d_out=dble(valuein)
   type is (integer(kind=int16));  d_out=dble(valuein)
   type is (integer(kind=int32));  d_out=dble(valuein)
   type is (integer(kind=int64));  d_out=dble(valuein)
   type is (real(kind=real32));    d_out=dble(valuein)
   type is (real(kind=real64));    d_out=dble(valuein)
   type is (real(kind=real128))    d_out=dble(valuein)
   class default
     d_out=nan(d_out)
     !!stop '*M_anything::anyscalar_to_double: unknown type'
   end select
end function anyscalar_to_double

then other routines have an argument of class() and call this routine to convert it to a double. I use templating only where for optimization reasons it is justified. For a public library where anyone might be using it for anything templating is fine in lieu of Fortran features that support something similiar; but class() can be used very effectively and makes the code very easy to maintain.

Not a panacea but very useful for cases it is a fit for.

urbanjost avatar Jan 30 '20 16:01 urbanjost

@urbanjost Thanks for the example! Your approach is quite elegant and much shorter than the preprocessed version, I agree!

One big disadvantage of any class(*) based solution is, however, that it prohibits extensions by the consumers of the library. Assuming a routine, like your anyscalar_to_double() function is part of the library. A consumer of the library would not be able to extend it to additional data types without changing a library itself. If we have routines with specific types hidden behind a generic interface instead, a consumer can add further routines with additional data types using that generic name in his code without having to touch the standard library itself.

aradi avatar Jan 30 '20 16:01 aradi

Thanks @urbanjost for the class example. I like class and use it for some specific things.

However, I would argue against it for the implementation of mean for mainly three reasons (in addition to @aradi 's one):

  • using class, we would still need different functions for the real arrays, because mean must return a result of the same type as the input for real arrays. It is only when integer arrays are provided to mean that dp results are returned. The current behaviour of mean is in agreement with the intrinsic sum function, and I believe it is a nice feature.
  • I think we also need to focus a bit on efficiency inside stdlib; the user does not care about the readibility of the code or how things are implemented in the library. In this case of mean I would say we still must focus on efficiency.
  • the current implementation is probably more portable across compilers than class that may not be supported by all compilers (especially by the older compilers).

Finally I think class could be useful for the subroutines loadtxt and savetxt where preprocessing is also used to repeat the code for different kinds.

jvdp1 avatar Jan 30 '20 18:01 jvdp1

Well, the main argument against implementing mean(x) this way is that the meat of the computation is

sum(x)/real(size(x))

which is AREADY rank and type independent because the involved intrinsics are already generic. Generally, writing a trivial, one-line function is bad programming practice. Just write the line in the spot where the function is referenced. A more sensible example might be helpful.

Cheers, Bill

On Jan 30, 2020, at 12:16 PM, Jeremie Vandenplas [email protected] wrote:

Thanks @urbanjost for the class example. I like class and use it for some specific things.

However, I would argue against it for the implementation of mean for mainly three reasons (in addition to @aradi 's one):

• using class, we would still need different functions for the real arrays, because mean must return a result of the same type as the input for real arrays. It is only when integer arrays are provided to mean that dp results are returned. The current behaviour of mean is in agreement with the intrinsic sum function, and I believe it is a nice feature. • I think we also need to focus a bit on efficiency inside stdlib; the user does not care about the readibility of the code or how things are implemented in the library. In this case of mean I would say we still must focus on efficiency. • the current implementation is probably more portable across compilers than class that may not be supported by all compilers (especially by the older compilers). Finally I think class could be useful for the subroutines loadtxt and savetxt where preprocessing is also used to repeat the code for different kinds.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

Bill Long [email protected] Principal Engineer, Fortran Technical Support & voice: 651-605-9024 Bioinformatics Software Development fax: 651-605-9143 Cray, a Hewlett Packard Enterprise company/ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425

longb avatar Jan 30 '20 19:01 longb

Well, the main argument against implementing mean(x) this way is that the meat of the computation is...

@longb I am not sure which way you meant. From your comment, it understand you are against the mean function because it is a simple one. First I implemented it with loops, that have been removed through the review (to help for the clarity of the code mainly (and it was a good thing IMO)).

Otherwise, the main goal is to develop a module with multiple statistical functions (mean, variance, median, ....) inside stdlib. A module about statistics without mean would make little sense to me. Would it?

I think that this submodule could serve as an example for other functions that will be include in stdlib_experimental_stats (it also served as spotting some issues, like how to return NaN), like computing the variance with the Welford 's method:

function variance(x) result(res)
  real, intent(in) :: x(:)
  real :: res

  integer :: i
  real :: M, S, delta

  M = 0.
  S = 0.
  do i = 1, size(x)
   delta = x(i) - M
   M = M + delta/real(i)
   S = S + delta * (x(i)-M)
  enddo
  
  res = S/real(size(x)-1))

end function

Is this a more sensible example, assuming that it would be extended to have an API compatible with sum (i.e., with dim and mask included)?

jvdp1 avatar Jan 30 '20 19:01 jvdp1

Indeed, the only reason we started with mean was because it was the simplest. More complicated functions will follow soon. Finally, yes, it could be easily computed by hand using sum and size, but other libraries have such functions also, such as NumPy's mean or Matlab's mean (for examples in other languages see #113).

certik avatar Jan 30 '20 22:01 certik

Comparison with Python or Matlab does not seem relevant. In both cases, native code is very slow, and they have to resort to libraries for any reasonable performance. Most compiled languages don’t have native support for arrays. Fortran does.

On Jan 30, 2020, at 4:14 PM, Ondřej Čertík [email protected] wrote:

Indeed, the only reason we started with mean was because it was the simplest. More complicated functions will follow soon. Finally, yes, it could be easily computed by hand using sum and size, but other libraries have such functions also, such as NumPy's mean or Matlab's mean (for examples in other languages see #113).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

Bill Long [email protected] Principal Engineer, Fortran Technical Support & voice: 651-605-9024 Bioinformatics Software Development fax: 651-605-9143 Cray, a Hewlett Packard Enterprise company/ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425

longb avatar Jan 30 '20 22:01 longb

Interesting example, though the code for Welford’s method does not vectorize and would not perform well on modern processors. I tried a simple example of the direct code and your welford example:

program test_var use,intrinsic :: iso_fortran_env, only: int64, real64 implicit none real :: x(1000), v integer(int64) :: t1,t2,t3 real(real64) :: rate

call random_number(x)

call system_clock(t1, count_rate=rate) v = variance(x) call system_clock(t2) print *, v, (t2-t1)/rate, " sec for Welford method"

call system_clock(t2) v = var(x) call system_clock(t3) print *, v, (t3-t2)/rate, " sec for simple method"

contains

function var(x) result (res) real,intent(in) :: x(:) real :: res

real :: M, S integer :: i

M = sum(x)/real(size(x)) S = sum( (x-m)**2 ) res = S/real(size(x)-1)

end function var

function variance(x) result(res) real, intent(in) :: x(:) real :: res

integer :: i real :: M, S, delta

M = 0. S = 0. do i = 1, size(x) delta = x(i) - M M = M + delta/real(i) S = S + delta * (x(i)-M) enddo

res = S/real(size(x)-1)

end function variance

end program test_var

The output on a generic Intel processor that has SSE registers:

ftn test.f90 ./a.out 8.192101866E-2, 7.15724721261053416E-6 sec for Welford method 8.192101866E-2, 7.80853517877739328E-7 sec for simple method

As you can see, the “simple” method is faster.

Cheers, Bill

On Jan 30, 2020, at 1:53 PM, Jeremie Vandenplas [email protected] wrote:

Well, the main argument against implementing mean(x) this way is that the meat of the computation is...

@longb I am not sure which way you meant. From your comment, it understand you are against the mean function because it is a simple one. First I implemented it with loops, that have been removed through the review (to help for the clarity of the code mainly (and it was a good thing IMO)).

Otherwise, the main goal is to develop a module with multiple statistical functions (mean, variance, median, ....) inside stdlib. A module about statistics without mean would make little sense to me. Would it?

I think that this submodule could serve as an example for other functions that will be include in stdlib_experimental_stats (it also served as spotting some issues, like how to return NaN), like computing the variance with the Welford 's method:

function variance(x ) result(res)

real, intent(in) :: x(:)

real :: res

integer :: i

real :: M, S, delta

M = 0 . S = 0 .

do i = 1, size (x) delta = x(i) - M M = M + delta/real (i) S = S + delta * (x(i)- M)

enddo

res = S/real(size(x)-1 ))

end function Is this a more sensible example, assuming that it would be extended to have an API compatible with sum (i.e., with dim and mask included)?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

Bill Long [email protected] Principal Engineer, Fortran Technical Support & voice: 651-605-9024 Bioinformatics Software Development fax: 651-605-9143 Cray, a Hewlett Packard Enterprise company/ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425

longb avatar Jan 30 '20 22:01 longb

Comparison with Python or Matlab does not seem relevant. In both cases, native code is very slow, and they have to resort to libraries for any reasonable performance.

They do not do it for performance reasons, but for convenience reasons. If you look at NumPy's implementation:

https://github.com/numpy/numpy/blob/d9b1e32cb8ef90d6b4a47853241db2a28146a57d/numpy/core/_methods.py#L134

this is not more performing than doing it by hand in NumPy. So performance is not the reason why NumPy has the API. Rather, I suspect, the reason is that users like to use such a function.

certik avatar Jan 30 '20 22:01 certik

@longb The comparison to Python and MATLAB is quite relevant because it informs our API design. We share much of the target audience with those and some other languages.

milancurcic avatar Jan 30 '20 22:01 milancurcic

Welford's method takes care of two things. Possible numerical instability; it's a single pass algorithm. It's a choose your poison scenario as with most robust computation algorithms.

Regardless of the examples we use, and implementation we finally decide on for a stdlib that is generally applicable, I also think a lot of people will appreciate a centralized statistics based module. More and more people are using Python and Matlab over Fortran, following their direction at this point is absolutely the way to go, and saves us a lot of effort too!

Discussions about individual algorithms is key to developing the best choice in each case. If the choice of any one algorithm is a bottleneck in a user's code, hopefully they give us some feedback, but they are also free to roll their own version of that algorithm.

leonfoks avatar Jan 30 '20 22:01 leonfoks

If a statistic or any kind of numerical algorithm is to make it in the stdlib, I think it should be highly optimized or not implemented at all (since, as Bill Long said, Fortran has enough intrinsic to not need a function for average). If the algorithm is sub-optimal and not using the best vectorization possible then nobody will use it and it will be dead. The whole point of using Fortran is that here compiler can optimize things but many of these optimizations are not possible if the calculations are closed behind calls to libraries. Simple procedures (such as mean) particularly "feel" that penalty. More complicated ones, not so much. So maybe that could be critera what makes it into stdlib numerical part? Is the procedure simple enough that calling it would introduce a significant penalty? After all, I believe what makes it into stdlib should be in the end considered exemplary and the best way of doing things.

czw., 30 sty 2020 o 23:52 Leon Foks [email protected] napisał(a):

Welford's method takes care of two things. Possible numerical instability; it's a single pass algorithm. It's a choose your poison scenario as with most robust computation algorithms.

Regardless of the examples we use, and implementation we finally decide on for a stdlib that is generally applicable, I also think a lot of people will appreciate a centralized statistics based module. More and more people are using Python and Matlab over Fortran, following their direction at this point is absolutely the way to go, and saves us a lot of effort too!

Discussions about individual algorithms is key to developing the best choice in each case. If the choice of any one algorithm is a bottleneck in a user's code, hopefully they give us some feedback, but they are also free to roll their own version of that algorithm.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/3?email_source=notifications&email_token=AC4NA3LEUVDMBU4WOJ7G47DRANKZXA5CNFSM4J25FZ62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKM3Q7Q#issuecomment-580499582, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC4NA3LB5BZLSAKZTXSQDSLRANKZXANCNFSM4J25FZ6Q .

gronki avatar Jan 31 '20 00:01 gronki

Sorry for the spam, one more thought. There was an argument some time ago along the lines "procedures should not be standarized/implemented if they are simple to write". While I generally disagree with that statement, I think numerical algorithms is where some exception needs to be made, because Fortran excels in numerical computation (unlike in everything else) and its syntax allow writing most things in the most easy and efficient way. So I would dare to argue that numerical part should be the least focus of the stdlib because it is the part that least needs "fixing". What do you think?

pt., 31 sty 2020 o 01:04 Dominik Gronkiewicz [email protected] napisał(a):

If a statistic or any kind of numerical algorithm is to make it in the stdlib, I think it should be highly optimized or not implemented at all (since, as Bill Long said, Fortran has enough intrinsic to not need a function for average). If the algorithm is sub-optimal and not using the best vectorization possible then nobody will use it and it will be dead. The whole point of using Fortran is that here compiler can optimize things but many of these optimizations are not possible if the calculations are closed behind calls to libraries. Simple procedures (such as mean) particularly "feel" that penalty. More complicated ones, not so much. So maybe that could be critera what makes it into stdlib numerical part? Is the procedure simple enough that calling it would introduce a significant penalty? After all, I believe what makes it into stdlib should be in the end considered exemplary and the best way of doing things.

czw., 30 sty 2020 o 23:52 Leon Foks [email protected] napisał(a):

Welford's method takes care of two things. Possible numerical instability; it's a single pass algorithm. It's a choose your poison scenario as with most robust computation algorithms.

Regardless of the examples we use, and implementation we finally decide on for a stdlib that is generally applicable, I also think a lot of people will appreciate a centralized statistics based module. More and more people are using Python and Matlab over Fortran, following their direction at this point is absolutely the way to go, and saves us a lot of effort too!

Discussions about individual algorithms is key to developing the best choice in each case. If the choice of any one algorithm is a bottleneck in a user's code, hopefully they give us some feedback, but they are also free to roll their own version of that algorithm.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/3?email_source=notifications&email_token=AC4NA3LEUVDMBU4WOJ7G47DRANKZXA5CNFSM4J25FZ62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKM3Q7Q#issuecomment-580499582, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC4NA3LB5BZLSAKZTXSQDSLRANKZXANCNFSM4J25FZ6Q .

gronki avatar Jan 31 '20 00:01 gronki

To answer that, we need to have benchmarks.

And as a user, I feel the compilers are not doing their job if they can't inline functions like mean(). That is one of the motivations why I started LFortran, to try to fix such deficiencies. This is long term, obviously.

The idea of all the efforts that we are doing (J3 GitHub, stdlib, fpm, as well as new compilers) is to fix all the things that have been bothering us about Fortran. It's not going to get fixed overnight, but it will get fixed eventually.

On Thu, Jan 30, 2020, at 5:05 PM, Dominik Gronkiewicz wrote:

If a statistic or any kind of numerical algorithm is to make it in the stdlib, I think it should be highly optimized or not implemented at all (since, as Bill Long said, Fortran has enough intrinsic to not need a function for average). If the algorithm is sub-optimal and not using the best vectorization possible then nobody will use it and it will be dead. The whole point of using Fortran is that here compiler can optimize things but many of these optimizations are not possible if the calculations are closed behind calls to libraries. Simple procedures (such as mean) particularly "feel" that penalty. More complicated ones, not so much. So maybe that could be critera what makes it into stdlib numerical part? Is the procedure simple enough that calling it would introduce a significant penalty? After all, I believe what makes it into stdlib should be in the end considered exemplary and the best way of doing things.

czw., 30 sty 2020 o 23:52 Leon Foks [email protected] napisał(a):

Welford's method takes care of two things. Possible numerical instability; it's a single pass algorithm. It's a choose your poison scenario as with most robust computation algorithms.

Regardless of the examples we use, and implementation we finally decide on for a stdlib that is generally applicable, I also think a lot of people will appreciate a centralized statistics based module. More and more people are using Python and Matlab over Fortran, following their direction at this point is absolutely the way to go, and saves us a lot of effort too!

Discussions about individual algorithms is key to developing the best choice in each case. If the choice of any one algorithm is a bottleneck in a user's code, hopefully they give us some feedback, but they are also free to roll their own version of that algorithm.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub

https://github.com/fortran-lang/stdlib/issues/3?email_source=notifications&email_token=AC4NA3LEUVDMBU4WOJ7G47DRANKZXA5CNFSM4J25FZ62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKM3Q7Q#issuecomment-580499582,

or unsubscribe

https://github.com/notifications/unsubscribe-auth/AC4NA3LB5BZLSAKZTXSQDSLRANKZXANCNFSM4J25FZ6Q

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/3?email_source=notifications&email_token=AAAFAWGSWDMRIH7HII4ZMK3RANTLVA5CNFSM4J25FZ62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKNAOYI#issuecomment-580519777, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAFAWH5IP3UTOPCQDK4XXTRANTLVANCNFSM4J25FZ6Q.

certik avatar Jan 31 '20 00:01 certik

Even simple routines should consider robustness not just speed; and in a perfect world avoid pitfalls users are not always considering, such as data conditioning. Are you more impressed by a compiler whose SUM() function returns the value of OUTLIER() on the first call or one that generates the last value calculated no matter where the OUTLIER() value is placed in the ARR() array?

program testit
   integer,parameter :: elements=10000000
   real    :: arr(elements)
   real    :: summary
   real    :: outlier=huge(summary)/10.0**30
   integer :: i
   arr=1.0
   arr(1)=outlier
   write(*,*)'biggest=',arr(1)

   summary=0.0
   do i=1,elements
      summary=summary+arr(i)
   enddo
   write(*,*)'loop sum biggest first=',summary
   write(*,*)'biggest first sum=',sum(arr)

   summary=0.0
   arr(1)=1.0
   arr(elements)=outlier
   do i=1,elements
      summary=summary+arr(i)
   enddo
   write(*,*)'loop sum biggest last=',summary
   write(*,*)'biggest last sum=',sum(arr)
end program testit

Do you get this: biggest= 340282336.
loop sum biggest first= 340282336.
biggest first sum= 340282336.
loop sum biggest last= 350282336.
biggest last sum= 350282336.

or is "biggest first sum" 350282336. ?

This is an artificial case, but reflects real-world errors that often happen in real-world problems. So I can easily imagine a MEAN() function that is a lot more substantial than just replacing a single line; and a MASK ability is very useful but can run into similar issues like having a mask that says X.ne.REAL_VALUE, for example. Does speed outweigh robustness in stdlib? I have to admit I have seen fewer intrinsic implementations that condition data lately. Is the cost of such checks so high we not want them for the "typical" datasets?

urbanjost avatar Jan 31 '20 02:01 urbanjost

There can be different versions of sum, e.g., Julia has a regular sum and also sum_kbn for a much slower, but more accurate sum algorithm.

(All my production applications do not depend on the order of summation, but for some users that can be helpful, and in that case they can call such different method.)

Anyway, this is not related to the style guide. Please open separate issues for that. And even better, let's start discussing and submitting PRs for all this functionality, so that we can get towards our goal sooner.

certik avatar Jan 31 '20 04:01 certik

As you can see, the “simple” method is faster.

If a statistic or any kind of numerical algorithm is to make it in the stdlib, I think it should be highly optimized or not implemented at all (since, as Bill Long said, Fortran has enough intrinsic to not need a function for average). If the algorithm is sub-optimal and not using the best vectorization possible then nobody will use it and it will be dead.

In a general way, IMO the algorithms must be first robust and accurate (at least as mush as the intrinsic functions; see comment) for a vast majority of the cases available in the community (and not for a personal/specific case). What is the point to have a fast and very efficient algorithm that returns a wrong answer in some general cases? E.g., the "simple" method in comment is faster (while I think most of you could optimize the other one), but in many of my real cases this "faster" function may return a wrong answer. So, there is a trade-off between accuracy/robustness and efficiency, and this may vary depending on the real scenarios. We need to weight advantages and disavantages of both (e.g, I would be happy if the faster method is implemented but the spec should mention its limitations if its robustness is below the one of intrinsic functions). Finally, I think stdlib should try to answer the need of the community that probably includes a large number of "average" Fortran users , and not the need of the current stdlib developers (who have most likely already something like stdlib for their own use). "Average" Fortran users would probably prefer (real case in my field):

integer(int8) :: x(:,:)
print*, mean(x, x < 3) !dp result

over

integer(int8) :: x(:,:)
print*, sum(real(x,dp), x < 3) / real(count(x < 3, kind = int64), dp)

If you are interested in the discussion about descriptive statistics and want to participate to it, please see #113 and #128 . for discussion on a possible trade-off between effeciency and robustness/accuracy, see #134.

I am sorry to have opened a discussion about utility/usefulness/efficiency of specific procedures (mean, variance) (it was not my aim). I hope this issue will restart where it was, i.e., here

jvdp1 avatar Jan 31 '20 08:01 jvdp1

~~Should we start a Wiki page for the style guide?~~ I believe we have already converged to some common practices in the current stdlib modules.

Edit: I forgot we already have the STYLE_GUIDE.md. I will see to create a pull request once I have something.

I've been using the Google C++ Style Guide lately, and found it very helpful to maintain clean code and refresh my memory on best practices. The table of contents is very practical to quickly locate the desired language feature.

A few similar (project-specific) Fortran style guides are:

Other inspiration:

ivan-pi avatar Feb 15 '21 10:02 ivan-pi

Interesting lists. Particularly the CP2K and UK Met sites that actively advocate to avoid F2003 as being “too new”. This points out an important issue with coding style documents. It is needed to incorporate a scheme for constant updating as the underlying language changes. (It is also helpful to understand how CP2K gets to be so bug-ridden.)

Cheers, Bill

On Feb 15, 2021, at 4:50 AM, Ivan Pribec [email protected] wrote:

Should we start a Wiki page for the style guide? I believe we have already converged to some common practices in the current stdlib modules.

I've been using the Google C++ Style Guide lately, and found it very helpful to maintain clean code and refresh my memory on best practices.

A few similar (project-specific) Fortran style guides are:

• @certik 's Fortran Best Practices • DFTB+ Developers Fortran style guide • CP2K Coding Conventions • Object Modeling System Fortran 90/95 Coding Conventions from Colorado State university • Fortran Coding Standards for JULES (UK Met Office) • Fortran coding standard for FCM (UK Met Office) Other inspiration:

• Boost Library Requirements and Guidelines • PEP 8 -- Style Guide for Python Code — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

Bill Long [email protected] Engineer/Master , Fortran Technical Support & voice: 651-605-9024 Bioinformatics Software Development fax: 651-605-9143 Hewlett Packard Enterprise/ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425

longb avatar Feb 15 '21 18:02 longb

What's the current consensus on the space in an end block keyword, e.g. enddo or end do?

wyphan avatar Jun 17 '21 22:06 wyphan

I think the consensus is to use end do, and simply use the amount of spaces that is already used in a given file for consistency.

certik avatar Jun 21 '21 11:06 certik

I have just found a few unnecessary semicolons at the end of statements in stdlib_ascii.fypp, for example, https://github.com/fortran-lang/stdlib/blob/22c1be0b8c47c81223c1f84c854cc582b6528a27/src/stdlib_ascii.fypp#L146 which may be a style issue for those who use both Fortran and C-like languages and hopefully be avoided by the use of some linters in future.

tueda avatar Dec 03 '21 13:12 tueda