stdlib Create a spec document template

Create a spec document template

Open milancurcic opened this issue 2 years ago • 20 comments

Currently we don't have a Markdown template for spec documents. We should have one. Benefits include:

Makes it easier for contributors to write spec docs.
Ensures consistency if followed.

This could be in a file SPEC_TEMPLATE.md or similar, placed in the root directory, and linked to from WORKFLOW.md and CONTRIBUTING.md.

Inspired by @JHenneberg in #503.

Aug 31 '21 16:08 milancurcic

Short summary of the more or less status quo. More or less because it is not consistently used over all documents:

General structure of the spec file
- For now mostly like this:

---
title: <MODULE_NAME>
---

# The `<MODULE_NAME>` module

[TOC]

## Introduction

## <OPTIONAL_SECTIONS>

## Procedures and methods provided

### `<PROCEDURE_NAME>` - <SHORT_DESCRIPTION>

How should procedures be documented:
- For now mostly according to the Fortran Standard ISO 1539-1 2018 was followed and extended by the subsection Status and Syntax but missing the section Result Characteristics:

### `<PROCEDURE_NAME>` - <SHORT_DESCRIPTION>

#### Status

[<STABLE>, <EXPERIMENTAL>]

#### Description

#### Syntax

#### Class

[<IMPURE>, <PURE>, <ELEMENTAL>,]

#### Arguments

#### Result Value

#### Example

Sep 01 '21 08:09 JHenneberg

Should Class be:

#### Class

[<IMPURE>, <PURE>, <ELEMENTAL>,] [<FUNCTION> <SUBROUTINE>]

Sep 01 '21 15:09 milancurcic

@milancurcic I think that would be usefull, even though there is the section #### Syntax it would be redundant.

The Syntax section is not needed in the Fortran Std 2018 files because the complete function/procedure is written in the headline.

Fortran Std or GCC Style:

`ICHAR(C, [, KIND])`

Description

Returns the code for the character in the first character position of C in the system’s native character set.

Class

Elemental function.

Arguments

C : shall be of type character and of length one. Its value shall be that of a character capable of representation in the processor.
KIND (optional): shall be a scalar integer constant expression.

Result Characteristics

Integer. If KIND is present, the kind type parameter is that specified by the value of KIND; otherwise, the kind type parameter is that of default integer type.

Result Value

The result is the position of C in the processor collating sequence associated with the kind type ...

Example

ICHAR (’X’) has the value 88 on a processor using the ASCII collating sequence for default characters.

vs. fortran_stdlib or GCC ICHAR:

`ICHAR` - Code value for character

Status

Stable

Syntax

RESULT = ICHAR(C [, KIND])

Description

Returns the code for the character in the first character position of C in the system’s native character set.

Class

Elemental

Arguments

C : shall be of type character and of length one. Its value shall be that of a character capable of representation in the processor.
KIND (optional): shall be a scalar integer constant expression.

Result Value

Integer. If KIND is present, the kind type parameter is that specified by the value of KIND; otherwise, the kind type parameter is that of default integer type.

The result is the position of C in the processor collating sequence associated with the kind type ...

Example

program read_val
  integer value
  character(len=10) string, string2
  string = '154'
  
  ! Convert a string to a numeric value
  read (string,'(I10)') value
  print *, value
  
  ! Convert a value to a formatted string
  write (string2,'(I10)') value
  print *, string2
end program read_val

So I guess it is mostly a discussion, which format is prefered and what should be added like @milancurcic suggested.

Sep 02 '21 07:09 JHenneberg

How about we add 2 small sections in documentation with the name troubleshooting and caution? We know not everyone has the patience to go through the whole documentation only to look for the solution to the problems they are facing with a particular function. If we add a small troubleshooting section and caution section (they are not needed for all functions though but for can be quite useful for some) it will make documentation way easier to reference to.

Also, sometimes Fortan throws bad errors, so the documentation can provide some possible ways in which a user can solve those errors. It will be like a small stackoverflow in itself.

Also, at places where a feature is quite large people divide documentation as per the use case like basic and advanced assuming that not everyone importing the feature is going to be a user looking for advanced usages.

Sep 07 '21 14:09 aman-godara

Also, a contributor might not be aware of markdown which GitHub uses this too can be added.

in GitHub Flavoured Markdown: Double space + Enter means new line. Double Enter also means new line. space + Enter DOESN'T mean new line and means a single space only. Single Enter means a single space more than one space means a single space only.

Different websites support different markdown but mostly there is consensus.

These websites can help: https://commonmark.org/help/tutorial/ https://guides.github.com/features/mastering-markdown/ https://www.markdownguide.org/basic-syntax/

Sep 07 '21 14:09 aman-godara

I'd like to simplify and condense the spec as much as possible so the user doesn't have to scroll down a lot to get the full picture.

What do you all think about having a "Signature" section? For example for linspace, it would look something like this:

`linspace`

Description of linspace.

Signature

function linspace(start, end) result(res)
    real, intent(in) :: start
    real, intent(in) :: end
    real :: res(DEFAULT_LINSPACE_LENGTH)

function linspace(start, end, n) result(res)
    real, intent(in) :: start
    real, intent(in) :: end
    integer, intent(in) :: n
    real :: res(max(n, 0))

IMO there are a few advantages to this approach:

You can see in one focused place all the arguments and types
In the description of arguments, you don't need to write the types, ranks, and intent of arguments, which IMO is boilerplate when used in prose. Instead, you just focus on the meaning of arguments and results, and the types, ranks, and intent are apparent from the signature.
We can remove the "Syntax" and "Class" sections. Both will be apparent from the signature. Class especially always seemed unnecessarily bureaucratic to me.

Then of course we have a question of how to represent signatures of generic procedures. For linspace, perhaps something like this:

function linspace(start, end) result(res)
    {real, integer}, intent(in) :: start
    real, intent(in) :: end
    real :: res(DEFAULT_LINSPACE_LENGTH)

Where curly braces include all compatible types. Or, just spell out all specific signatures.

Doing so for all type kinds would probably be overkill and flood the spec page, so perhaps we can simply have a sentence at the bottom of the signature section, saying something like:

All real arguments are compatible with sp, dp, and qp type kinds. All integer arguments are compatible with int8, int16, int32, and int64 arguments.

If you like this idea, let me know and I can prepare an alternative spec doc for one procedure so we can compare them side by side.

Sep 18 '21 20:09 milancurcic

@milancurcic Yes, to me this sounds like an improvement.

Sep 19 '21 00:09 gareth-nx

@milancurcic : I agree. Especially about the generic routines. Your idea could be further extended. Instead of a text describing the KIND of datatype the signature could be shortened even more:

function linspace(start, end) result(res)
    {real( {sp, dp, qp} ), integer( {int8, int16} ), intent(in) :: start
    real( {sp, dp, qp} ), intent(in) :: end
    real :: res(DEFAULT_LINSPACE_LENGTH)

Negative effect would be of course that the KIND needs to be added to every parameter.

Sep 20 '21 07:09 JHenneberg

I put together a proposal template that could be used as a mold for all spec docs: https://github.com/milancurcic/stdlib-spec-template, with a few changes from the current specs, as described in this thread:

Provide interfaces
Remove "Syntax", "Class", and "Status" sections. The former two are redundant with the interface. I will explain about removing "Status" in a bit.
"Arguments" section now doesn't list all the boilerplate that's already in the interface, but only describes the meaning.

To illustrate, I made an example spec following the proposed template. Compare it with the current version that we have.

Rationale on removing "Status" from the spec pages: While everything in stdlib is experimental, and it will be for the foreseeable future, we should simply state upfront in the README.md that everything is experimental, and unclutter the spec docs with unnecessary boilerplate. When we are ready to mark some procedures as stable, we should create a Status page where all the modules, procedures, and derived types will be listed with their statuses. But until then, having every single procedure spell out Status: Experimental on it doesn't communicate anything useful.

In summary, my motivation for this is to:

Have a template that we can base current and future specs on and stay consistent, as proposed by @JHenneberg
Significantly clean up the current specs and make them easier to read and use

Oct 05 '21 04:10 milancurcic

Thank you all for this work. The first specs were based on the document of the Fortran standard. The "class" section is required in the Fortran Standard, because the interface is not fully provided (e.g., for findloc, the title of the section only includes 16.9.78 FINDLOC(ARRAY, VALUE, DIM [,....]). The Interface section, as provided in your example includes the type of procedure.

Also, I am not sure it is a good idea to list all kinds of integer and real in the interface. A first reason is that these lists of kinds of integer and real may change, depending on the compiler/platform used (e.g., if qp is not supported by the platform, then stdlib should be compiled without supporting qp (see all discussions about portability and data-mining iso_fortran_env). A second reason is that, if a new kind is introduced in the standard, and supported by stdlib, it would mean that all specs should be revised. Therefore, to avoid issue, I would suggest to only mention, e.g.:

array: Shall be of type real (which cover all real kinds supported by the platform and stdlib; as in the Standard)
To numerate all supported kinds, only if not all kinds provided in stdlib_kinds or iso_fortran_env are not supported (for some specific reasons)

Oct 05 '21 06:10 jvdp1

Thanks, @jvdp1.

Regarding the class, in the first para of your post, are you suggesting that we keep it or merely explaining why we have it? I understand that the first specs followed the standard format. I suggest that we actually not follow the standard, but start from scratch and come up with a format that's most clear to us and conveys the info we need.

Regarding the type kinds, I agree with your points, but it seems to me like an argument to not do something useful because it's not perfect. Is there some way that we can indicate that not all kinds may be available on the user's system? This could be a sentence in the README or the docs top-level page a-la:

Stdlib attempts to provide all standard kinds from the iso_fortran_env module, however some may not be provided by your compiler. Procedure interfaces that rely on the unavailable type kinds will not be provided, despite being listed in the specification documents.

I think it's quite important that we document the type kinds because some functions have both arguments that use both all kinds and only default kind (for example linspace and logspace accept integers of all kinds for start and end, but only default integer for the number of elements n). A user that attempts to pass an unsupported kind argument will simply get a "could not resolve generic interface" kind of error message, and will need to look in the source code to find out what's wrong.

I also recommend that we use plain English and do away with the standardese "shall be" and similar.

Oct 05 '21 13:10 milancurcic

Regarding the class, in the first para of your post, are you suggesting that we keep it or merely explaining why we have it? I understand that the first specs followed the standard format. I suggest that we actually not follow the standard, but start from scratch and come up with a format that's most clear to us and conveys the info we need.

I was trying to explain why we had the class section, and why it is not needed with your proposed template. So, if we go for your template, the class section should not be in it IMO.

Regarding the type kinds, I agree with your points, but it seems to me like an argument to not do something useful because it's not perfect. Is there some way that we can indicate that not all kinds may be available on the user's system? This could be a sentence in the README or the docs top-level page a-la:

Stdlib attempts to provide all standard kinds from the iso_fortran_env module, however some may not be provided by your compiler. Procedure interfaces that rely on the unavailable type kinds will not be provided, despite being listed in the specification documents.

If we mention something like that, then stdlib should support all possible kinds, which is far to be the case now. Also, if I am right, there are only 2 reals defined by the standard: kind(1.) and kind(1.d0) . And what about 80-bit reals? and new kinds of real that could be supported in the future? Would we need to revise all the specs?

Alternatively, when the issue of data mining iso_fortran_env will be solved, the README could mention something like:

Stdlib attempts to provide all kinds from the `iso_fortran_env` module, except if mentioned otherwise.

The description of the interface would then be simplified.

I think it's quite important that we document the type kinds because some functions have both arguments that use both all kinds and only default kind (for example linspace and logspace accept integers of all kinds for start and end, but only default integer for the number of elements n). A user that attempts to pass an unsupported kind argument will simply get a "could not resolve generic interface" kind of error message, and will need to look in the source code to find out what's wrong.

I understand this issue. Would something like real() :: array(:) for all kinds of supported real, and real :: array(:) for the default real be acceptable?

I also recommend that we use plain English and do away with the standardese "shall be" and similar.

I agree with this comment.

Oct 05 '21 20:10 jvdp1

Okay, let's discuss the syntax in a concrete example. For linspace the currently proposed syntax is (I put start and end on the same line for brevity):

function linspace(start, end, n) result(res)
  {integer{int8, int16, int32, int64}, real{sp, dp, qp}, complex{sp, dp, qp}}, intent(in) :: start, end
  integer, intent(in) :: n
  {real{sp, dp, qp}, complex{sp, dp, qp}} :: res(n)

If I understand it correctly, you propose:

function linspace(start, end, n) result(res)
  {integer(), real(), complex()}, intent(in) :: start, end
  integer, intent(in) :: n
  {real(), complex()} :: res(n)

It's cleaner but to me it looks a lot like function calls. What do you think about this:

function linspace(start, end, n) result(res)
  {integer(*), real(*), complex(*)}, intent(in) :: start, end
  integer, intent(in) :: n
  {real(*), complex(*)} :: res(n)

Because * is commonly used to mean "everything".

Oct 05 '21 21:10 milancurcic

It's cleaner but to me it looks a lot like function calls. What do you think about this:
function linspace(start, end, n) result(res)
  {integer(*), real(*), complex(*)}, intent(in) :: start, end
  integer, intent(in) :: n
  {real(*), complex(*)} :: res(n)
Because * is commonly used to mean "everything".

I agree with you: real(), complex() look like a lot to function calls. Among the three, it is the one with the * that I prefer.

Oct 06 '21 05:10 jvdp1

function linspace(start, end, n) result(res)
  {integer(*), real(*), complex(*)}, intent(in) :: start, end
  integer, intent(in) :: n
  {real(*), complex(*)} :: res(n)

@JHenneberg is this syntax acceptable to you? If yes, and if there's no opposition, the next steps would be to:

Incorporate the template from milancurcic/stdlib-spec-template into #517 and revise it as needed.
Once merged, we'd require every new module to follow the template
Incrementally revise the existing specs in separate PRs.

Oct 06 '21 23:10 milancurcic

@milancurcic Seems reasonable.

Oct 07 '21 08:10 JHenneberg

I update #517 , but I do not really like that there are three SPEC_*.md document on the top level, but creating an extra folder oder pushing the content in one file also does not sound right. Any ideas?

Oct 14 '21 07:10 JHenneberg

I do not really like that there are three SPEC_*.md document on the top level,

I am not sure to understand what you mean with "three SPEC_*.md"

Oct 14 '21 07:10 jvdp1

@jvdp1 SPEC_EXAMPLE.md, SPEC_GUIDE.md and SPEC_TEMPLATE.md from #517

Oct 14 '21 07:10 JHenneberg

We discussed this on the call today. There is support for the general idea and direction. Takeaways:

Showing interfaces is useful, though if there are too many it may make it worse than the current solution, for example for stdlib_stats moment function which has 10 distinct interfaces; There may be a middle-ground solution to find.
A template is desired to streamline writing of specs
Aim to not duplicate information between the specs and FORD-generated API docs.

Course of action: Implement a template for the current specification format, in other words, don't introduce changes yet. When we have a template, we can revisit and see if and what changes are needed.

Dec 14 '21 21:12 milancurcic

stdlib stdlib copied to clipboard

Create a spec document template

ICHAR(C, [, KIND])

Description

Class

Arguments

Result Characteristics

Result Value

Example

ICHAR - Code value for character

Status

Syntax

Description

Class

Arguments

Result Value

Example

linspace

Signature

stdlib
stdlib copied to clipboard

`ICHAR(C, [, KIND])`

`ICHAR` - Code value for character

`linspace`