
Tutorial: how to deal with strings

certik opened this issue 5 years ago • 17 comments

I am to this day struggling with how to deal with strings in modern Fortran. I would be happy to contribute a tutorial once I learn what the best practice is.

Function accepting a string

integer function f(s)
character(*), intent(in) :: s
f = len(s)
end function
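A minimal sketch of the same function placed in a module, so callers get an explicit interface (the module name string_arg_demo is hypothetical). The assumed length is taken from whatever actual argument is passed, trailing blanks included:

```fortran
module string_arg_demo
  implicit none
contains
  ! character(*) is shorthand for character(len=*): an assumed-length
  ! dummy that takes the length of whatever actual argument is passed.
  integer function f(s)
    character(*), intent(in) :: s
    f = len(s)
  end function f
end module string_arg_demo
```

So f("ab  ") is 4, not 2; pass trim(s) if trailing blanks should not count.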

Note: the first argument in character(...) is len, so the above is equivalent to character(len=*). I think it is OK to omit len=, since the declarations are shorter that way.

Subroutine returning a string

subroutine f(s)
character(:), allocatable, intent(out) :: s
s = "Some text"
end subroutine

Note: this automatically allocates the LHS, so s is allocated to the exact length of the string, with no whitespace padding.
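A sketch of the same subroutine in a (hypothetical) module. Because s is an allocatable intent(out) dummy, it is deallocated on entry, so even a longer value the caller held before is replaced by a string of exactly 9 characters:

```fortran
module string_out_demo
  implicit none
contains
  ! An allocatable intent(out) dummy is deallocated on entry; the
  ! assignment then allocates it to exactly len("Some text") == 9.
  subroutine f(s)
    character(:), allocatable, intent(out) :: s
    s = "Some text"
  end subroutine f
end module string_out_demo
```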

Question 1

In fpm, the following code:

subroutine cmd_build()
type(string_t), allocatable :: files(:)
character(:), allocatable :: basename, pkg_name, linking
integer :: i, n    
print *, "# Building project"
call list_files("src", files)
linking = ""
do i = 1, size(files)
    if (str_ends_with(files(i)%s, ".f90")) then
        n = len(files(i)%s)
        basename = files(i)%s(1:n-4)
        call run("gfortran -c src/" // basename // ".f90 -o " // basename // ".o")    
        linking = linking // " " // basename // ".o"
    end if    
end do
call run("gfortran -c app/main.f90 -o main.o")
call package_name(pkg_name)
call run("gfortran main.o " // linking // " -o " // pkg_name)
end subroutine

Gives a warning:

# gfortran (for build/gfortran_debug/fpm/fpm.o build/gfortran_debug/fpm/fpm.mod)
src/fpm.f90:163:0:

 linking = ""
 
Warning: ‘.linking’ may be used uninitialized in this function [-Wmaybe-uninitialized]

What am I doing wrong? How do I initialize an empty string?
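For what it's worth, linking = "" does initialize the variable: it allocates it with length 0. A minimal sketch of the same accumulate-by-concatenation pattern, using a hypothetical helper join_objects (not part of fpm) so it can be checked in isolation:

```fortran
module link_line_demo
  implicit none
contains
  ! Build " a.o b.o ..." from a list of base names, the same way
  ! cmd_build accumulates its linking string.
  pure function join_objects(basenames) result(linking)
    character(len=*), intent(in) :: basenames(:)
    character(len=:), allocatable :: linking
    integer :: i
    linking = ""   ! allocated, with len(linking) == 0
    do i = 1, size(basenames)
      linking = linking // " " // trim(basenames(i)) // ".o"
    end do
  end function join_objects
end module link_line_demo
```

Called with an empty list, it returns an allocated string of length 0.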

Question 2

How do you return a string from a function as a return value?


I will probably have more questions. These are the most pressing.

certik, Jul 24 '20 15:07

This would be a good tutorial to have as a reference 👍

I cannot find a reference right now, but I believe the maybe-uninitialized warning in gfortran is spurious for allocatable strings. I get the same warning from gfortran when I use allocatable strings, but not from ifort or the new flang.

LKedward, Jul 24 '20 16:07

I do exactly the same thing as your examples of function accepting and subroutine returning strings. I imagine this is common use.

Question 1: You're doing nothing wrong. Gfortran is warning about correct Fortran.

Question 2:

module mod_str
contains
  pure function str()
    character(:), allocatable :: str
    str = 'hello'
  end function str
end module mod_str


program test_str
  use mod_str, only: str
  print *, str()
end program test_str

milancurcic, Jul 24 '20 16:07

Continuing Milan's example:

program test_str
  use mod_str, only: str
  character(:), allocatable :: my_string
  my_string = str()
end program test_str

My understanding about this usage is that there are two allocation-on-assignments happening: one in the function for the function result; and one for the assignment at program level. So in comparison to a subroutine implementation, functions returning allocatables incur an extra allocation and, in this example, an extra copy during assignment.
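A sketch of the two interfaces side by side (module and procedure names are hypothetical). Whether the function form really costs an extra allocation and copy depends on the compiler and optimization level; this only shows what the caller writes in each case:

```fortran
module str_return_demo
  implicit none
contains
  ! Function form: the result variable is allocated inside the function,
  ! then the caller's assignment may allocate and copy a second time.
  pure function greeting_fn() result(s)
    character(len=:), allocatable :: s
    s = "hello"
  end function greeting_fn

  ! Subroutine form: the assignment allocates the caller's own variable
  ! through the intent(out) dummy, so no separate result exists.
  pure subroutine greeting_sub(s)
    character(len=:), allocatable, intent(out) :: s
    s = "hello"
  end subroutine greeting_sub
end module str_return_demo
```

The function form can also be used directly inside an expression, such as greeting_fn() // ", world", which the subroutine form cannot without a named variable.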

LKedward, Jul 24 '20 16:07

(sorry, closed by mistake)

LKedward, Jul 24 '20 16:07

@LKedward that is precisely why I asked about this. If that is the case, that seems like a big downside and our string routines in stdlib should return the strings via arguments as subroutines, not as return values from functions.

certik, Jul 24 '20 16:07

If that is the case, that seems like a big downside and our string routines in stdlib should return the strings via arguments as subroutines, not as return values from functions.

Yep, I haven't benchmarked it but this is why I generally avoid functions for returning non-scalars. You can use pointers to return allocated arrays from functions more efficiently, but I also avoid using pointers.

NB: Allocation on assignment

Another useful thing to note, which I only learned recently, is that allocation-on-assignment doesn't occur when the variable carries a colon subscript, as in my_string(:) = ...

So this doesn't work:

program test_str
  use mod_str, only: str
  character(:), allocatable :: my_string
  my_string(:) = str()
end program test_str

Based on this, I would consider it good practice to use the colon subscript to explicitly indicate where there is assignment only and to avoid accidental reallocation.
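A minimal sketch of the difference, assuming the string has already been allocated (assign_no_realloc is a hypothetical helper). With s(:) = text the length stays fixed and the value is blank-padded or truncated; a plain s = text reallocates:

```fortran
module colon_assign_demo
  implicit none
contains
  ! s must already be allocated. s(:) = text keeps len(s) unchanged,
  ! padding with blanks or truncating as needed (no reallocation).
  subroutine assign_no_realloc(s, text)
    character(len=:), allocatable, intent(inout) :: s
    character(len=*), intent(in) :: text
    if (.not. allocated(s)) error stop "assign_no_realloc: s is not allocated"
    s(:) = text
  end subroutine assign_no_realloc
end module colon_assign_demo
```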

Question: filling a character string

I have my own related question for strings: Is there a one-liner for filling a character(*) with a non-space character(1)? Example case is for filling a string with all zeros.

LKedward, Jul 24 '20 16:07

My understanding about this usage is that there are two allocation-on-assignments happening: one in the function for the function result; and one for the assignment at program level.

Yes, I think this is true for any function returning anything allocatable. It's especially penalizing for large arrays. Don't do it if you care about high performance.

I have a toy wave physics project that did this for everything, including large arrays. I was optimizing for functional API and UI, although at the time I didn't understand the implications of functions returning allocatable arrays. Later I heard from a person who found the code to do exactly what they needed but it was too inefficient so they rewrote everything to subroutines to make it fast :).

milancurcic, Jul 24 '20 17:07

Regarding functions returning allocatable --- is this mandated by the Fortran Standard to allocate twice, or are compilers permitted to make it as efficient as intent(out) for subroutines? (It's just that some or most compilers currently don't optimize it out, but they could in the future.)

certik, Jul 24 '20 17:07

Regarding functions returning allocatable --- is this mandated by the Fortran Standard to allocate twice, or are compilers permitted to make it as efficient as intent(out) for subroutines? (It's just that some or most compilers currently don't optimize it out, but they could in the future.)

It would make sense that if the function is able to be inlined, then one allocation could be optimized out, but I'm no expert here.

I think that in general, the function result needs to be a distinct memory location because it may be used subsequently in an expression; i.e. there is a fundamental difference between a function result and a subroutine intent(out) dummy arg - the former is returned by value whereas the latter is essentially a pointer.

Note 1, section 15.6.2.2 from the interpretation doc:

The function result is similar to any other entity (variable or procedure pointer) local to a function subprogram. Its existence begins when execution of the function is initiated and ends when execution of the function is terminated. However, because the final value of this entity is used subsequently in the evaluation of the expression that invoked the function, an implementation might defer releasing the storage occupied by that entity until after its value has been used in expression evaluation.

LKedward, Jul 24 '20 17:07

My understanding of the text you posted is that the Standard allows the result of the function to be as efficient as an intent(out) dummy argument if the compiler chooses to do that.

certik, Jul 24 '20 18:07

Would such an optimization be prevented by the requirement that the RHS is evaluated before the assignment occurs?

From 10.2.1.3:

The execution of the assignment shall have the same effect as if the evaluation of expr and the evaluation of all expressions in variable occurred before any portion of the variable is defined by the assignment.

for

variable = expr
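A small case where the rule visibly matters: in s = s(2:) // s(1:1), copying the result into s from left to right would overwrite characters that are still needed (giving "bcdb" instead of "bcda" for "abcd"), so the right-hand side effectively has to be finished first. This sketch (hypothetical names) only illustrates that overlap case; it does not settle whether function results may be built in place:

```fortran
module rhs_first_demo
  implicit none
contains
  ! Rotate s left by one character. The right-hand side reads s itself,
  ! so it has to be evaluated in full before s is redefined.
  subroutine rotate_left(s)
    character(len=:), allocatable, intent(inout) :: s
    if (.not. allocated(s)) return
    if (len(s) < 2) return
    s = s(2:) // s(1:1)
  end subroutine rotate_left
end module rhs_first_demo
```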

LKedward, Jul 24 '20 19:07

I don't know. We might need to ask the committee. My understanding is that the key phrase is "shall have the same effect": it does not actually have to happen that way, it only has to have the same effect. So the question becomes whether double allocation has the same effect as single allocation. For a string, it seems the logic of the code would be the same. For derived types, perhaps the user requires the finalizer to be called twice.

certik, Jul 24 '20 20:07

Regarding Question 1: ignore the warning. -Wmaybe-uninitialized is one of the flags I suppress (with -Wno-maybe-uninitialized), for exactly this use case; if you recall, that was one of the reasons I raised an issue in fpm here. Also, take a look at Steve Kargl's post in our Discourse here. Finally, another similar discussion can be found here.

Regarding Question 2: I personally follow the approach presented by @milancurcic above (a pure function with a character(:), allocatable result).

However, since we are on this topic, I also have something to add about the behavior of allocatable characters that may be relevant. The following compiles with no warnings or errors but aborts at runtime with a segmentation fault:

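! in the caller:
character(len=:),allocatable :: str

subroutine init_string(filename, str)
    character(len=*),intent(in) :: filename
    character(len=:),allocatable, intent(out) :: str
    integer :: unit
    open(newunit=unit, file=filename, action='read')
    read(unit,*) str   ! str is unallocated here; the read does not allocate it
    close(unit)
end subroutine init_string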

while this is correct:

! in the caller:
character(len=:),allocatable :: str

subroutine init_string(filename, str)
    character(len=*),intent(in) :: filename
    character(len=:),allocatable, intent(out) :: str
    character(len=50) :: temp ! 50 is an arbitrary length for demonstration purposes
    integer :: unit
    open(newunit=unit, file=filename, action='read')
    read(unit,*) temp
    str = trim(temp)   ! allocation on assignment, trailing blanks removed
    close(unit)
end subroutine init_string
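A sketch that removes the fixed limit on temp by reading the record in chunks with non-advancing input (read_line is a hypothetical name, and the 64-character buffer is arbitrary). Note that it uses the '(a)' format, which reads the whole line including blanks, whereas the list-directed read(unit,*) above stops at the first blank, comma or slash:

```fortran
module read_line_mod
  implicit none
contains
  ! Read one whole record from a formatted sequential unit into a
  ! deferred-length string, growing it chunk by chunk, so there is no
  ! fixed upper limit such as the 50 characters of temp.
  ! iostat is 0 on success, negative at end of file, positive on error.
  subroutine read_line(unit, line, iostat)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: iostat
    character(len=64) :: buffer
    integer :: nread

    line = ""
    do
      ! Non-advancing input stops at the end of the record; size=
      ! reports how many characters were actually transferred.
      read(unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
      if (iostat > 0 .or. is_iostat_end(iostat)) return   ! error or end of file
      line = line // buffer(1:nread)
      if (is_iostat_eor(iostat)) then
        iostat = 0        ! end of record: the line is complete
        return
      end if
      ! iostat == 0: the buffer was filled, so keep reading this record
    end do
  end subroutine read_line
end module read_line_mod
```

A caller would write call read_line(unit, str, ios) after opening the file, and check ios before using str.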

Another interesting behavior appears when the allocatable character in the above example is a component of a derived type, e.g.:

type  t_gas
    character(len=:),allocatable :: name
    double precision :: mass
    ! ... other components
end type t_gas

Now assume we define type(t_gas) :: gas and try to read gas%name as in the first, non-working example. The program runs without any error, but gas%name remains uninitialized: printing it just gives a blank, with NO error!

smeskos, Jul 24 '20 20:07

@smeskos I think you cannot read into an allocatable character variable. I vaguely remember the standards committee discussing how to improve the standard to allow this. Until then, I think it is not allowed.

certik, Jul 24 '20 21:07

I've generally just resorted to using a string type for everything, and then for intent(in) arguments just using an interface to allow people to also pass in character literals (or just character variables).
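A minimal sketch of that approach (string_t, string_length and the module name are hypothetical, not stdlib or fpm names): a wrapper type holding a deferred-length character, plus a generic interface so intent(in) procedures also accept character literals and variables:

```fortran
module string_type_demo
  implicit none
  private
  public :: string_t, string_length

  ! Minimal wrapper around a deferred-length character.
  type :: string_t
    character(len=:), allocatable :: s
  end type string_t

  ! One generic name; the compiler picks the specific by argument type.
  interface string_length
    module procedure string_length_type, string_length_char
  end interface string_length

contains

  pure integer function string_length_char(s) result(n)
    character(len=*), intent(in) :: s
    n = len(s)
  end function string_length_char

  pure integer function string_length_type(s) result(n)
    type(string_t), intent(in) :: s
    ! Treat an unallocated component as the empty string, otherwise
    ! forward to the character version so the logic lives in one place.
    if (allocated(s%s)) then
      n = string_length_char(s%s)
    else
      n = 0
    end if
  end function string_length_type

end module string_type_demo
```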

everythingfunctional, Jul 24 '20 23:07

Question: filling a character string

I have my own related question for strings: Is there a one-liner for filling a character(*) with a non-space character(1)? Example case is for filling a string with all zeros.

character(len=:), allocatable :: s
s = repeat('0',10)
write(*,*) s

will output 0000000000
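The question asked about a character(*) rather than an allocatable; the same intrinsic works there by taking the count from len(s), e.g. buf = repeat('0', len(buf)). A sketch wrapping that in a hypothetical fill subroutine, which also works on a substring:

```fortran
module fill_demo
  implicit none
contains
  ! Overwrite every character of an existing string with c; the length
  ! comes from the actual argument, so it works for any character(*).
  pure subroutine fill(s, c)
    character(len=*), intent(out) :: s
    character(len=1), intent(in) :: c
    s = repeat(c, len(s))
  end subroutine fill
end module fill_demo
```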

ivan-pi, Jul 25 '20 11:07

Perfect, thank you @ivan-pi!

LKedward, Jul 25 '20 12:07