Tutorial: how to deal with strings
To this day I struggle with how to deal with strings in modern Fortran. I would be happy to contribute a tutorial once I learn what the best practice is.
Function accepting a string
integer function f(s)
character(*), intent(in) :: s
f = len(s)
end function
Note: the first parameter in character(...) is len, so the above is equivalent to character(len=*). I think it is fine to omit len=, since the declaration is shorter that way.
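A minimal sketch of how an assumed-length dummy argument behaves with both spellings of the declaration. The module and function names (`demo_len`, `full_len`, `visible_len`) are made up for illustration and do not come from fpm or stdlib.

```fortran
module demo_len
  implicit none
contains
  ! len(s) is the length of the actual argument, trailing blanks included
  pure integer function full_len(s)
    character(*), intent(in) :: s
    full_len = len(s)
  end function full_len

  ! len_trim(s) ignores trailing blanks
  pure integer function visible_len(s)
    character(len=*), intent(in) :: s   ! same meaning as character(*)
    visible_len = len_trim(s)
  end function visible_len
end module demo_len
```

With a `character(10)` variable holding "abc", `full_len` gives 10 and `visible_len` gives 3, which is worth keeping in mind when passing fixed-length buffers.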
Subroutine returning a string
subroutine f(s)
character(:), allocatable, intent(out) :: s
s = "Some text"
end subroutine
Note: The assignment automatically allocates the LHS, so s is allocated to exactly the length of the string, with no blank padding.
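As a small sketch of that behaviour, the subroutine below copies an arbitrary input into a deferred-length string. The names `demo_alloc_assign` and `set_text` are hypothetical, chosen only for this example.

```fortran
module demo_alloc_assign
  implicit none
contains
  subroutine set_text(s, text)
    character(:), allocatable, intent(out) :: s
    character(*), intent(in) :: text
    ! Allocation on assignment: s gets exactly len(text) characters
    s = text
  end subroutine set_text
end module demo_alloc_assign
```

Calling `set_text` twice on the same variable with inputs of different lengths should leave it with the length of the most recent input each time.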
Question 1
In fpm, the following code:
subroutine cmd_build()
type(string_t), allocatable :: files(:)
character(:), allocatable :: basename, pkg_name, linking
integer :: i, n
print *, "# Building project"
call list_files("src", files)
linking = ""
do i = 1, size(files)
if (str_ends_with(files(i)%s, ".f90")) then
n = len(files(i)%s)
basename = files(i)%s(1:n-4)
call run("gfortran -c src/" // basename // ".f90 -o " // basename // ".o")
linking = linking // " " // basename // ".o"
end if
end do
call run("gfortran -c app/main.f90 -o main.o")
call package_name(pkg_name)
call run("gfortran main.o " // linking // " -o " // pkg_name)
end subroutine
Gives a warning:
# gfortran (for build/gfortran_debug/fpm/fpm.o build/gfortran_debug/fpm/fpm.mod)
src/fpm.f90:163:0:
linking = ""
Warning: ‘.linking’ may be used uninitialized in this function [-Wmaybe-uninitialized]
What am I doing wrong? How do I initialize an empty string?
Question 2
How do you return a string from a function as a return value?
I will probably have more questions. These are the most pressing.
This would be a good tutorial to have as a reference 👍
I cannot find a reference right now, but I believe the maybe-uninitialized warning in gfortran occurs spuriously for allocatable strings. I get the same warning in gfortran when I use allocatable strings, but not with ifort or new flang.
I do exactly the same thing as your examples of function accepting and subroutine returning strings. I imagine this is common use.
Question 1: You're doing nothing wrong. Gfortran is warning about correct Fortran.
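For reference, a minimal sketch of starting from an empty string and appending to it, mirroring the `linking` loop in the fpm snippet. The module and function names are hypothetical. It uses an explicit zero-length allocate, which as far as I know is equivalent to `linking = ""`; I have not checked whether it changes the gfortran warning.

```fortran
module demo_empty
  implicit none
contains
  ! Build " a.o b.o ..." (with a leading blank, as in the fpm loop)
  function join_objects(names) result(linking)
    character(*), intent(in) :: names(:)
    character(:), allocatable :: linking
    integer :: i
    allocate(character(len=0) :: linking)   ! explicit alternative to: linking = ""
    do i = 1, size(names)
      linking = linking // " " // trim(names(i)) // ".o"
    end do
  end function join_objects
end module demo_empty
```

With an empty `names` array the result is a zero-length string, so `len(linking)` is 0 and the variable is still allocated.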
Question 2:
module mod_str
contains
pure function str()
character(:), allocatable :: str
str = 'hello'
end function str
end module mod_str
program test_str
use mod_str, only: str
print *, str()
end program test_str
Continuing Milan's example:
program test_str
use mod_str, only: str
character(:), allocatable :: my_string
my_string = str()
end program test_str
My understanding about this usage is that there are two allocation-on-assignments happening: one in the function for the function result; and one for the assignment at program level. So in comparison to a subroutine implementation, functions returning allocatables incur an extra allocation and, in this example, an extra copy during assignment.
@LKedward that is precisely why I asked about this. If that is the case, that seems like a big downside and our string routines in stdlib should return the strings via arguments as subroutines, not as return values from functions.
If that is the case, that seems like a big downside and our string routines in stdlib should return the strings via arguments as subroutines, not as return values from functions.
Yep, I haven't benchmarked it but this is why I generally avoid functions for returning non-scalars. You can use pointers to return allocated arrays from functions more efficiently, but I also avoid using pointers.
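To make the comparison concrete, here is a small sketch of the two forms side by side. The names (`demo_return`, `greeting_fn`, `greeting_sub`) are made up for illustration. This is not a benchmark, and it does not show whether a given compiler removes the extra allocation.

```fortran
module demo_return
  implicit none
contains
  ! Function form: the result is allocated inside the function,
  ! then assigned to the caller's variable at the call site
  pure function greeting_fn() result(s)
    character(:), allocatable :: s
    s = "hello"
  end function greeting_fn

  ! Subroutine form: the caller's variable is the one being allocated
  pure subroutine greeting_sub(s)
    character(:), allocatable, intent(out) :: s
    s = "hello"
  end subroutine greeting_sub
end module demo_return
```

Both forms leave the caller with the same value. The function form can be used inside expressions such as `print *, greeting_fn() // "!"`, which the subroutine form cannot.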
NB: Allocation on assignment
Another useful thing to note, which I only learned recently, is that allocation-on-assignment doesn't occur for colon subscripts ((:)).
So this doesn't work:
program test_str
use mod_str, only: str
character(:), allocatable :: my_string
my_string(:) = str()
end program test_str
Based on this, I would consider it good practice to use the colon subscript to explicitly indicate where there is assignment only and to avoid accidental reallocation.
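A minimal sketch of the colon-subscript form on a string that is already allocated. The names `demo_colon` and `overwrite` are hypothetical. The check for an unallocated argument is my own addition, since as far as I understand referencing `s(:)` on an unallocated variable is not valid.

```fortran
module demo_colon
  implicit none
contains
  ! Overwrite the contents of an allocated string without changing its length
  subroutine overwrite(s, text)
    character(:), allocatable, intent(inout) :: s
    character(*), intent(in) :: text
    if (.not. allocated(s)) error stop "overwrite: s must be allocated first"
    ! Substring assignment: text is truncated or blank-padded to len(s),
    ! and s is not reallocated
    s(:) = text
  end subroutine overwrite
end module demo_colon
```

Starting from `s = "abc"`, overwriting with "hello" leaves "hel", and overwriting with "x" leaves "x" followed by two blanks. The length stays 3 in both cases.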
Question: filling a character string
I have my own related question for strings: Is there a one-liner for filling a character(*) with a non-space character(1)? Example case is for filling a string with all zeros.
My understanding about this usage is that there are two allocation-on-assignments happening: one in the function for the function result; and one for the assignment at program level.
Yes, I think this is true for any function returning anything allocatable. It's especially penalizing for large arrays. Don't do it if you care about high performance.
I have a toy wave physics project that did this for everything, including large arrays. I was optimizing for functional API and UI, although at the time I didn't understand the implications of functions returning allocatable arrays. Later I heard from a person who found the code to do exactly what they needed but it was too inefficient so they rewrote everything to subroutines to make it fast :).
Regarding functions returning allocatable --- is this mandated by the Fortran Standard to allocate twice, or are compilers permitted to make it as efficient as intent(out) for subroutines? (It's just that some or most compilers currently don't optimize it out, but they could in the future.)
Regarding functions returning allocatable --- is this mandated by the Fortran Standard to allocate twice, or are compilers permitted to make it as efficient as intent(out) for subroutines? (It's just that some or most compilers currently don't optimize it out, but they could in the future.)
It would make sense that if the function is able to be inlined, then one allocation could be optimized out, but I'm no expert here.
I think that in general, the function result needs to be a distinct memory location because it may be used subsequently in an expression; i.e. there is a fundamental difference between a function result and a subroutine intent(out) dummy arg - the former is returned by value whereas the latter is essentially a pointer.
Note 1, section 15.6.2.2 from the interpretation doc:
The function result is similar to any other entity (variable or procedure pointer) local to a function
subprogram. Its existence begins when execution of the function is initiated and ends when execution of the
function is terminated. However, because the final value of this entity is used subsequently in the evaluation
of the expression that invoked the function, an implementation might defer releasing the storage occupied
by that entity until after its value has been used in expression evaluation.
My understanding of the text you posted is that the Standard allows the result of the function to be as efficient as an intent(out) dummy argument if the compiler chooses to do that.
Would such an optimization be prevented by the requirement that the RHS is evaluated before the assignment occurs?
From 10.2.1.3:
The execution of the assignment shall have the same effect as if the evaluation of
expr and the evaluation of all expressions in variable occurred before any portion
of the variable is defined by the assignment.
for
variable = expr
I don't know. We might need to ask the committee. My understanding is that the key phrase is "shall have the same effect": in other words, it does not actually have to happen that way, it only has to have the same effect. So the question then becomes whether double allocation has the same effect as single allocation. For a string, it seems the logic of the code would be the same. For user-defined derived types, perhaps the user requires the finalizer to be called twice.
Regarding Question 1: Ignore the warning. This is one of the warnings that I suppress with -Wno-maybe-uninitialized, for exactly this use case, and if you recall it is one of the reasons I raised an issue here in fpm. Also, take a look at Steve Kargl's post in our discourse here. Finally, another similar discussion can be found here. Regarding Question 2: I personally follow the way presented by @milancurcic:
I do exactly the same thing as your examples of function accepting and subroutine returning strings. I imagine this is common use.
Question 1: You're doing nothing wrong. Gfortran is warning about correct Fortran.
Question 2:
module mod_str
contains
pure function str()
character(:), allocatable :: str
str = 'hello'
end function str
end module mod_str
program test_str
use mod_str, only: str
print *, str()
end program test_str
However, since we are on this topic, I also have something to add about the behavior of allocatable characters that may be relevant. The following compiles with no warnings or errors but aborts at runtime with a segmentation fault:
character(len=:),allocatable :: str
subroutine init_string(filename, str)
character(len=*),intent(in) :: filename
character(len=:),allocatable, intent(out) :: str
open(file...)
read(unit,*)str
close(file...)
end subroutine init_string
while this is correct:
character(len=:),allocatable :: str
subroutine init_string(filename, str)
character(len=*),intent(in) :: filename
character(len=:),allocatable, intent(out) :: str
character(len=50) :: temp ! 50 is just a random number for demonstration purposes
open(file...)
read(unit,*)temp
str = trim(temp)
close(file...)
end subroutine init_string
Another interesting behavior is when the allocatable character in the above example is part of a derived type eg:
type t_gas
character(len=:),allocatable :: name
double precision :: mass
etc...
end type t_gas
Now assume we define a type(t_gas) :: gas and try to read gas%name as we did in the first, non-working example. The program then runs without any error, but in reality gas%name remains uninitialized: you can print it and it just returns blank, with NO error!!
@smeskos I think you cannot read into an allocatable character type. I vaguely remember the standards committee discussing how to improve the standard to allow this. Until then I think it is not allowed.
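One workaround I am aware of is to read in fixed-size chunks with a non-advancing read and append them to the allocatable string. The sketch below assumes the unit is already open for formatted sequential reading. The names `demo_read` and `read_line` and the chunk size of 256 are arbitrary choices for this example. It reads a whole line, not a list-directed item, so it is not a drop-in replacement for read(unit,*) str.

```fortran
module demo_read
  implicit none
contains
  ! Read one full line from an already opened formatted unit into s.
  ! iostat is 0 on success, negative at end of file, positive on error.
  subroutine read_line(unit, s, iostat)
    integer, intent(in) :: unit
    character(:), allocatable, intent(out) :: s
    integer, intent(out) :: iostat
    character(256) :: chunk
    integer :: nread
    s = ""
    do
      read(unit, "(a)", advance="no", iostat=iostat, size=nread) chunk
      if (iostat > 0) return                ! read error
      if (is_iostat_end(iostat)) return     ! end of file
      s = s // chunk(:nread)                ! keep only the characters actually read
      if (is_iostat_eor(iostat)) then       ! end of record: the line is complete
        iostat = 0
        return
      end if
    end do
  end subroutine read_line
end module demo_read
```

The same routine can fill a derived-type component by passing it as the argument, for example `call read_line(unit, gas%name, ios)`.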
I've generally just resorted to using a string type for everything, and then for intent(in) arguments just using an interface to allow people to also pass in character literals (or just character variables).
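A rough sketch of that pattern, assuming a `string_t` type with a single allocatable component `s`, as in the fpm snippet earlier in the thread. The generic name `str_len` and the two specific procedures are hypothetical and only show how one interface can accept both a `string_t` and a plain character value.

```fortran
module demo_string_type
  implicit none
  private
  public :: string_t, str_len

  type :: string_t
    character(:), allocatable :: s
  end type string_t

  ! One generic name that accepts either a string_t or a character value
  interface str_len
    module procedure str_len_type
    module procedure str_len_char
  end interface str_len
contains
  pure integer function str_len_type(x) result(n)
    type(string_t), intent(in) :: x
    n = 0                                  ! an unallocated string counts as empty
    if (allocated(x%s)) n = len(x%s)
  end function str_len_type

  pure integer function str_len_char(x) result(n)
    character(*), intent(in) :: x
    n = len(x)
  end function str_len_char
end module demo_string_type
```

Callers can then write `str_len(files(i))` or `str_len("literal")` without caring which specific procedure is picked.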
Question: filling a character string
I have my own related question for strings: Is there a one-liner for filling a character(*) with a non-space character(1)? Example case is for filling a string with all zeros.
character(len=:), allocatable :: s
s = repeat('0',10)
write(*,*) s
will output 0000000000
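For the character(*) case in the question, the same intrinsic can be combined with len. A minimal sketch, with a hypothetical module and subroutine name:

```fortran
module demo_fill
  implicit none
contains
  ! Fill every position of a fixed-length string with the single character c
  pure subroutine fill(s, c)
    character(*), intent(out) :: s
    character(1), intent(in) :: c
    s = repeat(c, len(s))   ! len(s) is known even though s is intent(out)
  end subroutine fill
end module demo_fill
```

For a `character(5)` variable, `call fill(buf, '0')` should leave it holding five zeros.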
Perfect, thank you @ivan-pi!