forbear
ucs4/ISO10646 characters
Hi Stefano,
Very cool project!
I noticed you were interested in adding a spinner (it is in the README.md TODO list), and I think it would be very nice to have ISO 10646 character support: then you could have fancy characters in your spinners and progress bars.
It's a bit of a pain, though, because you may need to detect whether ISO 10646 support is available on the "system" (compiler + machine) and then create overloaded interface wrappers that accept ASCII, convert it to ISO 10646 characters, and pass those to the routines. (Or you could just force everyone to pass in ISO 10646 characters, but that may not be practical.)
Here is some code to declare an ISO 10646 variable:
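```fortran
! Kind value of the ISO 10646 (UCS-4) character set, or -1 if unsupported
integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
! e.g. a dummy argument of a procedure
character(*, ucs4), intent(out) :: string
```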
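For reference, here is a minimal, self-contained sketch that puts that declaration into a complete program. It assumes a compiler with ISO 10646 support (e.g. gfortran); the variable name greeting is purely illustrative. If selected_char_kind returns -1, the character declaration fails to compile, because -1 is not a valid kind.

```fortran
program ucs4_declare_demo
  implicit none
  ! Kind of the ISO 10646 (UCS-4) character set; -1 (unsupported) would
  ! make the declaration below a compile-time error.
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
  ! Illustrative variable of ISO 10646 kind.
  character(len=5, kind=ucs4) :: greeting

  ! Assigning a default-kind literal converts each character to UCS-4.
  greeting = 'hello'
  ! A literal can also be written directly in the UCS-4 kind via a kind prefix.
  if (greeting /= ucs4_'hello') error stop 'unexpected UCS-4 contents'
  ! For ASCII characters the UCS-4 code point equals the ASCII code.
  if (ichar(greeting(1:1)) /= iachar('h')) error stop 'unexpected code point'
  print '(A,I0)', 'ISO 10646 kind value: ', ucs4
end program ucs4_declare_demo
```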
See also:
- https://github.com/jacobwilliams/json-fortran/blob/master/src/json_macros.inc
- https://github.com/jacobwilliams/json-fortran/blob/master/src/json_string_utilities.F90#L21-L59
- https://github.com/jacobwilliams/json-fortran/blob/master/src/json_string_utilities.F90#L572-L714
- https://github.com/jacobwilliams/json-fortran/blob/master/src/json_kinds.F90#L1-L36
- https://github.com/jacobwilliams/json-fortran/blob/master/src/json_kinds.F90#L93-L132
- https://github.com/jacobwilliams/json-fortran/blob/master/src/json_value_module.F90
@zbeekman
Zaak, you beat me to it: I was about to ask you and Jacob how to support non-ASCII characters...
Wonderful references!
Cheers
P.S. Today I cannot send you the log of the OpenCoarrays install script: on Friday I have an exam for a permanent position, fingers crossed!
@zbeekman
Zaak, what do you think about a generic interface along the following lines?
```fortran
module char_module
  implicit none
  private
  public :: ascii
  public :: ucs4
  public :: echo

  integer, parameter :: ascii = selected_char_kind('ascii')
#ifdef UCS4
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
#else
  ! Without UCS4 support, fall back to ASCII so the code still compiles.
  integer, parameter :: ucs4 = selected_char_kind('ascii')
#endif

  ! Generic interface: the specific procedure is selected by the kind of the actual argument.
  interface echo
    module procedure echo_ascii
#ifdef UCS4
    ! Only added when ucs4 /= ascii, otherwise the two specifics would be ambiguous.
    module procedure echo_ucs4
#endif
  endinterface echo

contains
  subroutine echo_ascii(string)
    character(len=*, kind=ascii), intent(in) :: string
    print '(A)', 'I am echo_ascii'
    print '(A)', string
  endsubroutine echo_ascii

  subroutine echo_ucs4(string)
    character(len=*, kind=ucs4), intent(in) :: string
    print '(A)', 'I am echo_ucs4'
    print '(A)', string
  endsubroutine echo_ucs4
endmodule char_module

program test
  use char_module
  implicit none
  character(len=3, kind=ascii) :: string_ascii
  character(len=3, kind=ucs4)  :: string_ucs4

  string_ascii = 'abc' ; call echo(string_ascii)
  string_ucs4  = 'ABC' ; call echo(string_ucs4)
endprogram test
```
Upon execution:
```
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics
→ a.out
I am echo_ascii
abc
I am echo_ascii
ABC
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics -DUCS4
→ a.out
I am echo_ascii
abc
I am echo_ucs4
ABC
```
Do you think such an approach could be viable?
Cheers
@zbeekman
Sorry... the following is more tailored to what I have in mind:
```fortran
module char_module
  implicit none
  private
  public :: ascii
  public :: ucs4
  public :: ck
  public :: convert

  integer, parameter :: ascii = selected_char_kind('ascii')
#ifdef UCS4
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
#else
  integer, parameter :: ucs4 = selected_char_kind('ascii')
#endif
  ! Character kind used internally by the library.
  integer, parameter :: ck = ucs4

  ! Generic conversion of any supported input kind to the internal kind ck.
  interface convert
    module procedure convert_from_ascii
#ifdef UCS4
    module procedure convert_from_ucs4
#endif
  endinterface convert

contains
  function convert_from_ascii(string) result(conv)
    character(len=*, kind=ascii), intent(in) :: string
    character(len=len(string), kind=ck)      :: conv
    print '(A)', 'I am convert_from_ascii'
    conv = string  ! intrinsic assignment converts ASCII to kind ck
  endfunction convert_from_ascii

  function convert_from_ucs4(string) result(conv)
    character(len=*, kind=ucs4), intent(in) :: string
    character(len=len(string), kind=ck)     :: conv
    print '(A)', 'I am convert_from_ucs4'
    conv = string
  endfunction convert_from_ucs4
endmodule char_module

program test
  use char_module
  implicit none
  character(len=3, kind=ascii) :: string_ascii
  character(len=3, kind=ucs4)  :: string_ucs4
  character(len=3, kind=ck)    :: string_ck

  ! convert itself prints, so call it outside the PRINT statement: referencing
  ! it in the output list would be recursive I/O on the same unit.
  string_ascii = 'abc' ; string_ck = convert(string_ascii) ; print '(A)', string_ck
  string_ucs4  = 'ABC' ; string_ck = convert(string_ucs4)  ; print '(A)', string_ck
endprogram test
```
```
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics
→ a.out
I am convert_from_ascii
abc
I am convert_from_ascii
ABC
→ gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics -DUCS4
→ a.out
I am convert_from_ascii
abc
I am convert_from_ucs4
ABC
```
@zbeekman
I realized that, to make forbear UCS4-enabled, I only have to capture the character kind in the initialize method, so a more specific approach could be:
```fortran
module char_module
  implicit none
  private
  public :: ascii
  public :: ucs4
  public :: ck
  public :: initialize

  integer, parameter :: ascii = selected_char_kind('ascii')
#ifdef UCS4
  integer, parameter :: ucs4 = selected_char_kind('iso_10646')
#else
  integer, parameter :: ucs4 = selected_char_kind('ascii')
#endif
  integer, parameter :: ck = ucs4

contains
  ! Accept a character input of any supported kind and store it in output (kind ck).
  subroutine initialize(input, output)
    class(*), intent(in) :: input
    character(len=*, kind=ck), intent(inout) :: output

    select type(input)
    type is(character(len=*, kind=ascii))
      print '(A)', 'ascii input'
      output = input  ! converted to kind ck on assignment
#ifdef UCS4
    ! Guarded: without UCS4, ucs4 == ascii and this guard would duplicate the one above.
    type is(character(len=*, kind=ucs4))
      print '(A)', 'ucs4 input'
      output = input
#endif
    class default
      error stop 'error: input must be of class character'
    endselect
  endsubroutine initialize
endmodule char_module

program test
  use char_module
  implicit none
  character(len=3, kind=ascii) :: string_ascii
  character(len=3, kind=ucs4)  :: string_ucs4
  character(len=3, kind=ck)    :: string_ck
  character(len=3)             :: string_nk  ! default kind (same value as ascii with gfortran)
  character(len=3, kind=ck)    :: string

  string_ascii = 'abc'
  call initialize(input=string_ascii, output=string)
  print '(A)', string

  string_ucs4 = 'ABC'
  call initialize(input=string_ucs4, output=string)
  print '(A)', string

  string_ck = 'aBc'
  call initialize(input=string_ck, output=string)
  print '(A)', string

  string_nk = 'AbC'
  call initialize(input=string_nk, output=string)
  print '(A)', string

  ! Non-character input: falls through to class default and stops.
  call initialize(input=1, output=string)
endprogram test
```
```
gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics
ascii input
abc
ascii input
ABC
ascii input
aBc
ascii input
AbC
ERROR STOP error: input must be of class character
gfortran -fcheck=all -W ucs4.F90 -std=f2008 -fall-intrinsics -DUCS4
ascii input
abc
ucs4 input
ABC
ucs4 input
aBc
ascii input
AbC
ERROR STOP error: input must be of class character
```
Cheers
Stefano, this looks perfect! The only reason we bothered with the complicated wrappers, etc. with JSON-Fortran was to try to eliminate redundant code as much as possible, because the library does some heavy text processing, parsing and manipulation.
The tricky part is how to handle user inputs. If at all possible, you should allow arbitrary user string inputs. If the only possible user character inputs are in an initialize method, then the way your last example is set up should work perfectly.
Also, FYI, I think conversion from ASCII to UCS4/ISO 10646 happens automatically on assignment. I'm not sure if this is part of the standard, or just common practice for compiler vendors. And, obviously, conversion from UCS4/ISO 10646 to ASCII is, in general, not safe since ASCII is a subset of UCS4/ISO 10646. One could create a routine to check if the UCS4 character exists in ASCII and then perform the conversion, throwing an error if the character in question was not in the ASCII set.
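A minimal sketch of such a checking routine follows, assuming a compiler that supports both the ASCII and ISO 10646 kinds (e.g. gfortran). The module ucs4_narrowing and the function to_ascii are hypothetical names. Instead of stopping, to_ascii replaces code points above 127 with '?' and reports the failure through an ok flag, so the caller can decide whether to ERROR STOP. The widening direction is plain intrinsic assignment.

```fortran
module ucs4_narrowing
  implicit none
  private
  public :: ascii, ucs4, to_ascii
  integer, parameter :: ascii = selected_char_kind('ASCII')
  integer, parameter :: ucs4  = selected_char_kind('ISO_10646')
contains
  ! Hypothetical helper: narrow a UCS-4 string to ASCII. Code points above 127
  ! are replaced by '?' and reported through ok = .false.
  function to_ascii(str, ok) result(res)
    character(len=*, kind=ucs4), intent(in)  :: str
    logical,                     intent(out) :: ok
    character(len=len(str), kind=ascii)      :: res
    integer :: i, code

    ok = .true.
    do i = 1, len(str)
      code = ichar(str(i:i))
      if (code > 127) then
        ok = .false.
        res(i:i) = ascii_'?'
      else
        res(i:i) = char(code, kind=ascii)
      end if
    end do
  end function to_ascii
end module ucs4_narrowing

program narrowing_demo
  use ucs4_narrowing
  implicit none
  character(len=3, kind=ucs4)  :: wide
  character(len=3, kind=ascii) :: narrow
  logical :: ok

  ! ASCII -> UCS-4: intrinsic assignment widens each character.
  wide = ascii_'abc'
  narrow = to_ascii(wide, ok)
  if (.not. ok .or. narrow /= ascii_'abc') error stop 'ASCII round trip failed'

  ! U+2588 (FULL BLOCK, code point 9608) has no ASCII equivalent.
  wide(2:2) = char(9608, kind=ucs4)
  narrow = to_ascii(wide, ok)
  if (ok) error stop 'non-ASCII character not detected'
  print '(A)', narrow   ! prints a?c
end program narrowing_demo
```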
Here is a relevant excerpt from MRC (Metcalf, Reid and Cohen, Modern Fortran Explained):
selected_char_kind (name)
returns the kind value for the character set whose name is given by the character string name, or −1 if it is not supported (or if the name is not recognized). In particular, if name is
- DEFAULT, the result is the kind of the default character type (equal to kind('A'));
- ASCII, the result is the kind of the ASCII character type;
- ISO_10646, the result is the kind of the ISO/IEC 10646 UCS-4 character type.
Other character set names are processor dependent. The character set name is not case sensitive (lower case is treated as upper case), and any trailing blanks are ignored.
Note that the only character set which is guaranteed to be supported is the default character set; a processor is not required to support ASCII or ISO 10646.
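Since DEFAULT is the only set guaranteed to exist, one hedged alternative to the #ifdef UCS4 switch is to pick the kind at compile time with MERGE, which is allowed in constant expressions. The sketch below assumes this approach; the names char_kinds, has_ucs4 and ucs4_raw are hypothetical. It falls back to the default kind rather than ASCII. Note that the preprocessor guard is still needed wherever two kinds could coincide, such as generic interfaces or type guards.

```fortran
module char_kinds
  implicit none
  private
  public :: ck, has_ucs4
  ! Kind of the ISO 10646 (UCS-4) character set, or -1 if unsupported.
  integer, parameter :: ucs4_raw = selected_char_kind('ISO_10646')
  ! .true. when the processor supports ISO 10646.
  logical, parameter :: has_ucs4 = ucs4_raw >= 0
  ! Character kind used everywhere: UCS-4 when available, otherwise the
  ! default kind (the only character set guaranteed to be supported).
  integer, parameter :: ck = merge(ucs4_raw, selected_char_kind('DEFAULT'), has_ucs4)
end module char_kinds

program kind_probe
  use char_kinds
  implicit none
  character(len=3, kind=ck) :: s

  s = 'abc'   ! default-kind literal, converted on assignment if ck is UCS-4
  if (s /= ck_'abc') error stop 'unexpected contents'
  print '(A,L1,A,I0)', 'ISO 10646 supported: ', has_ucs4, ', ck = ', ck
end program kind_probe
```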
Zaak, thank you very much for your insight; it is much appreciated.
> The tricky part is how to handle user inputs. If at all possible, you should allow arbitrary user string inputs. If the only possible character user inputs are in an initialize method, then the way your last example is setup should work perfectly.
Exactly, this is why I ended up with the last initialize toy example.
FYI, I am planning to add better support to FoBiS for introspective tests of which kinds a compiler supports, see this.
Edit: I just saw that you have already seen the FoBiS proposal...
@zbeekman
Dear Zaak, I added support for UCS4 and now forbear provides 40 different spinners. Others will be very easy to add; feel free to suggest new ones.
A taste
I have only one concern for now: I added the spinners with a quick-and-dirty encoding in the sources, namely the forbear.F90 source contains Unicode characters... and I think this is illegal, although GNU gfortran does not complain. What do you think?
Cheers
> I have only one concern for now: I added spinners via a quick and dirty encoding on the sources, namely the forbear.F90 source contains unicode characters... and I think this is illegal, although GNU gfortran does not complain... what do you think?

I'm guessing it works because your terminal is UTF-8... not 100% sure. I think GFortran has a flag to specify special characters via '\uxxxx' etc...
Yes, I just checked man gfortran:
-fbackslash
Change the interpretation of backslashes in string literals from a single backslash character to "C-style" escape characters. The following combinations are expanded "\a", "\b", "\f", "\n", "\r", "\t", "\v", "\\", and "\0" to the ASCII characters alert, backspace, form feed, newline, carriage return, horizontal tab, vertical tab, backslash, and NUL, respectively. Additionally, "\xnn", "\unnnn" and "\Unnnnnnnn" (where each n is a hexadecimal digit) are translated into the Unicode characters corresponding to the specified code points. All other combinations of a character preceded by \ are unexpanded.
This is probably a safer way to do it, but it will be a pain to convert... I would assume Intel provides a similar flag, but I can't confirm right now what it might be...
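A flag-free alternative, sketched here under the assumption of gfortran semantics, is to build each spinner frame from its code point with the standard CHAR(i, kind) intrinsic, so the source file stays pure ASCII. The braille frames and the program name are illustrative, not forbear's actual spinners. Re-opening output_unit with encoding='UTF-8' is the idiom used in gfortran's own SELECTED_CHAR_KIND documentation. The standard is stricter about changing ENCODING on an existing connection, so treat that line as compiler-specific.

```fortran
program spinner_codepoints
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
  ! Illustrative braille-dot frames: U+280B, U+2819, U+2839, U+2838 in decimal.
  integer, parameter :: codes(4) = [10251, 10265, 10297, 10296]
  character(len=1, kind=ucs4) :: frames(4)
  integer :: i

  ! Build each frame from its code point: no non-ASCII bytes in the source.
  do i = 1, size(codes)
    frames(i) = char(codes(i), kind=ucs4)
    if (ichar(frames(i)) /= codes(i)) error stop 'code point mismatch'
  end do

  ! Ask the runtime to encode UCS-4 output as UTF-8 (gfortran idiom).
  open(unit=output_unit, encoding='UTF-8')
  do i = 1, size(frames)
    write(output_unit, '(A)') frames(i)
  end do
end program spinner_codepoints
```

With this approach the terminal still has to interpret UTF-8 for the frames to render, but the compiler no longer has to accept Unicode bytes inside forbear.F90.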