pelemay icon indicating copy to clipboard operation
pelemay copied to clipboard

Supporting String functions

Open piacerex opened this issue 4 years ago • 15 comments

Is your feature request related to a problem? Please describe.

Currently Pelemay only supports four arithmetic operations and remainder

When using Pelemay in data science, it's necessary to support string manipulation

Describe the solution you'd like

I wish Pelemay to support "String.replace"

Why first is "String.replace"?, because it occupies most of data science

Describe alternatives you've considered No idea

Additional context None

piacerex avatar Sep 26 '19 13:09 piacerex

Would you write a new feature issue according to the issue template?

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

For example,

**Is your feature request related to a problem? Please describe.**

I wish Pelemay to support "String.replace"
Because I'm going to realize "High performance data science" with Elixir
For that purpose, string manipulation is necessary
The first is "String.replace", because it occupies most of data science

**Describe the solution you'd like**

The following code will be able to be compiled:

```elixir
defmodule StringSample do
  require Pelemay
  import Pelemay

  defpelemay do
   def do_test list do
      ...  # Write some code here that uses String.replace
   end
  end
end

Describe alternatives you've considered None.

Additional context None.

zacky1972 avatar Sep 26 '19 19:09 zacky1972

ちなみに,Issue は日本語で書いてもいいです。こちらで英語に翻訳します。

You may write an issue also in Japanese. We'll translate it into in English.

zacky1972 avatar Sep 26 '19 19:09 zacky1972

Oh, It's mistake...

I will fix issue contents later

piacerex avatar Sep 27 '19 01:09 piacerex

@piacerex Would you describe an acceptance testing code at first?

zacky1972 avatar Sep 27 '19 12:09 zacky1972

I guess supporting this feature is a little difficult because it may require some kind of type inference and/or type checking.

I also guess notation for Dialyzer makes it easier.

How about this idea?

zacky1972 avatar Sep 28 '19 20:09 zacky1972

Sounds good, I think first version is fine it

Pelemay's string operation compilation in I expect is follows:

defmodule PelemaySample do
  require Pelemay
  import Pelemay

  defpelemay do
    def map_replace( string_map_list ) do
      string_map_list
      |> Enum.map( & String.replace( &1[ "some_column" ], "hoge", "foo" ) )
    end
  end
end
iex> PelemaySample.map_replace [ %{ "some_column" => "hoge" }, %{ "some_column" => "huge" } ]
[ %{ "some_column" => "foo" }, %{ "some_column" => "huge" } ]

Probably, if Pelemay can operate four arithmetic operations in map list, it will be practical

piacerex avatar Sep 29 '19 08:09 piacerex

I run your code using Enum but I got a different result:

iex(1)> [ %{ "some_column" => "hoge" }, %{ "some_column" => "huge"} ] |> Enum.map( & String.replace( &1[ "some_column" ], "hoge", "foo" ) ) 
["foo", "huge"]

I guess your result must be like this because Enum and Pelemay should be equivalent:

iex> PelemaySample.map_replace [ %{ "some_column" => "hoge" }, %{ "some_column" => "huge" } ]
[ "foo", "huge"  ]

zacky1972 avatar Sep 29 '19 09:09 zacky1972

Ouch, it's easy mistake

Expected is following:

iex> [ %{ "some_column" => "hoge" }, %{ "some_column" => "huge"} ] |> Enum.map( & Map.put( &1, "some_column", String.replace( &1[ "some_column" ], "hoge", "foo" ) ) )
[%{"some_column" => "foo"}, %{"some_column" => "huge"}]

Pelemay version:

iex> PelemaySample.map_list_replace( [ %{ "some_column" => "hoge" }, %{ "some_column" => "huge"} ], "some_column", "hoge", "foo" )
[%{"some_column" => "foo"}, %{"some_column" => "huge"}]

If above case is complex, the first case is following may be better:

iex> "hoge" |> String.replace( "hoge", "foo" )
"foo"
iex> "huge" |> String.replace( "hoge", "foo" )
"huge"

Pelemay version:

iex> PelemaySample.string_replace( "hoge", "hoge", "foo" )
"foo"
iex> PelemaySample.string_replace( "huge", "hoge", "foo" )
"huge"

piacerex avatar Sep 29 '19 10:09 piacerex

@josevalim said:

Unfortunately relying on typespecs for type checking is not a good idea because they are not always verified, only if they use dialyzer. There are people experimenting with a proper type system, but it is a couple years ahead still. :(

I think have your own string_replace may be the best approach for now and you check if the argument is indeed a string in the C side of things.

zacky1972 avatar Sep 29 '19 13:09 zacky1972

I was discussing it with @piacerex on Slack.

@piacerex suggests that:

  1. He don't use typespecs usually because it requires too much cost though it brings a few advantages.
  2. However, he agree typespecs may bring some advantages to solve this issue.
  3. He believes using typespecs to solve this issue is only a temporary solution though it may be realized rapidly.
  4. Type inference will require too much cost to implement though it may be less effective to improve performance.

zacky1972 avatar Sep 29 '19 14:09 zacky1972

I prefer implementing type inference to typespecs because the former has more potential. However, I agree it may require too much cost to implement even if using existing type inference systems such as Haskell or ML.

Thus, I decided to adopt simple type checking at runtime written by the C language to solve this issue, as @josevalim proposed.

Though it may have less performance, I believe it brings more interesting research topics for me.

zacky1972 avatar Sep 29 '19 15:09 zacky1972

I decided to implement some functions of String module, not only String.replace, because a project that our team conduct requires them.

At least, they includes the following functions:

  • String.replace/4

  • String.to_integer/1

  • String.to_integer/2

  • String.to_float/1

  • String.to_atom/1

  • String.split/1

  • Atom.to_string/1

  • Integer.to_string/1

  • Integer.to_string/2

  • Float.to_string/1

zacky1972 avatar Oct 29 '19 02:10 zacky1972

I guess to implement String.replace/4 using SIMD instructions will be as follows:

  1. Allocate a memory area for replaced string because memory area in Elixir should be immutable.
  • Keep the same size to the original memory area if the length of the pattern is larger than that of the replacement.
  • Otherwise, keep some larger size than the original memory area.
  1. Copy the original memory area into the replaced string memory area until the pattern is matched to the original memory area or the end of the area is reached, with auto-vectorization and SIMD instructions.
  2. Copy the rest of the original area into the replaced area if the length of the rest is smaller than the pattern with auto-vectorization and SIMD instructions, and Finish.
  3. Match the whole of the pattern and the original area with auto-vectorization and SIMD instructions
  • If matched, confirm and reallocate if necessary the size of the rest of the replaced string area to have an enough capacity, and copy the replacement into the replaced string area with auto-vectorization and SIMD instructions.
  • If not, copy the unmatched original area into the replaced string area with auto-vectorization and SIMD instructions.
  1. Go back 2

zacky1972 avatar Oct 29 '19 03:10 zacky1972

FWIW, String.replace uses binary:match, which uses memchr, which uses SIMD for looking up the replacement character. If we could optimize this further, then it may be worth a contribution to Erlang/OTP?

josevalim avatar Oct 29 '19 19:10 josevalim

Of course! Thank you for your information

zacky1972 avatar Feb 07 '20 06:02 zacky1972