streamly icon indicating copy to clipboard operation
streamly copied to clipboard

Inefficient code when using unfolds vs direct stream implementation

Open harendra-kumar opened this issue 1 year ago • 1 comments

See #1709 .

1709 is about some multi-stream operations not fusing. Some other multi-stream operations like eqBy do fuse for unfolds but are CPU inefficient compared to the case when using the direct stream implementation of unfoldrM, because of the code being generated in a different way. The unfold case for eqBy gives the following core:

main_$seq_loop1
  = \ sc_s43g sc1_s43f sc2_s43e sc3_s43d ->
      case ># sc1_s43f 100001# of {
        __DEFAULT ->
          case ==# sc3_s43d sc1_s43f of {
            __DEFAULT ->
              ((hPutStr' stdout $fShowBool4 True) `cast` <Co:2>) sc_s43g;
            1# ->
              case ># sc2_s43e 100001# of {
                __DEFAULT ->
                  main_$seq_loop1 sc_s43g (+# sc1_s43f 1#) (+# sc2_s43e 1#) sc2_s43e;
                1# ->
                  case ># (+# sc1_s43f 1#) 100001# of {
                    __DEFAULT ->
                      ((hPutStr' stdout $fShowBool4 True) `cast` <Co:2>) sc_s43g;
                    1# -> ((hPutStr' stdout $fShowBool2 True) `cast` <Co:2>) sc_s43g
                  }
              }
          };
        1# -> ((hPutStr' stdout $fShowBool4 True) `cast` <Co:2>) sc_s43g
      }

The stream case:

main_$s$weq_loop0
  = \ sc_s43S sc1_s43R sc2_s43Q ->
      case ># sc2_s43Q 100001# of {
        __DEFAULT ->
          case ># sc1_s43R 100001# of {
            __DEFAULT ->
              case ==# sc2_s43Q sc1_s43R of {
                __DEFAULT -> exit_r45W sc_s43S;
                1# -> main_$s$weq_loop0 sc_s43S (+# sc1_s43R 1#) (+# sc2_s43Q 1#)
              };
            1# -> exit_r45W sc_s43S
          };
        1# ->
          case ># sc1_s43R 100001# of {
            __DEFAULT ->
              ((hPutStr' stdout $fShowBool4 True) `cast` <Co:2>) sc_s43S;
            1# -> ((hPutStr' stdout $fShowBool2 True) `cast` <Co:2>) sc_s43S
          }
      }

The following operations are affected by this:

o-1-space.multi-stream-pure.eqBy                                        99.41                               +126.32
o-1-space.multi-stream-pure.==                                          99.38                               +125.79
o-1-space.multi-stream-pure./=                                          99.39                               +125.40
o-1-space.multi-stream-pure.<                                          111.75                               +106.31
o-1-space.multi-stream-pure.cmpBy                                      111.79                               +106.07

harendra-kumar avatar Jul 15 '22 12:07 harendra-kumar

Other than the ordering of conditions, the only significant difference in these two seems to be an additional parameter in the recursive call:

stream case:

                main_$s$weq_loop0 sc_s43S (+# sc1_s43R 1#) (+# sc2_s43Q 1#)

unfold case:

                  main_$seq_loop1 sc_s43g (+# sc1_s43f 1#) (+# sc2_s43e 1#) sc2_s43e;

harendra-kumar avatar Jul 15 '22 13:07 harendra-kumar