
add tests for streams

Open tpolecat opened this issue 9 years ago • 9 comments

there are a lot of corner cases depending on chunking of input, so it would be really nice to have fuzz tests for streams
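As a rough sketch of what such a test might check: split an input at random chunk boundaries, feed the chunks one at a time, and compare against parsing the whole string at once. The helper names `randomChunks` and `parseChunked` are made up for illustration, and the sketch assumes atto is on the classpath and that it is pasted into a REPL. Chunks are kept non-empty because, as far as I understand it, feeding an empty string is how atto signals end of input.

```scala
import atto._, Atto._
import scala.util.Random

// Hypothetical helper: split `s` into non-empty chunks at random boundaries.
def randomChunks(s: String, rnd: Random): List[String] =
  if (s.isEmpty) Nil
  else {
    val n = 1 + rnd.nextInt(s.length) // chunk size between 1 and s.length
    s.take(n) :: randomChunks(s.drop(n), rnd)
  }

// Hypothetical helper: feed the chunks one by one, then signal end of input.
def parseChunked[A](p: Parser[A], chunks: List[String]): ParseResult[A] =
  chunks.foldLeft(p.parse(""))(_ feed _).done

val p     = int.sepBy(char('.'))
val input = "128.42.32.12"
val whole = p.parseOnly(input).option // reference result from a single parse
val rnd   = new Random(0L)            // fixed seed so a failure is reproducible

// The property: every random chunking gives the same answer as the whole input.
val agree = (1 to 100).forall { _ =>
  parseChunked(p, randomChunks(input, rnd)).option == whole
}
```

A real fuzz test would presumably generate the inputs as well (for example with ScalaCheck) instead of using one fixed string.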

tpolecat avatar Aug 20 '14 22:08 tpolecat

Is there a way to parse a character stream? I have a use case where I want to parse a stream of lines that are not followed by a line break but preceded by one. There can be a significant wait between individual lines, so I need to parse and process a line before its terminating newline is sent. Most line-based streaming tools break on that, unfortunately, so I imagine a Stream[Char] would be the right thing here.

cvogt avatar Sep 16 '15 18:09 cvogt

If I have a Stream[String] with strings of size 1, does atto apply its parser to each one, or to the beginning of the whole stream, across strings?

I am playing with writing a tool that parses scalac output, ignores bogus type errors based on heuristics, pretty-prints types, etc. Scalac apparently doesn't emit \n after its type errors, but before them. Or it's sbt.

cvogt avatar Sep 16 '15 19:09 cvogt

So, yeah, if you use the existing process combinator, it will feed each string to the parser and emit values as they are completed (saving any remaining input), and either discard errors or halt on error, depending on which combinator you use. It's straightforward to write a custom processor, though. The current approach handles two possible use cases, but it may not match what you're doing. If you want to describe it in a bit more detail, I can give you a more precise answer.

tpolecat avatar Sep 16 '15 20:09 tpolecat

sbt prints

[error] .......
....
...
       ^

then waits, with no \n following the ^. At some later point the next

[error] .......
....
...
       ^

arrives. I need to parse the first [error]......^ section without waiting for a \n following the ^.

cvogt avatar Sep 16 '15 20:09 cvogt

Does atto call the parser on each element of the stream individually, or does it effectively turn the Stream[String] into a Stream[Char] and run the parser on that?

cvogt avatar Sep 16 '15 20:09 cvogt

The parser consumes strings, which it treats logically as chunks of characters; working with strings is more efficient than working character by character. On success there may be leftover input, which the stream processor uses as the initial input for parsing the next chunk.

For example, the result here includes the residual input:

scala> int.sepBy(char('.')).parse("128.42.32.12 woozle")
res2: atto.ParseResult[List[Int]] = Done( woozle,List(128, 42, 32, 12))
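The same idea across a chunk boundary, as a small sketch (assuming atto is on the classpath; the split point is arbitrary): when the first chunk ends partway through the value, the parser should return a partial result, and feeding the next chunk lets it finish, with the unconsumed text again kept in the result.

```scala
import atto._, Atto._

val p = int.sepBy(char('.'))

// The first chunk ends right after a separator, so the parser needs more input.
val r1 = p.parse("128.42.")

// Feeding the next chunk completes the parse; " woozle" is left unconsumed.
val r2 = r1.feed("32.12 woozle")
```

Here `r1.option` is expected to be `None`, since nothing is complete yet, while `r2.option` carries the parsed list.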

tpolecat avatar Sep 16 '15 20:09 tpolecat

I would need something vaguely like this:

scala> val s: Stream[Char] = ...
scala> println( s.take(20).mkString )
123,33,111242346456
scala> (int ~ ',').parse(s)
ParseResult( Stream(33,111242346456....), "123," )

That is, parse a single parseable value off the stream of characters, and return it along with the remainder of the stream.

cvogt avatar Sep 16 '15 20:09 cvogt

Doesn't look like atto does that right now.

cvogt avatar Sep 16 '15 20:09 cvogt

Easy enough to hack up. As always it will come down to details.


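import atto._, Atto._
import ParseResult._

// Feed characters to `p` one at a time until it finishes or fails.
// Returns the parse result together with the characters not yet pulled from the stream.
def chunk[A](chars: Stream[Char], p: Parser[A]): (ParseResult[A], Stream[Char]) = {
  def go(s: Stream[Char], pr: ParseResult[A]): (ParseResult[A], Stream[Char]) =
    pr match {
      case Done(_, _)    => (pr, s) // finished: stop pulling from the stream
      case Fail(_, _, _) => (pr, s) // failed: stop, leaving the rest untouched
      case Partial(_)    =>         // the parser wants more input
        s match {
          case c #:: cs => go(cs, pr.feed(c.toString))
          case _        => (pr.done, s) // stream exhausted: signal end of input
        }
    }
  go(chars, p.parse("")) // start from empty input
}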


scala> chunk("123,33,111242346456".toStream, long <~ (char(',') || endOfInput))
res16: (atto.ParseResult[Long], Stream[Char]) = (Done(,123),Stream(3, ?))

You can use this to define a Stream[Char] ~> Stream[A] transform:

def chunks[A](chars: Stream[Char], p: Parser[A]): Stream[A] =
  chunk(chars, p) match {
    case (Done(cs, a), s) => a #:: chunks(cs.toStream ++ s, p)
    case _ => Stream.Empty // or something
  }

scala> chunks("123,33,111242346456".toStream, long <~ (char(',') || endOfInput))
res17: Stream[Long] = Stream(123, ?)

scala> res17.toList
res18: List[Long] = List(123, 33, 111242346456)
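For the sbt output described earlier, the important part is that the parser finishes as soon as it sees the ^, without waiting for a newline after it. A rough sketch of such a parser follows; the names `line`, `caret` and `block` are invented here, the sample input is made up, and real sbt output will need more care (for example, a ^ line that has trailing text).

```scala
import atto._, Atto._

// One line of text, consuming its trailing newline.
val line: Parser[String] = takeWhile(_ != '\n') <~ char('\n')

// The marker line: optional leading spaces, then a caret. No newline is required.
val caret: Parser[Unit] = (takeWhile(_ == ' ') ~ char('^')).map(_ => ())

// Collect lines until the marker line is reached.
lazy val block: Parser[List[String]] =
  caret.map(_ => List.empty[String]) |
    (for { l <- line; ls <- block } yield l :: ls)

// The input stops at the caret; nothing follows it.
val r = block.parse("[error] foo\nbar\n   ^")
```

Passing `block` as the parser to `chunks` above should then emit each error section as soon as its ^ arrives.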

tpolecat avatar Sep 16 '15 20:09 tpolecat