Optional type as return value of parser calls
We call parsers in succession, and a single parser failing should not, at least for most errors, lead to other parsers failing as well. We currently capture a tiny subset of errors in the .parse method of the FormatParser module, but there is no guarantee these are the only errors a parser could raise.
I think what we could use something like this instead:
result_nil_or_error = ParserOptional.(parser, input)
case result_nil_or_error
when Exception #=> register an exception and continue
when nil # => parser didn't do anything
else # => We did recover a meaningful result
end
For this we only have to introduce one wrapper method, and we don't have to change the parsers at all.
@julik I would like to implement this.
Maybe I could also implement also some small other changes in this scope for the parse method. For example the results argument is reused as a variable storing the results of the parser calls. It is a bit confusing...
https://github.com/WeTransfer/format_parser/blob/master/lib/format_parser.rb#L80
Also the results.first is enough to break out of the map method, so setting the amount variable and using take is not necessary. At least according to my quick tests.
In regards with the exception handling I would just catch every StandardError. Catching any other exception I consider a bad idea.
Any suggestions on what does registering an exception mean? Does it make sense returning them to the caller?
So here is a code snippet. Here I just return the errors in a hash if the strategy is all_results.
STRATEGIES = %i[all_results first_result]
def self.parse(io, natures: @parsers_per_nature.keys, formats: @parsers_per_format.keys, strategy: :all_results)
# Check for strategy
unless STRATEGIES.include? strategy
throw ArgumentError.new(
"Uknown strategy #{strategy}. Supported strategies: #{STRATEGIES}."
)
end
# If the cache is preconfigured do not apply an extra layer. It is going
# to be preconfigured when using parse_http.
io = Care::IOWrapper.new(io) unless io.is_a?(Care::IOWrapper)
# Always instantiate parsers fresh for each input, since they might
# contain instance variables which otherwise would have to be reset
# between invocations, and would complicate threading situations
parsers = parsers_for(natures, formats)
parser_errors = {}
results = parsers.lazy.map do |parser|
# We need to rewind for each parser, anew
io.seek(0)
# Limit how many operations the parser can perform
limited_io = ReadLimiter.new(io, max_bytes: MAX_BYTES, max_reads: MAX_READS, max_seeks: MAX_SEEKS)
begin
parser.call(limited_io)
rescue => error
parser_errors[parser.class.name] = error
nil
end
end.reject(&:nil?)
return results.first if strategy == :first_result
{
results: results.to_a,
errors: parser_errors
}
end
Could you comment on it?
Any suggestions on what does registering an exception mean?
Good question. Something like this: https://github.com/WeTransfer/image_vise/blob/master/lib/image_vise/render_engine.rb#L83
In other words, we can have a situation where parsers A and B return a valid result, and parser C errors out. In that case, we need to have some way to communicate that parser C failed to the caller, and maybe use a global FormatParser-configured hook for reporting the error, say, to Appsignal. So by "registering" I mean "communicating".
Your idea seems nice, however I think we should return a more "monadic" flat list of parsers + their results or errors. Something like a sum type for Either<StandardError,Result> that you either have to unwrap explicitly or we bridge over methods and fail if you call a method that belongs on a result on an Exception object?..
@julik So each parser would return either a StandardError, a Result or simply nil.
Should this result be encapsulated in a Hash or some other data structure so that you know which specific parsers did raise those errors or returned a result? Maybe every element in this lazy enumerator should be a hash containing the result and the name of the parser that produced it.
{
result: safe_parse(parser, limited_io) # returns either StandardError, Result or nil
parser: parser.class.name
}
Should the first_result strategy return the first non StandardError or nil output? Also, here would it be interesting to remove the strategy from this method and return just the lazy enumerator and let the caller or some other wrapper methods we implement decide what to do with it?
@WJWH I know what you will say, but we might be needing a sum type or a monad here. Ideas?