Caching for Repeated Parsing

Open joshcho opened this issue 2 years ago • 4 comments

I'm working on a performance-critical project that requires repeated parsing with parseclj. I'm interested in knowing whether there's built-in support for caching parsed results. If not, would memoization be an effective way to add this feature? Additionally, are there any caveats or issues to be aware of when implementing caching in conjunction with parseclj? If there are any places I should look, let me know.

Thanks!

joshcho avatar Sep 02 '23 09:09 joshcho

That's kind of hard to answer in general. Unless you have a lot of identical inputs I don't see how caching will help you much.

Memoizing might also not be trivial, since parseclj works on temporary buffers. You'd have to convert the buffer contents to a string and use that as your cache key.
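A minimal sketch of that idea, written in Python only to keep it language-neutral and runnable: the cache is keyed on a digest of the input text, so identical text is parsed once. The names `make_cached_parser` and `toy_parse` are hypothetical stand-ins and not part of parseclj; in Emacs the text would come from the buffer contents (for example via `buffer-string`).

```python
import hashlib
from typing import Any, Callable, Dict


def make_cached_parser(parse: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap `parse` so that identical input text is parsed only once."""
    cache: Dict[str, Any] = {}

    def cached_parse(text: str) -> Any:
        # Key on a digest so large inputs are not kept alive as dictionary keys.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = parse(text)
        return cache[key]

    return cached_parse


# Hypothetical stand-in for a real parser; it records every call it receives.
calls = []


def toy_parse(text: str) -> list:
    calls.append(text)
    return text.split()


cached = make_cached_parser(toy_parse)
first = cached("(defn f [x] x)")
second = cached("(defn f [x] x)")  # served from the cache, toy_parse is not called
```

Note that the cached result is returned by reference, so a caller that mutates it also changes what later lookups receive. This only pays off when whole inputs repeat exactly, which is the limitation raised above.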

Parsing EDN is always somewhat slow; it's not a format designed for fast parsing. Parseclj is hand-written to be reasonably fast, but it's also Emacs Lisp, so... maybe look at Transit or CBOR.

plexus avatar Sep 02 '23 10:09 plexus

I am essentially parsing a buffer continually while it is being edited, so most of the input is identical between runs. I am wondering which part or function of the package would be the best place to add caching or memoization. Naively, I could probably do something like caching only input strings that are large enough.

joshcho avatar Sep 02 '23 10:09 joshcho

Parseclj is a shift-reduce parser rather than a recursive descent parser. The only place I can imagine you might be able to do something is at the reduce step, based on the top few elements of the stack. But it's not gonna be as easy as wrapping some function in "memoize"; this is going to require a deep understanding of what it's doing.

plexus avatar Sep 02 '23 13:09 plexus

I would see what you can do to avoid parsing the whole buffer. Maybe rerun the parser only on the current top-level form.

Alternatively, look into the tree-sitter parser, which is built more for this kind of use case.
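The first suggestion, reparsing only the top-level form that changed, can be sketched roughly as follows. This is a Python illustration under stated assumptions, not parseclj code: `split_top_level_forms` and `FormCache` are hypothetical names, the splitter only tracks bracket depth (it assumes brackets never occur inside strings, comments, or character literals, and it skips bare top-level atoms), and the parser passed in is a stand-in.

```python
from typing import Any, Callable, Dict, List


def split_top_level_forms(text: str) -> List[str]:
    """Naively split text into top-level forms by tracking bracket depth.

    Assumption: brackets never appear inside strings, comments, or
    character literals. A real splitter has to handle those cases.
    """
    forms: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch in "([{":
            if depth == 0:
                start = i  # a new top-level form begins here
            depth += 1
        elif ch in ")]}" and depth > 0:
            depth -= 1
            if depth == 0:
                forms.append(text[start:i + 1])
    return forms


class FormCache:
    """Reparse only the top-level forms whose text changed since the last run."""

    def __init__(self, parse: Callable[[str], Any]) -> None:
        self._parse = parse
        self._cache: Dict[str, Any] = {}
        self.misses = 0  # number of forms that actually had to be parsed

    def parse_buffer(self, text: str) -> List[Any]:
        live: Dict[str, Any] = {}
        results: List[Any] = []
        for form in split_top_level_forms(text):
            if form not in live:
                if form in self._cache:
                    live[form] = self._cache[form]
                else:
                    self.misses += 1
                    live[form] = self._parse(form)
            results.append(live[form])
        # Keep only forms still present, so the cache cannot grow without bound.
        self._cache = live
        return results


# Stand-in parser: split a form on whitespace.
cache = FormCache(lambda form: form.split())
before = cache.parse_buffer("(def a 1)\n(def b 2)")
after = cache.parse_buffer("(def a 1)\n(def b 3)")  # only the second form is reparsed
```

One caveat with this approach: if the parse result records absolute buffer positions, cached nodes will carry stale offsets once an earlier form changes length, so positions would need to be stored relative to the start of each form or adjusted afterwards.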

plexus avatar Sep 02 '23 13:09 plexus