Erlang Binary functions
https://github.com/bartblast/hologram/issues/355
The scope of the issue was for :binary.split/2 and :binary.split/3
- This required creating
:binary.compile_pattern/1and implementing the existing matching algorithms, which was the bulk of the actual work to get the initial scope finished. - Because those matching algorithms were finished, it made adding
:binary.match/2,:binary.match/3,:binary.matches/2,:binary.matches/3,:binary.replace/3, and:binary.replace/4became relatively easy to implement as they also rely on those algoithms for the heavy lifting, and they are consequently included in this pull request.
Hey @atavistock!
Thanks for jumping on this so quickly - really appreciate how thorough the :binary work is. I do need to ask you to break this PR up so review stays manageable.
GitHub issues: I created one per function. Please comment anything on each so I can assign you officially:
- #379 – :binary.compile_pattern/1
- #380 – :binary.match/2 and :binary.match/3
- #381 – :binary.matches/2 and :binary.matches/3
- #382 – :binary.replace/3 and :binary.replace/4
If you think the shared matching algorithms need their own issue, feel free to create one - though ideally due to how Hologram's compiler works, it's better to inline the code with the functions themselves (see below).
Why split: I'm swamped with PRs. Separate PRs let me review faster, unblock functions independently, and make it clear to others which functions are already claimed (per the initiative flow).
Code organization reminder: The compiler only sees code that's reachable via Deps definitions in the call graph (see the Contributing Guide). So instead of extracting validation or shared logic into private helpers:
- Prefer having the smaller-arity function delegate to the larger one with defaults.
- If you truly need shared utilities, either add them to the runtime (so every app ships with it) or wrap them in the Start/End extraction markers and register them in the call graph definitions so the compiler pulls them in. Until then, keep validations inline - we'll DRY things up later.
Quick note: The CI formatting check found some issues. You can fix them by running $ mix f from the root directory.
Thanks again - your contributions are awesome, we just have to keep the process tight so review doesn't bottleneck. Once the issues are claimed and PRs split, I can move fast on reviews :)
I've just added "Quality Checks Before Submitting" section here: https://hologram.page/reference/contributing, though to get the updated checks you'll need to sync with the dev branch.
I'm happy to split these apart into separate incremental PRs. I can certainly understand how it will be much easier to digest whats happening and the changes being made.
I would like to understand this a little better:
If you think the shared matching algorithms need their own issue, feel free to create one - though ideally due to how Hologram's compiler works, it's better to inline the code with the functions themselves (see below).
Erlang uses two fairly complex algorithms done in a C .nif for the pattern handling which are decided based on the input. Each of those algorithms does searching, matching, and replacing differently, but they also share a lot of the same boiler plate. Implementing both of them in javascript takes 600 or 900 lines of javascript depending on if we're using an abstract class. Inlining the code into the mock Erlang calls would make for a massive amount of duplication in each of the binary operations. I don't know the implications for Hologram, but I think that amount of duplication would be a significant problem for maintenance and complexity.
If we eschew the simpler algorithm completely, it could make inlining search, match, and replace much more clear, but they'd still share a lot of duplicated code for the alogirthm implementation itself.
Curious how you feel we should go here?
Hi Aaron, I get it - the shared code is complex and avoiding duplication makes total sense.
I think the best approach is to implement them as "virtual" Erlang functions with underscored names (e.g., :binary._my_fun/1). This way they:
- Stay in
binary.mjswithStart/Endmarkers - Can be declared in
Depsand tracked in the call graph - Can have isolated tests
- Get extracted by the compiler only when needed
- Won't be confused with actual Erlang API functions (underscore prefix convention)
You could even model class instance-like structures this way if needed.
What do you think about this approach? Also, can the matchers be split into 2-3 separate issues (beyond the actual :binary function PRs) to make review more manageable, or do they need to go together in one PR?
Okay, so I've actually been struggling with one use case of
:binary.compile_pattern and I'm not sure that it's solvable.
In erlang :binary.compile_pattern can accept a single binary, a list of
binaries, or a pre-compiled pattern. In javascript land we can accept a
binary or a list of binaries, and recreate an implementation which works
within javascript. We can even work with a pre-compiled pattern that was
created in javascript land.
Unfortunately, on the erlang side if you pass a binary or a list of
binaries in erlang you get back a tuple which looks like {:bm, #Reference<0.920665636.557187080.162275>}. This is fine internally in
erlang, but the reference is completely opaque and not reproducible outside
the c coded nif. There's a specially optimized format which has some
specific magic in that reference, and there's just no way I can find to
extract the original pattern or recreate the pattern in javascript.
I've made test "returns a tuple for a compiled pattern" in the file
test/elixir/hologram/ex_js_consistency/erlang/binary_test.exs to check
for the error case until there's a clear direction.
On Fri, Nov 21, 2025 at 10:19 AM Bart Blast @.***> wrote:
bartblast left a comment (bartblast/hologram#374) https://github.com/bartblast/hologram/pull/374#issuecomment-3564147624
Hi Aaron, I get it - the shared code is complex and avoiding duplication makes total sense.
I think the best approach is to implement them as "virtual" Erlang functions with underscored names (e.g., :binary._my_fun/1). This way they:
- Stay in binary.mjs with Start/End markers
- Can be declared in Deps and tracked in the call graph
- Can have isolated tests
- Get extracted by the compiler only when needed
- Won't be confused with actual Erlang API functions (underscore prefix convention)
You could even model class instance-like structures this way if needed.
What do you think about this approach? Also, can the matchers be split into 2-3 separate issues (beyond the actual :binary function PRs) to make review more manageable, or do they need to go together in one PR?
— Reply to this email directly, view it on GitHub https://github.com/bartblast/hologram/pull/374#issuecomment-3564147624, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAFOBOGQF3IZQ2KN4BBHBL355JSZAVCNFSM6AAAAACMQRD7P6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTKNRUGE2DONRSGQ . You are receiving this because you were mentioned.Message ID: @.***>
Thanks, I proposed a solution here: https://github.com/bartblast/hologram/pull/418#discussion_r2565135144
Closing this PR as the features discussed here have been implemented in the following separate PRs:
- #418
- #616
- #617
- #628
- #629
Thanks to @atavistock and @mward-sudo for their work on these features!