bulk_extractor Update BEViewer to modern Java.

Isn't that a really old version for a JRE?
It would be great if it went to the proper URL when complaining.

Even with Java 6 installed, that s... does not work:

C:\Program Files (x86)\Bulk Extractor 1.5.0\64-bit>"C:\Program Files\Java\jdk1.6.0_45\bin\java" -jar BEViewer.jar
Invalid or corrupt jarfile BEViewer.jar

Sep 28 '24 14:09 phil123456

You are welcome to submit a pull request that corrects this.

Sep 28 '24 17:09 simsong

Getting to know bulk_extractor and found a solution that works.

Take the BEViewer.jar from the 1.5 installer - C:\Program Files (x86)\Bulk Extractor 1.5.0\BEViewer.jar
Drop it in the C:\Users\%user_name%\Downloads\bulk_extractor-2.0.0-windows\win64 folder, alongside bulk_extractor64.exe
Running java -jar BEViewer.jar will then load the UI and allow the IS/OS API interaction to work as expected

Tested with openjdk 24.0.2 2025-07-15

Aug 29 '25 21:08 robyeates

This is great, but we need a solution to compile it with a modern Java. There are syntax changes required. Would you be willing to do that? We would like to be able to build the jar file.

Aug 29 '25 22:08 simsong

I'd be happy to look into it!

Aug 29 '25 23:08 robyeates

That would be great. I haven’t programmed in Java since 1996. A lot of people would appreciate having this work again.

The alternative project is to rewrite the user interface in React.

Aug 29 '25 23:08 simsong

Hi @simsong,

I've been making progress with the UI - there is a screenshot here

There are some pressing concerns regarding the API interaction. Which forum would you prefer to discuss these in?

Sep 12 '25 15:09 robyeates

Nice progress. What are the API problems? Please open tickets as you discover them.

Sep 12 '25 19:09 simsong

I’ve been looking at the current process_http flow in path_printer.cpp. Right now it’s essentially an HTTP-like parser over stdio.

This works, but it couples process management with communications and limits concurrency (one client request at a time, single stdio stream). The code comments reflect this, e.g. "make sure stdin is now clear, this is not a guaranteed indicator of readiness of the bulk_extractor thread, because the bulk_extractor thread can issue multiple writes and flushes"

What I’d like to explore is moving to HTTP ↔ HTTP communication by replacing the stdio pipe with a small TCP server (either a minimal socket loop or Boost.Asio with a fixed thread pool). process_http itself doesn’t need to change: it would still parse an istream and write to an ostream, except that those streams would now be tied to sockets instead of stdin/stdout. It also passes all the thread hand-holding to the HTTPClient.

My thinking is that this would -

Decouple process lifecycle from transport - BEViewer (or any other client) can connect to a long-running bulk_extractor service instead of having to manage the process directly. Single BEViewer health_check function to keep BE up and accepting connections.
Resilience - if a client request disconnects/terminates, the process keeps running and can accept new requests.
Parallelism - with Boost.Asio and a fixed thread pool (say 10 threads), you can multiplex client requests instead of serializing everything through stdio.

I’d also propose introducing versioned endpoints to stabilize this interop contract

e.g.

POST /v1/runBEscan {scan_settings_v1}
POST /v2/runBEscan {scan_settings_v1}
POST /v3/runBEscan {scan_settings_v2}

That way compatibility is tied to API version rather than the exact code revision, which would make it easier to ship a standalone BEViewer build that talks to whatever bulk_extractor service is available.
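A minimal sketch of how such versioned request lines could be parsed and mapped to a settings schema. Everything here is an assumption made for illustration: the names VersionedRequest, parse_request_line and settings_schema_for are hypothetical, and the v1/v2/v3 mapping simply mirrors the example endpoints above rather than anything that exists in bulk_extractor.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical parsed form of a request line such as "POST /v1/runBEscan".
struct VersionedRequest {
    std::string method;    // e.g. "POST"
    int version;           // numeric part of "/vN"
    std::string endpoint;  // e.g. "runBEscan"
};

// Parse "<METHOD> /v<N>/<endpoint>".
// Returns std::nullopt when the line does not have that shape.
std::optional<VersionedRequest> parse_request_line(const std::string& line) {
    std::istringstream in(line);
    std::string method, path;
    if (!(in >> method >> path)) return std::nullopt;
    if (path.size() < 4 || path[0] != '/' || path[1] != 'v') return std::nullopt;

    const std::size_t slash = path.find('/', 2);
    if (slash == std::string::npos || slash == 2) return std::nullopt;  // no digits
    if (slash - 2 > 4) return std::nullopt;             // keep the version small
    if (slash + 1 >= path.size()) return std::nullopt;  // empty endpoint

    int version = 0;
    for (std::size_t i = 2; i < slash; ++i) {
        if (path[i] < '0' || path[i] > '9') return std::nullopt;
        version = version * 10 + (path[i] - '0');
    }
    return VersionedRequest{method, version, path.substr(slash + 1)};
}

// Assumed mapping taken from the example endpoints:
// v1 and v2 accept scan_settings_v1, v3 accepts scan_settings_v2.
std::optional<int> settings_schema_for(int api_version) {
    switch (api_version) {
        case 1:
        case 2:
            return 1;
        case 3:
            return 2;
        default:
            return std::nullopt;  // unknown API version
    }
}
```

With this shape, a client built against v1 would keep working as long as the service still answers /v1 requests, whatever its code revision.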

I haven’t written much C++ in a while, so before I wade into implementation I wanted to get initial feedback from you -

Would you be open to PRs adding a small TCP/Asio server wrapper around process_http? And does a versioned API surface (instead of implicit behavior changes) make sense for the project’s direction?

Sep 12 '25 21:09 robyeates

Thank you so much for your review of this code and your thoughts on how to move forward. I concur with you that moving from the current single-threaded interface to a multi-threaded system with an embedded HTTP listener is a straightforward enhancement, and I also believe that bulk_extractor's current design migrates quite gracefully to a multi-threaded listener.

I would recommend against an asynchronous listener built on Boost.Asio for two reasons. 1) It would introduce a dependency on Boost, which I have intentionally avoided. 2) Asynchronous processing is complex, whereas a multi-threaded synchronous design works well with the current architecture.

I also do not think that you need to replace the current system; you are just going to add a new flag that runs the path printer in a TCP listener.

I don't think that you need a threadpool. You simply create a new thread for each new connection and drop that thread when the connection closes.
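A rough sketch of the thread-per-connection idea, under loud assumptions: the socket accept loop is left out, each string stands in for the bytes a client would send over one accepted connection, and handle_stream is a hypothetical stand-in for process_http (whose real signature is not shown in this thread). It only illustrates that a handler written against istream/ostream can run on its own thread per connection.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for process_http: reads request lines from `in`
// and writes one response line per request to `out`.
void handle_stream(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        out << "echo: " << line << "\n";
    }
}

// One thread per connection. Each element of `connections` plays the role of
// an accepted socket; in a real listener the streams would wrap the socket
// and the thread would end when the client disconnects.
std::vector<std::string> serve_each_on_own_thread(
        const std::vector<std::string>& connections) {
    std::vector<std::string> responses(connections.size());
    std::vector<std::thread> workers;
    workers.reserve(connections.size());

    for (std::size_t i = 0; i < connections.size(); ++i) {
        workers.emplace_back([&connections, &responses, i] {
            std::istringstream in(connections[i]);
            std::ostringstream out;
            handle_stream(in, out);
            responses[i] = out.str();  // each thread writes only its own slot
        });
    }
    // Joined here so the sketch can return the output; a long-running
    // listener would more likely detach each thread instead.
    for (auto& w : workers) w.join();
    return responses;
}
```

Because every connection owns its streams and its output slot, the handler needs no locking of its own in this sketch; any state shared across connections in the real code would still need protecting.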

I think that a versioned API is fine.

Right now this project is pretty much in stasis. Some people are using it and that's great, but I am not making any significant improvements. The obvious future direction would be to rewrite the thing in Rust, which I'm not interested in doing but somebody else might. Another thing that desperately needs to be done is to get re2 running on Windows and make a new Windows installer. But again, nobody is paying for this work, and I don't have the time to do it myself.

Sep 13 '25 16:09 simsong

Thank you for the thoughtful feedback.

I’m new to the forensics domain, but having used FTK and Autopsy I can clearly see the value of bulk_extractor’s “read data, ask questions later” approach. In my experience, tools like FTK and Autopsy can get stuck in the filesystem layer, e.g. a simple recursive file path can crash FTK and lock up Autopsy. BE’s approach avoids that trap, and I believe that breadth of analysis will always have strong value. I’d like to contribute to keeping that value accessible.

On the future direction: I completely agree that a Rust backend with a modern frontend (e.g. React) makes sense. By investing in a stable API contract now, the components are free to evolve independently over time.

I’ll go ahead and implement the suggested changes. I also have some experience with Bazel, so I can take a look at getting re2 running on Windows. The UI can matrix build fairly easily - I’ll need to investigate the other components.

Sep 14 '25 10:09 robyeates

If you make a file map of where all the files are on the disk (with fiwalk, for example), you can then go from the bulk_extractor feature location to the actual file that it was contained in. There are Python scripts to do this, but the functionality was never built into the program.
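The lookup described above can be sketched as an interval search. This assumes a simplified file map of contiguous byte runs keyed by their starting offset in the image and plain numeric feature offsets; ByteRun, FileMap and file_at are hypothetical names, and fiwalk's actual output carries far more detail than this.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified file map entry: one contiguous run of bytes
// belonging to a file. The map key is the run's starting offset in the image.
struct ByteRun {
    std::uint64_t length = 0;
    std::string filename;
};
using FileMap = std::map<std::uint64_t, ByteRun>;

// Return the file whose byte run contains `offset`,
// or std::nullopt if the offset falls outside every known run.
std::optional<std::string> file_at(const FileMap& runs, std::uint64_t offset) {
    auto it = runs.upper_bound(offset);  // first run starting after offset
    if (it == runs.begin()) return std::nullopt;
    --it;                                // last run starting at or before offset
    if (offset - it->first < it->second.length) return it->second.filename;
    return std::nullopt;
}
```

A fragmented file would simply appear as several runs with the same filename, and the ordered map keeps each lookup logarithmic in the number of runs.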

Sep 14 '25 11:09 simsong