Harden dictionary cache: replace Function-evaluated JS with JSON (+ migration)
Harden dictionary cache reading (avoid Function constructor)
Summary
Replace the current approach that evaluates cache files using the Function constructor with a safer, simpler JSON-based format and reader. Provide a seamless migration path from the existing export default {...}; format to JSON, and add tests to cover read/write and migration scenarios.
Motivation
- Using
new Function(...)to evaluate a cache file is unnecessarily risky and harder to maintain. - A plain JSON cache improves safety, tooling compatibility, and clarity.
- Migration can be implemented without breaking users by auto-detecting and converting the old format upon first write.
Current behavior
- File:
packages/compiler/src/lib/lcp/cache.ts - The cache writer writes
export default { ... };and the reader strips that header and evaluates the remainder withnew Function(...). - This requires executing code to read the cache and is not ideal.
Proposed changes
-
Change cache file format to JSON
- Write cache as JSON to a
.jsonfile (for example,lingo-cache.json), or keep the current filename but ensure pure JSON content. - Read cache with
JSON.parse.
- Write cache as JSON to a
-
Backward-compatible migration
- Detect the legacy format that starts with
export default. - Parse legacy content by converting it to an object without executing code (e.g., using a minimal, safe transform and
JSON.parseif possible, or a strict parser), then immediately re-write the cache in the new JSON format. - Consider logging a one-time info message when migration occurs (guarded by a debug flag).
- Detect the legacy format that starts with
-
Tests
- Add tests that cover:
- Reading/writing the new JSON cache format.
- Migrating from an existing legacy
export default {...};cache file to the new JSON format. - Ensuring no data loss and deterministic formatting (stable key ordering already exists in writer code).
- Add tests that cover:
Affected files (initial)
packages/compiler/src/lib/lcp/cache.ts(read/write and format logic)- Potentially references to
LCP_DICTIONARY_FILE_NAMEif the extension changes (packages/compiler/src/_const) - New tests near
packages/compiler/src/lib/lcp/(or existing test suite location)
Can I start working on it? Maintainers [@vrcprl @maxprilutskiy @sumitsaurabh927 @davidturnbull ..]
ill work on this feature
TL;DR replace dictionary.js with dictionary.json. Am I understanding this right @manasdutta04?
I came across this issue and saw it was unassigned, so I decided to contribute. I've already submitted a PR #1518 to resolve this issue. Would you mind assigning it to me so it's clear that the issue is being actively worked on? I look forward to your feedback. Thanks!
@manasdutta04, the opener of the issue, clearly asked to work on the issue, so give him a chance to get assigned and open a PR before working on it @Roniscend.
Issues are assigned in this order:
- First priority is given to the person who opened the issue
- Second is given to the first commenter who wants to work (assuming the person who raised the issue doesn't want to work on it)
Submitting a PR (even if it solves/fixes the issue) doesn't count and WILL NOT get accepted if unwarranted.
Hope you understand!
Yes, that’s right @The-Best-Codes, the main goal is to replace the current JS-based cache (dictionary.js, which contains export default { ... };) with a plain JSON cache (dictionary.json).
Here’s how I plan to approach it:
- Format change
- Write the cache in pure JSON format instead of a JS module.
- Read it back using JSON.parse instead of evaluating code with new Function(...).
- Migration for backward compatibility
- Detect legacy files that start with export default.
- Safely parse the legacy content without executing it (e.g., strip the wrapper and parse the object).
- After successful read, immediately re-write it in the new JSON format.
- Optionally log a one-time info message under a debug flag.
- File naming
- Either switch to a .json extension (e.g., dictionary.json), or keep the same name for compatibility while ensuring the content is valid JSON.
- Testing
- Reading/writing new JSON cache files.
- Migrating from the legacy JS format.
- Ensuring no data loss and deterministic formatting (stable key order).
So, TL;DR. Yes @The-Best-Codes, replacing dictionary.js with a safer dictionary.json, plus migration and tests, is exactly the idea.
Hi @manasdutta04 , I appreciate the effort, but what problem specifically this is supposed to solve?
- Using new Function(...) to evaluate a cache file is unnecessarily risky and harder to maintain.
- A plain JSON cache improves safety, tooling compatibility, and clarity.
You're suggesting to introduce a fundamental change, that will affect all users of the compiler, but I don't see why this is a must.
Please explain?
If it's just a "nice to have" that doesn't address any specific issues, I don't want us to ship it.
Hi @maxprilutskiy, thanks for asking, happy to clarify!
Why this change is important
Security Risk:
Using new Function(...) to load cache files means that any modification or corruption—whether via a malicious actor or supply chain attack—could result in arbitrary code execution. Even “trusted” cache files can become a risk over time. Best practice in modern tooling is to treat cache/data files strictly as data, not executable code.
JSON is inherently safe because it cannot run code.
Tooling and Developer Experience: Plain JSON files are universally supported by IDEs, linters, and version control tools, making cache data much easier to inspect, debug, and verify. The legacy export-based format requires extra manual handling and prevents straightforward tooling integration.
Industry Standards: Most major tools—Webpack, ESLint, pnpm, Yarn, Vite use JSON or similar strictly data formats for their caches or config files. This reduces risk and boosts compatibility.
Performance and Reliability:
JSON.parse is faster and less error-prone than dynamic JS evaluation. The proposed migration ensures users don’t face breaking changes; old caches will be upgraded automatically.
This isn’t just a “nice to have”—it’s an upgrade that brings the cache format in line with industry best practices, eliminates a real security risk, and improves clarity and maintainability for every contributor and user.