lingo.dev icon indicating copy to clipboard operation
lingo.dev copied to clipboard

Harden dictionary cache: replace Function-evaluated JS with JSON (+ migration)

Open manasdutta04 opened this issue 1 month ago • 7 comments

Harden dictionary cache reading (avoid Function constructor)

Summary

Replace the current approach that evaluates cache files using the Function constructor with a safer, simpler JSON-based format and reader. Provide a seamless migration path from the existing export default {...}; format to JSON, and add tests to cover read/write and migration scenarios.

Motivation

  • Using new Function(...) to evaluate a cache file is unnecessarily risky and harder to maintain.
  • A plain JSON cache improves safety, tooling compatibility, and clarity.
  • Migration can be implemented without breaking users by auto-detecting and converting the old format upon first write.

Current behavior

  • File: packages/compiler/src/lib/lcp/cache.ts
  • The cache writer writes export default { ... }; and the reader strips that header and evaluates the remainder with new Function(...).
  • This requires executing code to read the cache and is not ideal.

Proposed changes

  1. Change cache file format to JSON

    • Write cache as JSON to a .json file (for example, lingo-cache.json), or keep the current filename but ensure pure JSON content.
    • Read cache with JSON.parse.
  2. Backward-compatible migration

    • Detect the legacy format that starts with export default.
    • Parse legacy content by converting it to an object without executing code (e.g., using a minimal, safe transform and JSON.parse if possible, or a strict parser), then immediately re-write the cache in the new JSON format.
    • Consider logging a one-time info message when migration occurs (guarded by a debug flag).
  3. Tests

    • Add tests that cover:
      • Reading/writing the new JSON cache format.
      • Migrating from an existing legacy export default {...}; cache file to the new JSON format.
      • Ensuring no data loss and deterministic formatting (stable key ordering already exists in writer code).

Affected files (initial)

  • packages/compiler/src/lib/lcp/cache.ts (read/write and format logic)
  • Potentially references to LCP_DICTIONARY_FILE_NAME if the extension changes (packages/compiler/src/_const)
  • New tests near packages/compiler/src/lib/lcp/ (or existing test suite location)

Can I start working on it? Maintainers [@vrcprl @maxprilutskiy @sumitsaurabh927 @davidturnbull ..]

manasdutta04 avatar Oct 29 '25 16:10 manasdutta04

ill work on this feature

sahilkapase avatar Oct 29 '25 16:10 sahilkapase

TL;DR replace dictionary.js with dictionary.json. Am I understanding this right @manasdutta04?

The-Best-Codes avatar Nov 02 '25 14:11 The-Best-Codes

I came across this issue and saw it was unassigned, so I decided to contribute. I've already submitted a PR #1518 to resolve this issue. Would you mind assigning it to me so it's clear that the issue is being actively worked on? I look forward to your feedback. Thanks!

Roniscend avatar Nov 02 '25 16:11 Roniscend

@manasdutta04, the opener of the issue, clearly asked to work on the issue, so give him a chance to get assigned and open a PR before working on it @Roniscend.

Issues are assigned in this order:

  • First priority is given to the person who opened the issue
  • Second is given to the first commenter who wants to work (assuming the person who raised the issue doesn't want to work on it)

Submitting a PR (even if it solves/fixes the issue) doesn't count and WILL NOT get accepted if unwarranted.

Hope you understand!

The-Best-Codes avatar Nov 02 '25 18:11 The-Best-Codes

Yes, that’s right @The-Best-Codes, the main goal is to replace the current JS-based cache (dictionary.js, which contains export default { ... };) with a plain JSON cache (dictionary.json).

Here’s how I plan to approach it:

  1. Format change
  • Write the cache in pure JSON format instead of a JS module.
  • Read it back using JSON.parse instead of evaluating code with new Function(...).
  1. Migration for backward compatibility
  • Detect legacy files that start with export default.
  • Safely parse the legacy content without executing it (e.g., strip the wrapper and parse the object).
  • After successful read, immediately re-write it in the new JSON format.
  • Optionally log a one-time info message under a debug flag.
  1. File naming
  • Either switch to a .json extension (e.g., dictionary.json), or keep the same name for compatibility while ensuring the content is valid JSON.
  1. Testing
  • Reading/writing new JSON cache files.
  • Migrating from the legacy JS format.
  • Ensuring no data loss and deterministic formatting (stable key order).

So, TL;DR. Yes @The-Best-Codes, replacing dictionary.js with a safer dictionary.json, plus migration and tests, is exactly the idea.

manasdutta04 avatar Nov 03 '25 15:11 manasdutta04

Hi @manasdutta04 , I appreciate the effort, but what problem specifically this is supposed to solve?

- Using new Function(...) to evaluate a cache file is unnecessarily risky and harder to maintain.
- A plain JSON cache improves safety, tooling compatibility, and clarity.

You're suggesting to introduce a fundamental change, that will affect all users of the compiler, but I don't see why this is a must.

Please explain?

If it's just a "nice to have" that doesn't address any specific issues, I don't want us to ship it.

maxprilutskiy avatar Nov 12 '25 07:11 maxprilutskiy

Hi @maxprilutskiy, thanks for asking, happy to clarify!

Why this change is important

Security Risk: Using new Function(...) to load cache files means that any modification or corruption—whether via a malicious actor or supply chain attack—could result in arbitrary code execution. Even “trusted” cache files can become a risk over time. Best practice in modern tooling is to treat cache/data files strictly as data, not executable code.
JSON is inherently safe because it cannot run code.

Tooling and Developer Experience: Plain JSON files are universally supported by IDEs, linters, and version control tools, making cache data much easier to inspect, debug, and verify. The legacy export-based format requires extra manual handling and prevents straightforward tooling integration.

Industry Standards: Most major tools—Webpack, ESLint, pnpm, Yarn, Vite use JSON or similar strictly data formats for their caches or config files. This reduces risk and boosts compatibility.

Performance and Reliability: JSON.parse is faster and less error-prone than dynamic JS evaluation. The proposed migration ensures users don’t face breaking changes; old caches will be upgraded automatically.

This isn’t just a “nice to have”—it’s an upgrade that brings the cache format in line with industry best practices, eliminates a real security risk, and improves clarity and maintainability for every contributor and user.

manasdutta04 avatar Nov 12 '25 13:11 manasdutta04