
`.env` Parser Does Not Support Multiline Environment Variables

Open piskunow opened this issue 2 months ago • 9 comments

Type: Bug

Behaviour

Bug Description

The .env file parser in the Python extension does not support multiline environment variables, breaking compatibility with the standard python-dotenv library and causing corrupted/truncated values to be passed to Jupyter kernels.

Steps to reproduce:

1. Create a .env file with a multiline variable

Create a file named .env in your workspace root:

TEST_ENV_VAR_MULTILINE='{
  "key1": "value1",
  "key2": "value2"
}'
TEST_ENV_VAR_SIMPLE='simple_value'

2. Create a Jupyter notebook and check the environment

Create a new Jupyter notebook (.ipynb) and run this code in the first cell (before any load_dotenv() or imports):

import os
print("Simple var:", os.environ.get('TEST_ENV_VAR_SIMPLE'))
print("Multiline var:", repr(os.environ.get('TEST_ENV_VAR_MULTILINE')))

3. Observe the bug

Expected output:

Simple var: simple_value
Multiline var: '{\n  "key1": "value1",\n  "key2": "value2"\n}'

Actual output:

Simple var: simple_value
Multiline var: "'{"

The multiline variable is truncated to just the first line!

4. Compare with regular Python script

Run the same code in a regular Python script (not Jupyter):

# test.py
from dotenv import load_dotenv
import os

load_dotenv()
print("Multiline var:", repr(os.environ.get('TEST_ENV_VAR_MULTILINE')))

$ python test.py
Multiline var: '{\n  "key1": "value1",\n  "key2": "value2"\n}'  ✅ Works correctly!

Extension version: 2025.14.0
VS Code version: Code 1.104.1 (0f0d87fa9e96c856c5212fc86db137ac0d783365, 2025-09-17T23:36:24.973Z)
OS version: Linux x64 6.8.0-79-generic
Modes:

  • Python version (& distribution if applicable, e.g. Anaconda): 3.11.13
  • Type of virtual environment used (e.g. conda, venv, virtualenv, etc.): Venv
  • Value of the python.languageServer setting: Default

User Settings

venvPath: "<placeholder>"
languageServer: "Pylance"

Installed Extensions

Extension Name            Extension Id  Version
claude-code               Ant           2.0.0
copilot                   Git           1.372.0
copilot-chat              Git           0.31.3
js-debug                  ms-           1.104.0
js-debug-companion        ms-           1.1.3
jupyter                   ms-           2025.8.0
jupyter-keymap            ms-           1.1.2
jupyter-renderers         ms-           1.3.0
material-theme            zhu           3.19.0
pylint                    ms-           2025.2.0
python                    ms-           2025.14.0
rainbow-csv               mec           3.22.0
remote-containers         ms-           0.427.0
ruff                      cha           2025.26.0
vscode-js-profile-table   ms-           1.0.10
vscode-jupyter-cell-tags  ms-           0.1.9
vscode-jupyter-slideshow  ms-           0.1.6
vscode-pylance            ms-           2025.8.3
vscode-python-envs        ms-           1.8.0

System Info

Item                 Value
CPUs                 Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz (4 x 2900)
GPU Status           2d_canvas: enabled
                     direct_rendering_display_compositor: disabled_off_ok
                     gpu_compositing: enabled
                     multiple_raster_threads: enabled_on
                     opengl: enabled_on
                     rasterization: enabled
                     raw_draw: disabled_off_ok
                     skia_graphite: disabled_off
                     trees_in_viz: disabled_off
                     video_decode: enabled
                     video_encode: disabled_software
                     vulkan: disabled_off
                     webgl: enabled
                     webgl2: enabled
                     webgpu: disabled_off
                     webnn: disabled_off
Load (avg)           5, 4, 4
Memory (System)      31.23GB (15.46GB free)
Process Argv
Screen Reader        no
VM                   0%
DESKTOP_SESSION      ubuntu-xorg
XDG_CURRENT_DESKTOP  Unity
XDG_SESSION_DESKTOP  ubuntu-xorg
XDG_SESSION_TYPE     x11

piskunow avatar Sep 29 '25 23:09 piskunow

Root Cause

I traced this to the .env parser implementation in:

File: src/client/common/variables/environment.ts (lines 125-166)

The parser consists of two functions that work together:

1. parseEnvFile() - The Main Parser (Lines 125-139)

export function parseEnvFile(lines: string | Buffer, baseVars?: EnvironmentVariables): EnvironmentVariables {
    const globalVars = baseVars ? baseVars : {};
    const vars: EnvironmentVariables = {};
    lines
        .toString()
        .split('\n')  // ← THE BUG: Splits by newline FIRST
        .forEach((line, _idx) => {
            const [name, value] = parseEnvLine(line);
            if (name === '') {
                return;
            }
            vars[name] = substituteEnvVars(value, vars, globalVars);
        });
    return vars;
}

The problem: .split('\n') breaks the file into lines before any quote-aware parsing happens, so a quoted value that spans several lines is cut apart before parseEnvLine() ever sees it.

2. parseEnvLine() - The Line Parser (Lines 141-166)

function parseEnvLine(line: string): [string, string] {
    // Most of the following is an adaptation of the dotenv code:
    //   https://github.com/motdotla/dotenv/blob/master/lib/main.js#L32
    // We don't use dotenv here because it loses ordering, which is
    // significant for substitution.
    const match = line.match(/^\s*([a-zA-Z]\w*)\s*=\s*(.*?)?\s*$/);
    if (!match) {
        return ['', ''];
    }

    const name = match[1];
    let value = match[2];
    if (value && value !== '') {
        if (value[0] === "'" && value[value.length - 1] === "'") {
            value = value.substring(1, value.length - 1);
            value = value.replace(/\\n/gm, '\n');
        } else if (value[0] === '"' && value[value.length - 1] === '"') {
            value = value.substring(1, value.length - 1);
            value = value.replace(/\\n/gm, '\n');
        }
    } else {
        value = '';
    }

    return [name, value];
}

Additional problems in parseEnvLine():

  1. Single-line regex: The pattern /^\s*([a-zA-Z]\w*)\s*=\s*(.*?)?\s*$/ only matches complete KEY=VALUE pairs within a single line
  2. No state tracking: Doesn't track whether we're inside a quoted string
  3. Only handles escaped newlines: replace(/\\n/gm, '\n') converts the literal two-character sequence \n into a newline, but real newline characters inside a quoted value never reach this function intact

Why the Comment About dotenv is Misleading

The code has this comment:

"We don't use dotenv here because it loses ordering, which is significant for substitution."

This reasoning is outdated for several reasons:

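  1. Modern dotenv preserves order: Since ES2015, JavaScript objects keep insertion order for string keys that are not integer-like (integer-like keys such as "1" are enumerated first, in ascending order). Variable names accepted by the current regex always start with a letter, so the plain object returned by the dotenv library (v16+) preserves the order in which those variables are defined in the file.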

  2. Substitution can still work: The variable substitution logic (substituteEnvVars()) can be applied to the output of dotenv.parse() just as easily:

    const parsed = dotenv.parse(lines);
    for (const [name, value] of Object.entries(parsed)) {
        vars[name] = substituteEnvVars(value, vars, globalVars);
    }
    
  3. The custom parser broke a key feature: By reimplementing parsing from scratch, this code lost the multiline support that the original dotenv library provides correctly.
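The ordering point can be checked with plain objects, independent of dotenv. This is a small illustrative sketch; the variable names are made up. It also shows the one exception worth knowing about: integer-like keys. The current regex cannot produce such names, but whether dotenv.parse() can return them depends on its key pattern, which I have not verified here.

```typescript
// Keys that are not integer-like are enumerated in insertion order.
const envLike: Record<string, string> = {};
envLike['ZETA'] = '1';
envLike['ALPHA'] = '2';
envLike['_MID'] = '3';
const envOrder: string[] = Object.keys(envLike); // ['ZETA', 'ALPHA', '_MID']

// Integer-like keys are the exception: they move to the front,
// sorted in ascending numeric order, regardless of insertion order.
const mixed: Record<string, string> = {};
mixed['B'] = 'x';
mixed['10'] = 'y';
mixed['2'] = 'z';
const mixedOrder: string[] = Object.keys(mixed); // ['2', '10', 'B']
```

So for names like BASE_URL and API_URL, iterating the parsed object visits variables in file order, which is what ordered substitution needs.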

Current Data Flow

The bug manifests through this chain:

  1. File read: customEnvironmentVariablesProvider.node.ts reads .env
  2. Parse: Calls parseEnvFile() which splits by \n first
  3. Export to kernel: Parsed (corrupted) variables passed to kernelEnvVarsService.node.ts
  4. Kernel spawn: kernelProcess.node.ts spawns kernel with corrupted environment
  5. Result: Jupyter kernel inherits TEST_ENV_VAR_MULTILINE="'{" instead of the full JSON

What happens to the .env file:

Original content:

TEST_ENV_VAR_MULTILINE='{
  "key1": "value1",
  "key2": "value2"
}'

After .split('\n'):

[
  "TEST_ENV_VAR_MULTILINE='{",  // ← Only this line matches the regex
  '  "key1": "value1",',         // ← No '=' sign, skipped
  '  "key2": "value2"',          // ← No '=' sign, skipped
  "}'"                           // ← No '=' sign, skipped
]

Result: TEST_ENV_VAR_MULTILINE = "'{" (corrupted!)
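The walk-through above can be reproduced outside the extension. The following is a minimal standalone sketch, not the extension's actual code path: parseEnvLine() follows the snippet quoted earlier in this issue, while parseLineByLine is a hypothetical name for a simplified parseEnvFile() that leaves out substituteEnvVars().

```typescript
// Regex and quote handling follow parseEnvLine() as quoted in this issue.
function parseEnvLine(line: string): [string, string] {
    const match = line.match(/^\s*([a-zA-Z]\w*)\s*=\s*(.*?)?\s*$/);
    if (!match) {
        return ['', ''];
    }
    const name = match[1];
    let value: string = match[2] || '';
    if (value[0] === "'" && value[value.length - 1] === "'") {
        value = value.substring(1, value.length - 1).replace(/\\n/gm, '\n');
    } else if (value[0] === '"' && value[value.length - 1] === '"') {
        value = value.substring(1, value.length - 1).replace(/\\n/gm, '\n');
    }
    return [name, value];
}

// Hypothetical simplified parseEnvFile(): split first, then parse each line.
function parseLineByLine(content: string): Record<string, string> {
    const vars: Record<string, string> = {};
    for (const line of content.split('\n')) {
        const [name, value] = parseEnvLine(line);
        if (name !== '') {
            vars[name] = value;
        }
    }
    return vars;
}

const sample = [
    "TEST_ENV_VAR_MULTILINE='{",
    '  "key1": "value1",',
    '  "key2": "value2"',
    "}'",
    "TEST_ENV_VAR_SIMPLE='simple_value'",
].join('\n');

// Only the first line of the multiline value survives; the opening quote is
// kept because the line does not end with a matching quote.
const reproduced = parseLineByLine(sample);
```

Here reproduced.TEST_ENV_VAR_MULTILINE is the two-character string '{ and reproduced.TEST_ENV_VAR_SIMPLE is simple_value, matching the kernel output reported above.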

Impact

This bug affects:

✅ Affected

  • Jupyter notebooks in VSCode - Kernel inherits corrupted environment variables
  • Pydantic Settings - Reads from os.environ which has corrupted values
  • Any library that reads from os.environ before loading .env files
  • Real-world use cases:
    • SSH/SSL private keys (multiline by nature)
    • JSON Web Tokens (JWT)
    • Certificates
    • Pretty-printed JSON configurations
    • SQL scripts

❌ Not Affected

  • Regular Python scripts - They use python-dotenv directly (works correctly)
  • Terminal/shell - Doesn't use VSCode's parser
  • Single-line environment variables - Work fine

Expected Behavior

The parser should handle multiline values the same way python-dotenv does:

Valid .env syntax (the format has no formal specification; this is what python-dotenv accepts):

# Multiline with actual newlines (should work)
MULTILINE='{
  "key": "value"
}'

# Multiline with escaped newlines (already works)
ESCAPED='{"key": "value",\n  "key2": "value2"}'

Both formats should be supported.

Proposed Solution

Replace the line-by-line parser with a state machine parser that:

  1. Reads characters sequentially
  2. Tracks quote state (inside single quote, double quote, or unquoted)
  3. Only treats newline as a line separator when outside quotes
  4. Preserves newlines within quoted values

Implementation Approach 1: State Machine Parser

export function parseEnvFile(lines: string | Buffer, baseVars?: EnvironmentVariables): EnvironmentVariables {
    const globalVars = baseVars ? baseVars : {};
    const vars: EnvironmentVariables = {};
    const content = lines.toString();

    let i = 0;

    while (i < content.length) {
        // Skip whitespace (but not newlines)
        while (i < content.length && content[i] !== '\n' && /\s/.test(content[i])) {
            i++;
        }

        // Skip empty lines and comments
        if (i >= content.length || content[i] === '\n') {
            i++;
            continue;
        }
        if (content[i] === '#') {
            while (i < content.length && content[i] !== '\n') i++;
            i++;
            continue;
        }

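        // Parse variable name (must start with a letter, as in the current regex)
        const nameStart = i;
        if (i < content.length && /[a-zA-Z]/.test(content[i])) {
            i++;
            while (i < content.length && /\w/.test(content[i])) {
                i++;
            }
        }
        const name = content.substring(nameStart, i);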

        if (!name) {
            while (i < content.length && content[i] !== '\n') i++;
            i++;
            continue;
        }

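        // Skip whitespace, require a single '=', then skip whitespace again.
        // Lines without '=' are ignored, as with the current regex.
        while (i < content.length && content[i] !== '\n' && /\s/.test(content[i])) i++;
        if (content[i] !== '=') {
            while (i < content.length && content[i] !== '\n') i++;
            i++;
            continue;
        }
        i++; // Skip '='
        while (i < content.length && content[i] !== '\n' && /\s/.test(content[i])) i++;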

        // Parse value
        let value = '';
        if (i < content.length && content[i] !== '\n') {
            const quote = content[i];

            if (quote === '"' || quote === "'") {
                // Quoted value - can span multiple lines
                i++; // Skip opening quote
                let escaped = false;

                while (i < content.length) {
                    const char = content[i];

                    if (escaped) {
                        // Handle escape sequences
                        switch (char) {
                            case 'n': value += '\n'; break;
                            case 'r': value += '\r'; break;
                            case 't': value += '\t'; break;
                            case '\\': value += '\\'; break;
                            case quote: value += quote; break;
                            default: value += '\\' + char; break;
                        }
                        escaped = false;
                    } else if (char === '\\') {
                        escaped = true;
                    } else if (char === quote) {
                        break; // Closing quote found
                    } else {
                        value += char; // Include newlines!
                    }
                    i++;
                }

                if (i < content.length && content[i] === quote) {
                    i++; // Skip closing quote
                }
            } else {
                // Unquoted value - single line only
                const valueStart = i;
                while (i < content.length && content[i] !== '\n' && content[i] !== '#') {
                    i++;
                }
                value = content.substring(valueStart, i).trim();
            }
        }

        // Skip to next line
        while (i < content.length && content[i] !== '\n') i++;
        if (i < content.length) i++;

        // Store the variable
        if (name) {
            vars[name] = substituteEnvVars(value, vars, globalVars);
        }
    }

    return vars;
}

Implementation Approach 2: Use Standard dotenv Library (Recommended)

I strongly recommend this approach - delegate to the well-tested dotenv npm package instead of maintaining custom parsing logic:

import * as dotenv from 'dotenv';

export function parseEnvFile(lines: string | Buffer, baseVars?: EnvironmentVariables): EnvironmentVariables {
    const globalVars = baseVars ? baseVars : {};
    const parsed = dotenv.parse(lines);
    const vars: EnvironmentVariables = {};

    // Apply variable substitution to the parsed values
    // This maintains the ordering needed for substitution
    for (const [name, value] of Object.entries(parsed)) {
        vars[name] = substituteEnvVars(value, vars, globalVars);
    }

    return vars;
}

Why this is the better solution:

  1. Battle-tested library: dotenv has 18+ million weekly downloads and has been refined over years

    • Handles multiline values correctly
    • Handles all quote types (single, double, backticks)
    • Handles escape sequences properly
    • Handles edge cases you haven't thought of
  2. Preserves ordering: Modern JavaScript (ES2015+) keeps insertion order for string keys that are not integer-like, which covers typical environment variable names, so variable substitution still sees variables in file order:

    // Given .env:
    // BASE_URL=https://example.com
    // API_URL=${BASE_URL}/api
    
    const parsed = dotenv.parse(envContent);
    // parsed = { BASE_URL: 'https://example.com', API_URL: '${BASE_URL}/api' }
    
    // Substitution happens in order:
    for (const [name, value] of Object.entries(parsed)) {
        vars[name] = substituteEnvVars(value, vars, globalVars);
    }
    // Result: { BASE_URL: 'https://example.com', API_URL: 'https://example.com/api' }
    
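  3. Closer to python-dotenv behavior: Python developers expect .env files to work the same in Python scripts and Jupyter. The dotenv npm package is a separate implementation from python-dotenv, so edge cases may still differ, but using it brings multiline handling in line with what python-dotenv users expect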

  4. Less maintenance burden:

    • ✅ No custom parser to maintain
    • ✅ Bug fixes handled by the community
    • ✅ New features (like inline comments) come for free
    • ✅ Security patches handled upstream
  5. Already a dependency: The dotenv package is likely already in the dependency tree for other features

  6. Minimal code change: Only ~10 lines changed, with exactly the same API
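To make the ordering argument in point 2 concrete without depending on the dotenv package, here is a small sketch. `substitute` is a hypothetical stand-in for substituteEnvVars(), whose implementation is not shown in this issue. It handles only the ${NAME} form, and replacing unknown names with an empty string is an assumption of the sketch. The input objects are written by hand in the shape dotenv.parse() returns.

```typescript
type Vars = Record<string, string>;

// Hypothetical stand-in for substituteEnvVars(); handles only ${NAME}.
// Unknown names become '' (an assumption, not confirmed extension behavior).
function substitute(value: string, localVars: Vars, globalVars: Vars): string {
    return value.replace(
        /\$\{([a-zA-Z]\w*)\}/g,
        (_match: string, name: string) => localVars[name] ?? globalVars[name] ?? '',
    );
}

// Resolve variables in enumeration order, so each value can only refer to
// variables that appear earlier in the file (or to globalVars).
function resolveInOrder(parsed: Vars, globalVars: Vars = {}): Vars {
    const vars: Vars = {};
    for (const name of Object.keys(parsed)) {
        vars[name] = substitute(parsed[name], vars, globalVars);
    }
    return vars;
}

// Definition before use: the reference resolves.
const forward = resolveInOrder({
    BASE_URL: 'https://example.com',
    API_URL: '${BASE_URL}/api',
});

// Use before definition: BASE_URL is not known yet when API_URL is resolved.
const backward = resolveInOrder({
    API_URL: '${BASE_URL}/api',
    BASE_URL: 'https://example.com',
});
```

In this sketch forward.API_URL is https://example.com/api, while backward.API_URL is just /api. That is why enumeration order has to match file order, and why it is worth confirming for whichever parser ends up being used.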

Package details:

  • NPM: https://www.npmjs.com/package/dotenv
  • GitHub: https://github.com/motdotla/dotenv
  • Weekly downloads: 18+ million
  • Size: ~20KB minified

Suggested Tests

Tests should be added to verify multiline support:

test('Parse multiline value with single quotes', () => {
    const content = `TEST_VAR='{
  "key1": "value1",
  "key2": "value2"
}'`;

    const vars = parseEnvFile(content);

    assert.strictEqual(vars.TEST_VAR, '{\n  "key1": "value1",\n  "key2": "value2"\n}');
    assert.doesNotThrow(() => JSON.parse(vars.TEST_VAR as string));
});

test('Parse SSH private key (real-world use case)', () => {
    const content = `SSH_KEY="-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA04up8hoqzS1+
-----END RSA PRIVATE KEY-----"`;

    const vars = parseEnvFile(content);

    assert.include(vars.SSH_KEY, '-----BEGIN RSA PRIVATE KEY-----');
    assert.include(vars.SSH_KEY, '\n');
});

test('Regression: Single-line values still work', () => {
    const content = `VAR1=value1
VAR2='value2'
VAR3="value3"`;

    const vars = parseEnvFile(content);

    assert.strictEqual(vars.VAR1, 'value1');
    assert.strictEqual(vars.VAR2, 'value2');
    assert.strictEqual(vars.VAR3, 'value3');
});

Additional Context

  • The same bug exists in vscode-jupyter: src/platform/common/variables/environment.node.ts
  • Both extensions share nearly identical parsing code
  • This is a silent failure - no error message, just corrupted data
  • Affects developers using Pydantic Settings, FastAPI, and other modern Python frameworks

Workarounds (Temporary)

Until this is fixed, users can:

  1. Use single-line JSON:

    JSON_VAR='{"key": "value", "key2": "value2"}'
    
  2. Use escaped newlines:

    JSON_VAR='{"key": "value",\n  "key2": "value2"}'
    
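  3. Remove the corrupted variable from os.environ before using Pydantic, since load_dotenv() does not override existing variables by default (alternatively, call load_dotenv(override=True)):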

    import os
    os.environ.pop('CORRUPTED_VAR', None)
    from dotenv import load_dotenv
    load_dotenv()
    
  4. Use a separate config file instead of embedding JSON in .env
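For workaround 2, the escaped form can be generated rather than typed by hand. This is a minimal sketch with a hypothetical helper name, toSingleLineEnvEntry. It relies on the current parser turning \n into a newline inside matching quotes, as described in the root-cause section.

```typescript
// Hypothetical helper: turn a multiline value into a single-line .env entry
// by replacing real line breaks with the two-character sequence \n.
function toSingleLineEnvEntry(name: string, value: string): string {
    const escaped = value.replace(/\r?\n/g, '\\n');
    return `${name}='${escaped}'`;
}

const prettyJson = '{\n  "key": "value",\n  "key2": "value2"\n}';
const envEntry = toSingleLineEnvEntry('JSON_VAR', prettyJson);
// envEntry is one line: JSON_VAR='{\n  "key": "value",\n  "key2": "value2"\n}'
```

One caveat: if the value already contains a literal backslash followed by n (for example, an escaped newline inside a JSON string), the current parser will turn that into a real newline as well, so this is only safe for values without such sequences.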

References

  • dotenv specification: https://github.com/motdotla/dotenv
  • python-dotenv (correctly handles multiline): https://github.com/theskumar/python-dotenv
  • Related issue in vscode-jupyter: [link if you create one there]

Labels

bug, area-environments, area-jupyter, needs-investigation

Willingness to Contribute

I'm willing to submit a PR with the fix if the maintainers agree with the approach. Happy to use either the state machine parser or delegate to the dotenv npm package - whichever the team prefers!

piskunow avatar Sep 29 '25 23:09 piskunow

Thank you for the extensive work both discovering this bug and tracking down a fix! I will review your approach and let you know so you can move forward with the correct PR. Thanks!

cc @DonJayamanne as it is notebooks related as well

eleanorjboyd avatar Oct 03 '25 17:10 eleanorjboyd

seeing now that this doesn't occur for regular python files, this might be more in Jupyter's domain. @amunger any ideas here as Don is out for a bit?

eleanorjboyd avatar Oct 03 '25 20:10 eleanorjboyd

Indeed, the .env file is parsed when launching a Jupyter kernel, not for Python scripts by default

piskunow avatar Oct 03 '25 20:10 piskunow

Do you suggest opening another issue in vscode-jupyter? The same function is defined there: https://github.com/microsoft/vscode-jupyter/blob/952a5f7212890e079373736cfb37719b1fad9b80/src/platform/common/variables/environment.node.ts#L132

piskunow avatar Oct 03 '25 21:10 piskunow

I'm not familiar with the env file parsing, so Don should definitely be the one to help with this.

amunger avatar Oct 06 '25 15:10 amunger

@DonJayamanne any ideas here?

eleanorjboyd avatar Oct 07 '25 20:10 eleanorjboyd

Making this a feature request, as the Python extension also has its own .env parser and effort was put into it to ensure it lines up with user expectations (features) when running Python code. Trying to add support for the dotenv npm package could break those and introduce other issues. But agreed, it's worth looking into.

DonJayamanne avatar Oct 07 '25 22:10 DonJayamanne

"Python extension also has its own .env parser"

I came to say the same thing: Could the Jupyter extension leverage the settings + environment variable handling from the other extensions?

afeld avatar Nov 20 '25 05:11 afeld