Drop Python 3.9 support

Open notatallshaw opened this issue 1 month ago • 2 comments

For the last couple of YY.1 releases we've dropped support for a Python 3.x version, and I propose we continue this process.

In particular, dropping Python 3.9 will allow pip to finally switch from urllib3 v1 to v2, and truststore will be enabled by default for all users.

As with the previous release, I suggest we keep ruff's target-version set to the dropped version of Python for at least one release: https://github.com/pypa/pip/blob/25.3/pyproject.toml#L206. This will validate that pip's internal code introduces no syntax that breaks on Python 3.9.

I also wonder whether we should have a separate (informal) policy for syntax breakage, as distinct from explicit support, given that the tooling makes it quite easy.

notatallshaw, Nov 04 '25 15:11

What are the download stats for Python 3.9? We've historically only dropped support for a Python version when downloads go below about 5%. I'm not against dropping 3.9 support (getting onto urllib3 v2 is a significant benefit), but I do think we should still be guided by download figures.

pfmoore, Nov 04 '25 16:11

I am not going to object to pushing this back to a later release, but I don't personally find downloads a particularly compelling statistic. It is heavily biased toward users who have configured some kind of CI or CD pipeline to run constantly, with either misconfigured or no caching.

That said, I checked BigQuery's public PyPI dataset and came up with a query that computes a 7-day rolling average. I don't know if I can share the result set with a public link, so this is the query I ran:

Query

WITH base AS (
  SELECT
    DATE(timestamp) AS download_date,
    -- Normalize python version: handle '3.10.11', 'cp310', 'py310', etc.
    CASE
      -- Dotted versions such as '3.10.11': keep 'major.minor'.
      WHEN REGEXP_CONTAINS(details.python, r'^\d+\.\d+') THEN
        REGEXP_EXTRACT(details.python, r'^\d+\.\d+')
      -- Tag-style versions such as 'cp39' or 'py310': the first digit is the
      -- major version, the remaining one or two digits are the minor version.
      -- (The original '[cp|py]' was a character class, not an alternation.)
      WHEN REGEXP_CONTAINS(details.python, r'^(?:cp|py)\d{2,3}$') THEN
        REGEXP_REPLACE(details.python, r'^(?:cp|py)(\d)(\d{1,2})$', r'\1.\2')
      ELSE
        'other'
    END AS py_version
  FROM
    `bigquery-public-data.pypi.file_downloads`
  WHERE
    file.project = 'pip'
    AND details.python IS NOT NULL
    AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
),

agg AS (
  SELECT
    download_date,
    py_version,
    COUNT(*) AS downloads
  FROM base
  GROUP BY download_date, py_version
),

pivoted AS (
  SELECT
    download_date,
    COALESCE(SUM(CASE WHEN py_version = '2.7' THEN downloads END), 0) AS py27,
    COALESCE(SUM(CASE WHEN py_version = '3.0' THEN downloads END), 0) AS py30,
    COALESCE(SUM(CASE WHEN py_version = '3.1' THEN downloads END), 0) AS py31,
    COALESCE(SUM(CASE WHEN py_version = '3.2' THEN downloads END), 0) AS py32,
    COALESCE(SUM(CASE WHEN py_version = '3.3' THEN downloads END), 0) AS py33,
    COALESCE(SUM(CASE WHEN py_version = '3.4' THEN downloads END), 0) AS py34,
    COALESCE(SUM(CASE WHEN py_version = '3.5' THEN downloads END), 0) AS py35,
    COALESCE(SUM(CASE WHEN py_version = '3.6' THEN downloads END), 0) AS py36,
    COALESCE(SUM(CASE WHEN py_version = '3.7' THEN downloads END), 0) AS py37,
    COALESCE(SUM(CASE WHEN py_version = '3.8' THEN downloads END), 0) AS py38,
    COALESCE(SUM(CASE WHEN py_version = '3.9' THEN downloads END), 0) AS py39,
    COALESCE(SUM(CASE WHEN py_version = '3.10' THEN downloads END), 0) AS py310,
    COALESCE(SUM(CASE WHEN py_version = '3.11' THEN downloads END), 0) AS py311,
    COALESCE(SUM(CASE WHEN py_version = '3.12' THEN downloads END), 0) AS py312,
    COALESCE(SUM(CASE WHEN py_version = '3.13' THEN downloads END), 0) AS py313,
    COALESCE(SUM(CASE WHEN py_version = '3.14' THEN downloads END), 0) AS py314,
    COALESCE(SUM(CASE WHEN py_version = '3.15' THEN downloads END), 0) AS py315,
    COALESCE(SUM(CASE WHEN py_version = 'other' THEN downloads END), 0) AS other
  FROM agg
  GROUP BY download_date
),

percentages AS (
  SELECT
    download_date,
    SAFE_DIVIDE(py27, total) * 100 AS py27,
    SAFE_DIVIDE(py30, total) * 100 AS py30,
    SAFE_DIVIDE(py31, total) * 100 AS py31,
    SAFE_DIVIDE(py32, total) * 100 AS py32,
    SAFE_DIVIDE(py33, total) * 100 AS py33,
    SAFE_DIVIDE(py34, total) * 100 AS py34,
    SAFE_DIVIDE(py35, total) * 100 AS py35,
    SAFE_DIVIDE(py36, total) * 100 AS py36,
    SAFE_DIVIDE(py37, total) * 100 AS py37,
    SAFE_DIVIDE(py38, total) * 100 AS py38,
    SAFE_DIVIDE(py39, total) * 100 AS py39,
    SAFE_DIVIDE(py310, total) * 100 AS py310,
    SAFE_DIVIDE(py311, total) * 100 AS py311,
    SAFE_DIVIDE(py312, total) * 100 AS py312,
    SAFE_DIVIDE(py313, total) * 100 AS py313,
    SAFE_DIVIDE(py314, total) * 100 AS py314,
    SAFE_DIVIDE(py315, total) * 100 AS py315,
    SAFE_DIVIDE(other, total) * 100 AS other
  FROM (
    SELECT
      *,
      (py27 + py30 + py31 + py32 + py33 + py34 + py35 + py36 + py37 +
       py38 + py39 + py310 + py311 + py312 + py313 + py314 + py315 + other) AS total
    FROM pivoted
  )
),

smoothed AS (
  SELECT
    download_date,
    AVG(py27) OVER w AS py27,
    AVG(py30) OVER w AS py30,
    AVG(py31) OVER w AS py31,
    AVG(py32) OVER w AS py32,
    AVG(py33) OVER w AS py33,
    AVG(py34) OVER w AS py34,
    AVG(py35) OVER w AS py35,
    AVG(py36) OVER w AS py36,
    AVG(py37) OVER w AS py37,
    AVG(py38) OVER w AS py38,
    AVG(py39) OVER w AS py39,
    AVG(py310) OVER w AS py310,
    AVG(py311) OVER w AS py311,
    AVG(py312) OVER w AS py312,
    AVG(py313) OVER w AS py313,
    AVG(py314) OVER w AS py314,
    AVG(py315) OVER w AS py315,
    AVG(other) OVER w AS other
  FROM percentages
  WINDOW w AS (ORDER BY download_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
)
SELECT *
FROM smoothed
ORDER BY download_date;
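
For readers who find the SQL hard to follow, the same pipeline can be sketched in plain Python: normalize each raw version string to `major.minor`, turn daily counts into percentages, then take a trailing 7-row mean (the equivalent of `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW`). This is an illustrative sketch, not what was run against BigQuery; the function names and the sample rows are hypothetical, and the normalization follows the intent stated in the query's comment.

```python
import re
from collections import defaultdict
from datetime import date


def normalize(python: str) -> str:
    """Map a raw version string to 'major.minor', or 'other' if unrecognized."""
    m = re.match(r"(\d+)\.(\d+)", python)  # e.g. '3.10.11' -> '3.10'
    if m:
        return f"{m.group(1)}.{m.group(2)}"
    m = re.fullmatch(r"(?:cp|py)(\d)(\d{1,2})", python)  # e.g. 'cp310' -> '3.10'
    if m:
        return f"{m.group(1)}.{m.group(2)}"
    return "other"


def daily_shares(rows):
    """rows: iterable of (day, raw_version). Returns {day: {version: percent}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for day, raw in rows:
        counts[day][normalize(raw)] += 1
    return {
        day: {v: 100 * n / sum(per_day.values()) for v, n in per_day.items()}
        for day, per_day in counts.items()
    }


def rolling_mean(values, window=7):
    """Trailing mean over the current value and up to window - 1 preceding ones."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical rows purely for illustration; they are not real download data.
rows = [
    (date(2025, 11, 1), "3.9.18"),
    (date(2025, 11, 1), "3.11.6"),
    (date(2025, 11, 2), "3.11.6"),
    (date(2025, 11, 2), "cp312"),
]
shares = daily_shares(rows)
py311 = [shares[day].get("3.11", 0.0) for day in sorted(shares)]
print(rolling_mean(py311))
```

Note that the average is taken over daily percentages, not over raw counts, so a low-traffic day carries the same weight as a high-traffic one. That matches the query's structure, and it is one reason the smoothed figures can still show spikes.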

If this query is accurate (and I'm not sure it is; there may be some nuance to this dataset I don't understand), it reveals some pretty interesting numbers. As of 2025-11-03, on a 7-day rolling average, these are the versions with more than a 0.01% download share:

  • Python 2.7: 1.94%
  • Python 3.5: 0.04%
  • Python 3.6: 0.53%
  • Python 3.7: 7.05%
  • Python 3.8: 3.89%
  • Python 3.9: 9%
  • Python 3.10: 19.1%
  • Python 3.11: 30.04%
  • Python 3.12: 18.45%
  • Python 3.13: 7.56%
  • Python 3.14: 2.36%
  • Python 3.15: 0.01%
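
As a quick sanity check, the figures above can be compared against the roughly 5% cutoff mentioned earlier in the thread. This is a small sketch using only the numbers listed; the names `THRESHOLD`, `shares`, and `below` are made up for the example, and treating 5% as a hard threshold is an assumption (it was described as "about 5%").

```python
THRESHOLD = 5.0  # percent; approximate cutoff, assumed here to be a hard limit

# 7-day rolling average download shares as of 2025-11-03, copied from the list.
shares = {
    "2.7": 1.94, "3.5": 0.04, "3.6": 0.53, "3.7": 7.05, "3.8": 3.89,
    "3.9": 9.0, "3.10": 19.1, "3.11": 30.04, "3.12": 18.45,
    "3.13": 7.56, "3.14": 2.36, "3.15": 0.01,
}

# Versions whose share is already below the cutoff.
below = [version for version, pct in shares.items() if pct < THRESHOLD]

print("below threshold:", below)
print("3.9 margin over threshold:", round(shares["3.9"] - THRESHOLD, 2))
print("listed shares sum to:", round(sum(shares.values()), 2))
```

By this measure Python 3.9 is still about four points above the cutoff, and so is Python 3.7 by about two points. The newest versions (3.14 and 3.15) also fall below 5%, which shows that the threshold only makes sense for versions on their way out.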

Python 3.9 is on a significant downward trend, but there are spikes in the data, which I would hypothesize reflect high volatility caused by a small number of users: Image

Python 3.7 is also on a downward trend, but it is not clear to me if or when it will go below 5%: Image

notatallshaw, Nov 04 '25 17:11