Expose Ruff's public API as a Python library

Open charliermarsh opened this issue 3 years ago • 31 comments

See: #593

charliermarsh avatar Nov 08 '22 14:11 charliermarsh

This would mean that we could do something like the following, right?

import ruff
errors = ruff.check_files(list_of_paths)
...

Thanks!

facundobatista avatar Nov 25 '22 11:11 facundobatista

Yup, that's right!

charliermarsh avatar Nov 25 '22 14:11 charliermarsh

@messense - I wasn't certain on this last time -- if we bundle a Python API with Ruff, will we need to build separate wheels for every Python version?

charliermarsh avatar Nov 25 '22 14:11 charliermarsh

If you can use abi3 features, one wheel per platform, otherwise you need to build separate wheels for every Python version.

messense avatar Nov 25 '22 14:11 messense

Awesome, thank you. I think we should be able to do that, so maybe this will be really straightforward.

charliermarsh avatar Nov 25 '22 14:11 charliermarsh

Hello there, do you happen to have a rough timeline for when (if?) this is going to happen? I'm looking to integrate ruff into a tool I'm developing, which would require an API of some sort. It would be very helpful to know if this is something I can wait on, or look for another solution / workaround!

provinzkraut avatar Dec 08 '22 17:12 provinzkraut

@provinzkraut - It's definitely going to happen! I could probably ship it within the next week or so. I'd just been punting on it until I had more people asking for it.

Could I hear a bit more about your use-case, if you don't mind sharing?

charliermarsh avatar Dec 08 '22 18:12 charliermarsh

@charliermarsh That's good to hear!

Could I hear a bit more about your use-case, if you don't mind sharing?

Sure. I'm working on a markdown extension to automatically generate pymdown tabs for different Python versions from a source version, i.e. generate 3.7, 3.8, 3.10 tabs from a 3.7 source (repo).

Currently I'm using pyupgrade to generate the versions and autoflake to clean up imports that have become superfluous. autoflake in particular is quite slow, making up the majority of the extension's runtime. Since ruff is way faster at this, I'd like to use it (also one less dependency). I fiddled around with using the CLI version, but that's messy and degrades performance.

provinzkraut avatar Dec 08 '22 19:12 provinzkraut

@provinzkraut - Ok, cool. Let me see what I can do. I don't know if you're comfortable reading Rust, but would the current Rust public API suit your use-case, were it callable from Python with Python objects etc.?

charliermarsh avatar Dec 09 '22 04:12 charliermarsh

In short: it takes a file path (to find the pyproject.toml), the raw Python source code, and an autofix setting, and returns a list of checks (which themselves include the raw fixes / patches).

I'm guessing that for your use-case, what you actually want is a function that takes source code (plus settings, to enable a list of checks) and returns fixed source code?

charliermarsh avatar Dec 09 '22 04:12 charliermarsh

but would the current Rust public API suit your use-case, were it callable from Python with Python objects etc.?

I looked at this yesterday because I thought that maybe it could be as simple as adding a tiny wrapper around the Rust lib myself, but it seems to be a bit more involved. The current API doesn't really lend itself that well to my use case.

I'm guessing that for your use-case, what you actually want is a function that takes source code (plus settings, to enable a list of checks) and returns fixed source code?

That would be ideal, yes. Dealing with a list of checks and extracting what I need from it also wouldn't be that big of an issue, but passing in configuration directly and omitting the config file is crucial, both for the needed configurability (I need to run the fixers with varying configuration for every invocation) and performance (I'm running the fixers many times on small snippets, which means the overhead of looking for and parsing a pyproject.toml every time adds up).

provinzkraut avatar Dec 09 '22 09:12 provinzkraut

I'm working on this now.

squiddy avatar Dec 27 '22 14:12 squiddy

I needed to execute Ruff as an Alembic post-write hook. I came up with a very ham-fisted approach based on the distributed __main__.py.

alembic.ini:

[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts.  See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
hooks = ruff, black

ruff.type = ruff

black.type = console_scripts
black.entrypoint = black

and then Alembic's env.py:

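import os
import sysconfig

from alembic.script import write_hooks


@write_hooks.register("ruff")
def run_ruff(filename, options):
    # Locate the ruff executable installed alongside the current interpreter.
    ruff = os.path.join(sysconfig.get_path("scripts"), "ruff")
    # Run ruff and wait for it; --exit-zero keeps lint findings from failing the hook.
    os.spawnv(os.P_WAIT, ruff, [ruff, filename, "--fix", "--exit-zero"])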

phillipuniverse avatar Jan 17 '23 20:01 phillipuniverse

👍 Yup that should be safe to do! (The downside being that you have to go through the CLI rather than calling a function directly. Hoping to enable that soon but not working on it right now.)

charliermarsh avatar Jan 17 '23 20:01 charliermarsh

Is the plan here to make a Python library that links to ruff directly? I want something that I can use in an interactive Python REPL to check for errors as the user types stuff, and shelling out to a subprocess on each character typed doesn't sound like a good idea (especially if I also have to write out the code to a tempfile or heredoc).

If you're curious, here's what I'm currently using with pyflakes: https://github.com/asmeurer/mypython/blob/a836d0956a6443f7a85a032dc625ff3da1479a91/mypython/processors.py#L196. The code is complicated in part because pyflakes doesn't handle syntax errors very well, so I have to parse them separately. I haven't checked if ruff handles them better. There are lots of opportunities to improve on pyflakes' barebones Python API.

The main thing I would want from a Ruff Python API is a function that takes a string of Python code and returns a list of errors with line number, start and stop column numbers (where relevant), and the error message. Being able to get corresponding fixes would be nice too, I guess. The best API I can think of for a "fix" would be to return the whole block of code with the specific warning fixed, along with a new line and column number corresponding to the line and column of the original warning (so that I can interactively keep the cursor in the "same" location).

I'm happy to discuss API ideas more in depth or test out any prototypes if you're interested.

asmeurer avatar Mar 16 '23 22:03 asmeurer

That sounds cool!

If you're curious, here's what I'm currently using with pyflakes asmeurer/mypython@a836d09/mypython/processors.py#L196. The code is complicated in part because pyflakes doesn't handle syntax errors very well, so I have to parse them separately. I haven't checked if ruff handles them better. There are lots of opportunities to improve on pyflakes' barebones Python API.

Ruff creates a diagnostic for files with syntax errors. Adopting a more error-resilient parser is something we are considering.

The main thing I would want from a Ruff Python API is a function that takes a string of Python code and returns a list of errors with line number, start and stop column numbers (where relevant), and the error message.

That sounds reasonable, but we aren't there yet (your best shot is to call into the CLI). One of the biggest problems with exposing a linter API right now is that Ruff writes one-off warnings to stdout and relies on global state to track whether to write each warning. Cleaning this up probably requires a larger refactoring of the diagnostic system... so that may take a while.

MichaReiser avatar Mar 17 '23 07:03 MichaReiser

For what it's worth, we power ruff-lsp and the VS Code extension over subprocess, and the CLI actually supports enough behavior to power the operations needed there. For example, you can use --format json to get a structured list of violations and their fixes. Similarly, if you pass input via stdin, and run with --fix, we print the "fixed" output to stdout.

charliermarsh avatar Mar 17 '23 22:03 charliermarsh
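
As an illustration of the subprocess approach described above (an editorial sketch, not code from any participant in the thread), the snippet below pipes source code to the CLI over stdin and parses the JSON report into line/column diagnostics. It assumes a `ruff` executable on PATH and the newer `check --output-format json` spelling (the thread mentions the older `--format json`). The JSON field names (`code`, `message`, `location`, `end_location`) are assumptions about the report layout, and `Diagnostic`, `parse_diagnostics`, and `check_source` are hypothetical names.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnostic:
    code: Optional[str]
    message: str
    row: int
    column: int
    end_row: int
    end_column: int


def parse_diagnostics(json_text: str) -> list:
    """Convert a JSON report into Diagnostic objects (field names are assumed)."""
    diagnostics = []
    for item in json.loads(json_text):
        start = item["location"]
        # Fall back to the start position if no end position is reported.
        end = item.get("end_location") or start
        diagnostics.append(
            Diagnostic(
                code=item.get("code"),
                message=item["message"],
                row=start["row"],
                column=start["column"],
                end_row=end["row"],
                end_column=end["column"],
            )
        )
    return diagnostics


def check_source(source: str, filename: str = "snippet.py") -> list:
    """Lint a string by piping it to the CLI; no temporary file is written."""
    cmd = ["ruff", "check", "--output-format", "json",
           "--stdin-filename", filename, "-"]
    # check=False: a non-zero exit status is expected when violations are found.
    result = subprocess.run(cmd, input=source, capture_output=True,
                            text=True, check=False)
    return parse_diagnostics(result.stdout)
```

Keeping the parsing separate from the process call means the parser can be exercised without the binary installed. Each call to `check_source` still pays the cost of one process spawn.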

For what it's worth, we power ruff-lsp and the VS Code extension over subprocess, and the CLI actually supports enough behavior to power the operations needed there

How do you feel about adding a Python module that wraps this up in a convenient API?

I've been using the solution you suggested in a few of my tools now, and not having to implement that boilerplate every time would certainly be nice.

If that sounds good to you, I'd be happy to contribute.

provinzkraut avatar Jun 11 '23 08:06 provinzkraut

I'm keen to replace autoflake+isort with ruff in my shed all-in-one autoformatter - the subprocess trick works pretty well, except that if there's any way to change the isort settings in ruff without a config file I can't see it - and running isolated from any config is pretty important in this use-case. Any suggestions, or do I just need to wait for the library interface in this issue?

Zac-HD avatar Oct 15 '23 20:10 Zac-HD

Adding a data point: in mkdocstrings-python we format function signatures with Black if it is installed. We would like to support Ruff too, but spawning a subprocess for each signature is very costly, so we would greatly appreciate a Python binding that doesn't use subprocesses 🙂 A wrapper that hides the subprocess calls sounds nice, but won't be enough for our use-case.

pawamoy avatar Feb 22 '24 13:02 pawamoy

@pawamoy that sounds neat. We plan to integrate our LSP into ruff (implemented in Rust). I know it's not as convenient as a Python API, but it would allow you to format files without spawning a process for every signature (although it might still be very costly, because it requires multiple LSP calls to format a single code snippet).

MichaReiser avatar Feb 22 '24 13:02 MichaReiser

By calls do you mean network calls? Or could we somehow spawn the LSP server locally (like a daemon)?

pawamoy avatar Feb 22 '24 14:02 pawamoy

You would spawn the LSP like a daemon and communicate over stdin/stdout.

MichaReiser avatar Feb 22 '24 14:02 MichaReiser
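
To make "communicate over stdin/stdout" concrete (an editorial sketch, not code from the thread): the LSP base protocol frames every JSON-RPC message with a `Content-Length` header followed by a blank line and the JSON body. The helpers below only implement that framing; `encode_message` and `decode_message` are hypothetical names, and the assumption is that a long-lived server process (for example one started with `subprocess.Popen(["ruff", "server"], stdin=PIPE, stdout=PIPE)`) sits on the other end of the pipes.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload as an LSP message (header + body)."""
    body = json.dumps(payload).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes):
    """Split one framed message off the front of a byte buffer.

    Returns the decoded payload and whatever bytes follow it.
    """
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length]), rest[length:]
```

A real client would also need the `initialize` handshake and document synchronization before it could request formatting, which is the "multiple LSP calls" cost mentioned above.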

Ah, interesting. Then yeah, that's already much better than subprocesses 🙂 Thanks for the info!

pawamoy avatar Feb 22 '24 14:02 pawamoy

Adding a data point: in mkdocstrings-python we format function signatures with Black if it is installed. We would like to support Ruff too, but spawning a subprocess for each signature is very costly, so we would greatly appreciate a Python binding that doesn't use subprocesses 🙂 A wrapper that hides the subprocess calls sounds nice, but won't be enough for our use-case.

I put together an experimental package that uses PyO3 to wrap the Ruff formatter in a Python API that doesn't require any subprocesses. I'd still consider it alpha at best (there's only one callable function), but maybe it could be helpful to others as well?

https://github.com/amyreese/ruff-api

amyreese avatar Feb 22 '24 17:02 amyreese

Amazing, thanks for sharing! I'll check it out :)

pawamoy avatar Feb 22 '24 19:02 pawamoy

@charliermarsh just checking in - is there any way to configure the isort settings in --isolated mode, or do I just have to wait? No worries if so, I'm just looking forward to replacing black too...

Zac-HD avatar Feb 22 '24 20:02 Zac-HD

@charliermarsh just checking in - is there any way to configure the isort settings in --isolated mode, or do I just have to wait? No worries if so, I'm just looking forward to replacing black too...

@Zac-HD, yes, there is! We recently extended the --config flag so that arbitrary configuration options can be overridden via the command line using "inline TOML": https://docs.astral.sh/ruff/configuration/#the-config-cli-flag. So to override the isort extra-standard-library setting in --isolated mode (for example), you'd do something like ruff check path/to/file.py --config "lint.isort.extra-standard-library = ['path']".

AlexWaygood avatar Feb 22 '24 21:02 AlexWaygood
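
As an illustration of the inline-TOML override described above (an editorial sketch, not code from the thread), the helpers below build a `ruff check` command that sorts imports in `--isolated` mode with per-call settings and feed it a string over stdin. `build_isort_fix_command` and `sort_imports` are hypothetical names; the sketch assumes a `ruff` executable on PATH and that `--fix` on stdin writes the fixed source to stdout, as mentioned earlier in the thread.

```python
import subprocess


def build_isort_fix_command(config_overrides, filename="snippet.py"):
    """Build the argument list; each override is one inline-TOML string."""
    cmd = ["ruff", "check", "--isolated", "--select", "I", "--fix"]
    for override in config_overrides:
        cmd += ["--config", override]
    # "-" tells the CLI to read the source from stdin.
    cmd += ["--stdin-filename", filename, "-"]
    return cmd


def sort_imports(source, config_overrides):
    """Return the source with imports sorted, or unchanged if the run fails."""
    result = subprocess.run(
        build_isort_fix_command(config_overrides),
        input=source, capture_output=True, text=True, check=False,
    )
    # Exit status 0 or 1 means the run completed; anything else is an error.
    return result.stdout if result.returncode in (0, 1) else source
```

For example, `sort_imports(code, ["lint.isort.extra-standard-library = ['path']"])` would apply the override from the comment above without reading any config file.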

Adding another data point: in edvart, we currently use isort to sort imports in Python code that is generated dynamically. With a Python API, we could fully switch to ruff. For now, we use ruff to format the source code, but keep isort to format the generated code.

mbelak-dtml avatar Mar 06 '24 15:03 mbelak-dtml

Another data point: it would make it easier to replace programmatic calls to black, like in mdformat-black: https://github.com/hukkin/mdformat-black/blob/master/mdformat_black/__init__.py


def format_python(unformatted: str, _info_str: str) -> str:
    return black.format_str(unformatted, mode=black.Mode())

jankatins avatar Jun 04 '24 11:06 jankatins
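
Until a library API exists, a subprocess-based stand-in for the function above might look like the following (an editorial sketch, not code from the thread or from mdformat-black). It assumes a `ruff` executable on PATH and that `ruff format` reads from stdin and writes the formatted code to stdout; the `ruff_bin` parameter is a hypothetical addition to make the binary configurable.

```python
import shutil
import subprocess


def format_python(unformatted: str, _info_str: str = "", ruff_bin: str = "ruff") -> str:
    """Format a code string with the ruff CLI; return it unchanged on failure."""
    executable = shutil.which(ruff_bin)
    if executable is None:
        # The binary is not installed: leave the code as it is.
        return unformatted
    result = subprocess.run(
        [executable, "format", "--stdin-filename", "snippet.py", "-"],
        input=unformatted, capture_output=True, text=True, check=False,
    )
    # A non-zero exit status (e.g. a syntax error) keeps the original text.
    return result.stdout if result.returncode == 0 else unformatted
```

This keeps the same call shape as the black version, at the cost of one process spawn per snippet, which is the overhead several commenters above want a real Python binding to avoid.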