Automatically detect language for pasted code
Check for existing issues
- [X] Completed
Describe the feature
Currently, when I paste copied code into a new empty document, it stays as Plain Text. In VS Code, it'll automatically detect and set the language. Would be nice if Zed supports this too.
If applicable, add mockups / screenshots to help present your vision of the feature
No response
Another use case is standard input content. Example: `git diff | zed -n -` would then be auto-detected as a diff/patch.
yes yes yes, it is a MUST!
PLEASE!!!
VSCode uses guesslang to do this, but that repo hasn't been touched in 4 years. I wonder if there might be some low-hanging fruit here for a more modern / faster implementation with LLMs
I guess it shouldn't be too hard to integrate Magika? https://opensource.googleblog.com/2025/11/announcing-magika-10-now-faster-smarter.html
Oh wow that looks amazing! Would love for Zed to integrate that
Based on https://docs.rs/magika/latest/magika/ it seems not that hard to use?
The main question would be how to integrate this in Zed better:
- one large concern is that `tokio` is in the dependencies: there's the `GlobalTokio` GPUI global to help handle such cases in Zed, but the dependency footprint alone is a concern. We'd better keep `buffer`, `editor` and other "base" crates free of these extra dependencies if possible or, at least, have them disabled by default.
Finding the right place to add this dependency and where to call it from seems to be the hardest part of the PR.
Drafting (can be wrong), I would try to initialize and set the GPUI global in the `zed` crate (with that global type available to both the `editor` and `zed` crates) and use that global in the editor.
The crate is kept at the `zed.rs` level only, so no real magika types go beyond the `zed` level.
- all the `magika::Session::new()`-related questions: is it worth keeping this as some `cx.global`, what is the footprint of this extra thing, and how slow is it to cold-start instead?
That will influence whether the global from the previous bullet keeps that session running, or is just code that cold-starts it every time.
- since the code is pasted, we cannot use file paths to determine the language, yet we can react to the `EditorEvent::Edited` event somewhere externally, or to `multi_buffer::Event::Edited` inside the editor itself: then we will know which buffer got the change and has no language yet, and we can take the buffer's text and pass it to that global to determine the language.
One gotcha here is that we do not need the entire text to make a decision: we should limit it with some "3 screens of text max" heuristic, so we do not send megabytes of text accidentally.
Then, all that's left is to use https://docs.rs/magika/latest/magika/enum.ContentType.html and https://github.com/zed-industries/zed/blob/0149de4b54c55df8ec9ebc6a1da5b43c68e407e9/crates/language/src/language_registry.rs#L114
to derive a language, and call https://github.com/zed-industries/zed/blob/0149de4b54c55df8ec9ebc6a1da5b43c68e407e9/crates/language_selector/src/language_selector.rs#L206-L208 to set it.
Apparently, we have to use some hardcoded bits to unite both language worlds, like https://github.com/zed-industries/zed/blob/0149de4b54c55df8ec9ebc6a1da5b43c68e407e9/crates/extensions_ui/src/extension_suggest.rs#L14-L24 does, which seems ok.
This all seems to belong to the code that is inside the global mentioned above.
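To make the last bullet a bit more concrete, here is a rough sketch of the two pure pieces: capping the text that gets sent to the detector, and the hardcoded table that unites both language worlds. Everything in it is an assumption for illustration: `MAX_SAMPLE_BYTES`, `detection_sample` and `zed_language_for_label` are hypothetical names, the byte cap is a stand-in for "3 screens of text", and the labels are plain strings rather than real `magika::ContentType` values, so nothing here depends on the magika or Zed APIs.

```rust
/// Hypothetical cap standing in for the "3 screens of text max" heuristic.
const MAX_SAMPLE_BYTES: usize = 16 * 1024;

/// Returns a prefix of `text` that is at most `max_bytes` long.
/// The cut is moved back to a UTF-8 character boundary so the slice is valid.
fn detection_sample(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

/// Maps a detector label to a language name to look up in the registry.
/// Both the labels and the names are illustrative assumptions, not a
/// verified list of what magika emits or what Zed registers.
fn zed_language_for_label(label: &str) -> Option<&'static str> {
    match label {
        "rust" => Some("Rust"),
        "python" => Some("Python"),
        "javascript" => Some("JavaScript"),
        "markdown" => Some("Markdown"),
        "diff" => Some("Diff"),
        // Unknown labels leave the buffer as Plain Text.
        _ => None,
    }
}
```

Returning `None` for unmapped labels keeps the buffer as Plain Text instead of guessing, which seems like the safer default for a feature that runs on every paste.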
Lastly, given that this uses an AI model under the hood, we should respect the user's `disable_ai` setting and thus only enable the feature when that is not set to `true`.
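The gating conditions mentioned so far could be collected into one small check, roughly like the sketch below. `DetectionContext` and `should_detect_language` are hypothetical names, and the "buffer is not empty" condition is my own assumption on top of what is described above; the real settings lookup would go through Zed's settings machinery, which is not shown here.

```rust
/// Hypothetical snapshot of the state the decision needs.
struct DetectionContext {
    /// Mirrors the user's `disable_ai` setting.
    disable_ai: bool,
    /// Whether the edited buffer already has a language assigned.
    has_language: bool,
    /// Whether the edited buffer contains no text (assumed extra condition).
    is_empty: bool,
}

/// Detection runs only when AI features are allowed and the buffer
/// is non-empty and still has no language.
fn should_detect_language(ctx: &DetectionContext) -> bool {
    !ctx.disable_ai && !ctx.has_language && !ctx.is_empty
}
```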
hi! been lurking for a while now but i'd definitely like to give this a shot, may I?