warehouse
Odd rendering of author when using PEP 621 metadata.
Describe the bug
PEP 621 allows project metadata to be defined in pyproject.toml. This uses a list of dictionaries to represent the project's authors. Each dictionary contains two keys, "name" and "email".
To map these fields to core metadata, PEP 621 says:
- If only name is provided, the value goes in Author.
- If only email is provided, the value goes in Author-email.
- If both email and name are provided, the value goes in Author-email, with the format {name} <{email}> (with appropriate quoting, e.g. using email.headerregistry.Address).
Based on that, in my build backend whey I am generating metadata that looks like:
Metadata-Version: 2.1
Name: tox-envlist
Version: 0.3.0
Summary: Allows selection of a different tox envlist.
Author-email: Dominic Davis-Foster <[email protected]>
However, on PyPI this renders in the sidebar as:

(this example from https://pypi.org/project/tox-envlist/)
It's also wrong in the JSON API:
{
"info": {
"author": "",
"author_email": "Dominic Davis-Foster <[email protected]>"
}
}
This causes further issues with tools using the API, such as https://pypistats.org, which leaves the author field blank:

Expected behavior
Compare this with another project created using setuptools:

where the metadata is:
Metadata-Version: 2.1
Name: domdf-python-tools
Version: 2.9.0
Summary: Helpful functions for Python 🐍 🛠️
Home-page: https://github.com/domdfcoding/domdf_python_tools
Author: Dominic Davis-Foster
Author-email: [email protected]
and the response from the JSON API:
{
"info":{
"author":"Dominic Davis-Foster",
"author_email":"[email protected]"
}
}
(this example from https://pypi.org/project/domdf-python-tools)
I would have expected warehouse to parse the Author-email field into the name and email address, and treat them the same as if they had been defined separately in Author and Author-email.
To Reproduce Visible at https://pypi.org/project/tox-envlist/
See also https://pypi.org/project/flit/3.2.0/, which uses PEP 621 metadata and has the same problem but uses a different build backend.
My Platform
N/A
Additional context
I could be wrong, but I wonder if this would be on PyPI or on the tool you used for uploading (twine, or poetry, or...). I'm trying to investigate, but I can't say for sure as of now.
Hm, the definition of the metadata value in PEP-0345 et al. seems to indicate that this should be supported. I can't find the PEP that defines the upload format, but I think you're right.
I've tried looking at what it would mean on the code side. I should have known, really, but the author/author-email situation is a mess and the whole thing is probably a can of worms :D
I can make it so that the base case of PEP 621 is handled, but there are quite a few examples for which I have no idea what should be returned.
Author = A B, Author-Email = C D <[email protected]>
Author = A B <[email protected]>, C D <[email protected]>, No Author-Email
Author = A B <[email protected]>, Author-Email = E F <[email protected]>
This is what we do today:
{% if release.author_email %}
<p><strong>{% trans %}Author:{% endtrans %}</strong> <a href="mailto:{{ release.author_email }}">{{ release.author or release.author_email }}</a></p>
{% elif release.author %}
<p><strong>{% trans %}Author:{% endtrans %}</strong> {{ release.author }}</p>
{% endif %}
This is what the PEP says:
Author (optional): A string containing the author's name at a minimum; additional contact information may be provided.
Example:
Author: C. Schultz, Universal Features Syndicate,
Los Angeles, CA <[email protected]>
Author-email (optional): A string containing the author's e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.
Example:
Author-email: "C. Schultz" <[email protected]>
It's hinted here and there that multiple comma-separated authors are fine in the Author field.
This really makes me want to not try and parse anything smarter than the bare, bare minimum :D
I think the PEP is wrong. If both Author and Author-Email are provided, it's much simpler to just keep them as two fields, otherwise our existing logic needs to become a lot more complex:
https://github.com/pypa/warehouse/blob/7fc3ce5bd7ecc93ef54c1652787fb5e7757fe6f2/warehouse/templates/includes/packaging/project-data.html#L78-L82
@brettcannon I think you might have written this? Any thoughts here?
Rereading the whole thing I guess we could do the following:
- If only Author-Email is defined, parse as RFC-822 From: header
  - If it works, we have our name and address
  - If it fails, display as-is
- If only Author is defined, display as-is. Even if there might be email addresses in there, too bad
- If both are defined:
  - If Author-Email parses as RFC-822 From: header, concatenate the Author field and the "name" part of the Author-Email field
  - If it doesn't parse, display as 2 distinct fields
Would that work?
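The rules above can be sketched roughly as follows. This is only an illustration, not Warehouse's actual code: `summarize_author` and the shape of its return value are made up for the example, the `example.com` addresses are placeholders, and "parses as RFC-822" is approximated by checking that every entry returned by `email.utils.getaddresses` contains an `@`.

```python
from email.utils import getaddresses


def summarize_author(author, author_email):
    """Hypothetical helper: decide what to display for Author / Author-email."""
    author = (author or "").strip()
    author_email = (author_email or "").strip()

    pairs = getaddresses([author_email]) if author_email else []
    # Treat the header as parsed only if every entry has a plausible address.
    parsed = bool(pairs) and all("@" in addr for _, addr in pairs)

    if not parsed:
        # Only Author, or an Author-email that does not parse: show as-is,
        # keeping the raw Author-email (if any) as a separate field.
        return {"label": author, "emails": [], "raw_email": author_email or None}

    emails = [addr for _, addr in pairs]
    names = [name for name, _ in pairs if name]
    if author:
        # Both fields given: put Author before the names from Author-email.
        names.insert(0, author)
    # With no names at all, fall back to the addresses as the label.
    label = ", ".join(names) if names else ", ".join(emails)
    return {"label": label, "emails": emails, "raw_email": None}


print(summarize_author("", "Dominic Davis-Foster <dominic@example.com>"))
print(summarize_author("A B", "C D <cd@example.com>"))
```

Note that the classic setuptools case (a plain name in Author, a bare address in Author-email) still comes out as one name with one address under these rules.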
It would probably work, but why should we be mangling two separate fields into one just to have to un-mangle it somewhere else? I don't see any advantage to it, and think it would be simpler to just change the PEP and the few tools (single tool?) that have already implemented it instead.
Ah, but then we'll never be able to have proper mailto: links? I think I haven't understood what you'd want to do.
I'm not sure I follow, we have proper mailto: links now for non-PEP 621 metadata.
@brettcannon I think you might have written this? Any thoughts here?
If you mean what's in the metadata spec, that's how it's always been, i.e. I didn't do it. PEP 621 just went with what was there and purposefully didn't touch the metadata spec (I tried to clean it up and got push-back from trying to do too much).
As for why PEP 621 uses Author-Email to its fullest extent based on the spec definition, I believe it was to avoid having to try and correlate Author and Author-Email when they were comma-separated fields, since the data is inherently tied together.
I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email. It says RFC-822 From: header, so I believe only a single email address should be sent. In Author, though, multiple values can be sent, it's free-form.
I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email.
It does. From https://packaging.python.org/specifications/core-metadata/#author-email:
A string containing the author's e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.
Example:
Author-email: "C. Schultz" <[email protected]>
Per RFC-822, this field may contain multiple comma-separated e-mail addresses:
Author-email: [email protected], [email protected]
So my reading of "a string containing the author's e-mail address ... can contain a name and e-mail address" combined with "this field may contain multiple comma-separated e-mail addresses" is what led me to do what I did for PEP 621.
To be clear, I personally don't care if a change is made in regards to this; I'm not trying to specifically defend how PEP 621 does things as how things should continue to be done; I'm just trying to explain the logic of how it ended up the way it did. But it seems any change will require an update to the metadata spec and PEP 621 if you want to restrict what's valid for the author- and maintainer-related metadata fields.
That was the missing piece of the puzzle to me. I was looking at the PEP text, where I should have been looking at the packaging doc. The part on multiple email addresses was added by @di 3 years ago following an update of Warehouse where corresponding processing was added.
There is already one moment during release submission where we have assigned a variable containing the "name" part of the multi-email RFC-822 encoded string. So without a lot of additional complexity, just assigning this to the "Author" field of the release in case it's not already filled would probably be enough.
Any update on this?
hi all - just a note that i'm having this issue too with test pypi for my package stravalib, and i also see the same issue with sourmash on pypi. i don't think it's the build back end in this case.
my METADATA from my wheel (thanks to @pradyunsg for telling me how to check this) is:
Maintainer: Jonatan Smoocha, Yihong
Maintainer-email: Leah Wasser <[email protected]>, Hans Lellelid <[email protected]>
and on pypi i see

it seems like it's being parsed incorrectly by pypi? many thanks for your work on pypi btw!
FWIW, the TOML in pyproject.toml relevant to the above was (built with setuptools):
maintainers = [
{name = "Leah Wasser", email = "[email protected]"},
{name = "Hans Lellelid", email = "[email protected]"},
{name = "Jonatan Smoocha"},
{name = "Yihong"},
]
x-ref https://github.com/stravalib/stravalib/pull/304
oh yes - i'll reference this issue in my pr as well. for now i've removed emails.
So do we think the conversion from TOML -> metadata is wrong, or is PyPI's interpretation of the metadata wrong? What were you expecting to happen here?
From https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers:
Using the data to fill in core metadata is as follows:
If only name is provided, the value goes in Author or Maintainer as appropriate.
If only email is provided, the value goes in Author-email or Maintainer-email as appropriate.
If both email and name are provided, the value goes in Author-email or Maintainer-email as appropriate, with the format {name} <{email}>.
Multiple values should be separated by commas.
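For reference, a minimal sketch of that mapping, assuming the pyproject.toml array has already been loaded as a list of dicts. `to_core_metadata` is a hypothetical helper and not the code of setuptools or any other backend, and the `example.com` addresses are placeholders. `email.headerregistry.Address` is used for the quoting that PEP 621 suggests.

```python
from email.headerregistry import Address


def to_core_metadata(people, kind="Author"):
    """Hypothetical helper: map a PEP 621 authors/maintainers array
    (list of dicts with optional "name"/"email") to core metadata fields.
    `kind` is "Author" or "Maintainer".
    """
    names = []
    emails = []
    for person in people:
        name = person.get("name")
        email = person.get("email")
        if name and email:
            # Address quotes display names containing special characters.
            emails.append(str(Address(display_name=name, addr_spec=email)))
        elif email:
            emails.append(email)
        elif name:
            names.append(name)

    fields = {}
    if names:
        fields[kind] = ", ".join(names)
    if emails:
        fields[f"{kind}-email"] = ", ".join(emails)
    return fields


maintainers = [
    {"name": "Name One", "email": "one@example.com"},
    {"name": "Name Three"},
    {"email": "four@example.com"},
]
print(to_core_metadata(maintainers, kind="Maintainer"))
```

The name-only entries and the entries with an email end up in two different fields, so the original order of the array, and the fact that a name in Maintainer is unrelated to the addresses in Maintainer-email, cannot be recovered from the core metadata alone.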
I think it's on PyPI's end -- wherein it's only presenting the Maintainer key with Maintainer-Email as the link, even if the latter contains names and doesn't match the Maintainer key.
I think the pyproject.toml's author/maintainer -> METADATA mapping (as it stands) operates on the assumption that both the "{type}" and "{type}-email" would be used/presented; whereas PyPI tries to present only one entry (Author / Maintainer) and tries to use the "{type}-email" as a link for "{type}" if they're both present.
What were you expecting to happen here?
That's an excellent question -- I'd like to ask @lwasser to provide her thoughts on this. How would you have expected PyPI to present the information you added to pyproject.toml? :)
maintainers = [
{name = "Leah Wasser", email = "[email protected]"},
{name = "Hans Lellelid", email = "[email protected]"},
{name = "Jonatan Smoocha"},
{name = "Yihong"},
]
One approach that I can think of is to not provide a single link to write an email to all authors/maintainers, and to instead split the keys on , and present the names individually (with those that have emails being linked to, on a per-person basis). For backwards-compat, we could keep the current linking behaviour (of Author w/ Author-Email as mailto:) if there's a single email with no name and a single name.
From https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers:
Given that maintainers rarely follow that guidance, I think we still need to maintain some backwards compatibility with the expectation that Author/Maintainer is a string, Author-Email and Maintainer-Email is an email, and together they become a link.
Hence the suggestion of keeping the current behaviour when there's only one email + one name.
absolutely @di @pradyunsg. My understanding of how this works is as follows (I'd expect authors to operate the same!):
in my table here:
maintainers = [
{name = "Name One", email = "[email protected]"},
{name = "Name Two", email = "[email protected]"},
{name = "Name Three"},
{name = "Name Four"},
]
i'm specifying 4 maintainers. Thus on pypi, it would render as follows
<a href="mailto:[email protected]">Name One</a>, <a href="mailto:[email protected]">Name Two</a>, Name Three, Name Four
But instead it seems to do this:
<a href="mailto:name one <[email protected]>, name one <[email protected]>">Name Three</a>, <a href="name one <[email protected]>, name one <[email protected]>">Name Four</a>
I guess i would expect it to
- first list the maintainers in the order that they appear in the pyproject.toml and
- add the email link just to the items with an email?
Hence the suggestion of keeping the current behaviour when there's only one email + one name.
Sorry, missed this in the edit I think. So what should happen with:
Author: Google, Inc.
Author-email: [email protected]
I don't think that suggestion maps well onto maintaining existing behavior.
So what should happen with:
Author: Google, Inc.
Author-email: [email protected]
Just fyi if those are in the same entry/table then that wouldn't occur per PEP 621 https://github.com/pypa/packaging.python.org/issues/1134#issuecomment-1231564237
If you are parsing 2 entries represented like this (i'm using setuptools to build):
maintainers = [
{name= "Google human"},
{email = "[email protected]"},
]
you get this (2 unique humans are maintainers):
Maintainer: Google human
Maintainer-email: [email protected]
if you do this:
maintainers = [
{name = "Google human", email = "[email protected]"},
{email = "[email protected]"},
]
you get this:
Maintainer-email: Google human <[email protected]>, [email protected]
Two name + email, one name only, one email only
maintainers = [
{name= "Google human", email = "[email protected]"},
{name = "Hans Lellelid", email = "[email protected]"},
{name = "Human three"},
{email = "[email protected]"},
]
Results in this:
Maintainer: Human three
Maintainer-email: Google human <[email protected]>, Hans Lellelid <[email protected]>, [email protected]
I suspect two things are happening:
If you have
- Two maintainers with two associated emails (example: sourmash). The HTML output looks like this, where the entire string for both maintainers is turned into a mailto: link. Here i'd expect pypi to parse each name as a unique name, and to associate each email in that element of the maintainers list with that name.
<p><strong>Maintainer:</strong> <a href="mailto:Luiz Irber <[email protected]>, "C. Titus Brown" <[email protected]>">Luiz Irber <[email protected]>, "C. Titus Brown" <[email protected]></a></p>
- If you have multiple maintainers and some have email others don't like this:
maintainers = [
{name = "Leah Wasser", email = "[email protected]"},
{name = "Hans Lellelid", email = "[email protected]"},
{name = "Jonatan Samoocha"},
{name = "Yihong"},
]
You end up with a pypi entry like this. Notice that here two of the maintainers are not listed, and BOTH of the names that are listed have an email link that is a mixture of emails and maintainer names, similar to what you see with sourmash. i just fixed this by removing emails altogether, and now test pypi just lists all 4 of our names.
I hope that is helpful. it just seems to me that things are being parsed differently depending on what combination of information is provided.
Coming from Issue #12877 (sorry for the duplicate Issue):
Paste of Issue 12877 content if useful for quick reference:
:wave: Hi. Our project pyhf just switched (c.f. https://github.com/scikit-hep/pyhf/pull/2095) from having our PyPI metadata in setup.cfg to pyproject.toml. In doing so, we also changed from having our author metadata for the 3 authors be across author and author_email to having it be contained in authors following PEP 621's requirements of
These fields accept an array of tables with 2 keys: name and email. Both values must be strings. The name value MUST be a valid email name (i.e. whatever can be put as a name, before an email, in RFC 822) and not contain commas. The email value MUST be a valid email address. Both keys are optional.
pip is recognizing all the metadata as we would expect
$ python -m pip show pyhf
Name: pyhf
Version: 0.7.1.dev43
Summary: pure-Python HistFactory implementation with tensors and autodiff
Home-page:
Author:
Author-email: Lukas Heinrich <[email protected]>, Matthew Feickert <[email protected]>, Giordon Stark <[email protected]>
License: Apache-2.0
Location: /home/feickert/.pyenv/versions/3.10.6/envs/pyhf-dev-CPU/lib/python3.10/site-packages
Requires: click, jsonpatch, jsonschema, numpy, pyyaml, scipy, tqdm
Required-by:
However, when we published this to TestPyPI to check how things looked after switching over we noticed that TestPyPI is displaying only the first author and linking their email
Previously when we shoved all our names and emails into author and author_email we could at least have all our names be displayed (no surprise there as we were abusing the field)
I assume that this behavior with authors is because warehouse uses only the core metadata here (?) following PEP 621's instructions of:
Using the data to fill in core metadata is as follows:
- If only name is provided, the value goes in Author/Maintainer as appropriate.
- If only email is provided, the value goes in Author-email/Maintainer-email as appropriate.
- If both email and name are provided, the value goes in Author-email/Maintainer-email as appropriate, with the format {name} <{email}> (with appropriate quoting, e.g. using email.headerregistry.Address).
- Multiple values should be separated by commas.
Would it be possible for warehouse to display all authors information if it exists? Or is that something that is outside the scope of how warehouse interacts with metadata?
Describe the solution you'd like
Have warehouse be able to parse the existence of PEP 621 authors and display all names and associated emails of authors on the package webpage.
We (pyhf) are seeing a similar problem with our authors and maintainers fields in our PEP 621 compliant pyproject.toml.
Metadata from relevant wheel
$ python -m pip download --index-url https://test.pypi.org/simple/ --no-deps 'pyhf==0.7.1.dev35'
$ unzip pyhf-0.7.1.dev35-py3-none-any.whl
$ head -n 12 pyhf-0.7.1.dev35.dist-info/METADATA
Metadata-Version: 2.1
Name: pyhf
Version: 0.7.1.dev35
Summary: pure-Python HistFactory implementation with tensors and autodiff
Project-URL: Documentation, https://pyhf.readthedocs.io/
Project-URL: Homepage, https://github.com/scikit-hep/pyhf
Project-URL: Issue Tracker, https://github.com/scikit-hep/pyhf/issues
Project-URL: Release Notes, https://pyhf.readthedocs.io/en/stable/release-notes.html
Project-URL: Source Code, https://github.com/scikit-hep/pyhf
Author-email: Lukas Heinrich <[email protected]>, Matthew Feickert <[email protected]>, Giordon Stark <[email protected]>
Maintainer-email: The Scikit-HEP admins <[email protected]>
License: Apache-2.0
Authors
Our authors field is
authors = [
{ name = "Lukas Heinrich", email = "[email protected]" },
{ name = "Matthew Feickert", email = "[email protected]" },
{ name = "Giordon Stark", email = "[email protected]" },
]
and pip is recognizing all the metadata as we would expect
$ python -m pip show pyhf
Name: pyhf
Version: 0.7.1.dev43
Summary: pure-Python HistFactory implementation with tensors and autodiff
Home-page:
Author:
Author-email: Lukas Heinrich <[email protected]>, Matthew Feickert <[email protected]>, Giordon Stark <[email protected]>
License: Apache-2.0
Location: /home/feickert/.pyenv/versions/3.10.6/envs/pyhf-dev-CPU/lib/python3.10/site-packages
Requires: click, jsonpatch, jsonschema, numpy, pyyaml, scipy, tqdm
Required-by:
though for our render check upload to TestPyPI we noticed that TestPyPI is displaying only the first author and linking their email
with the generated HTML of
<p><strong>Author:</strong> <a href="mailto:[email protected]">Lukas Heinrich</a></p>
Expectation / Desired Result
Have all of the authors have their name and emails be listed in a comma separated list according to the order they appear in the wheel metadata
$ grep "Author-email" pyhf-0.7.1.dev35.dist-info/METADATA
Author-email: Lukas Heinrich <[email protected]>, Matthew Feickert <[email protected]>, Giordon Stark <[email protected]>
with generated html of
<p><strong>Author:</strong> <a href="mailto:[email protected]">Lukas Heinrich</a>, <a href="mailto:[email protected]">Matthew Feickert</a>, <a href="mailto:[email protected]">Giordon Stark</a></p>
Maintainers
Our maintainers field is
maintainers = [ {name = "The Scikit-HEP admins", email = "[email protected]"} ]
and the TestPyPI render is
with the generated HTML of
<p><strong>Maintainer:</strong> <a href="mailto:The Scikit-HEP admins <[email protected]>">The Scikit-HEP admins <[email protected]></a></p>
Expectation / Desired Result
Have the maintainer name match the metadata of the wheel
$ grep "Maintainer-email" pyhf-0.7.1.dev35.dist-info/METADATA
Maintainer-email: The Scikit-HEP admins <[email protected]>
and be a hyperlink to the mailto
<p><strong>Maintainer:</strong> <a href="mailto:[email protected]">The Scikit-HEP admins</a></p>
I encountered this bug today. We define four authors, where we don't have an email address for one of them. Pypi.org decided to only show one of them, specifically the author without an email address, and used the email address of a different author as the mailto: link.
It seems to me like the core metadata specification is incompatible with the degree of freedom that PEP 621 promises.
For instance, how would you separate the following two cases?
# A PEP 621 project
[project]
# ...
authors = [
{ name = "Alice" },
{ email = "[email protected]"},
]
which would become:
Author: Alice
Author-email: [email protected]
and
# A "classic" project
setup(
# ...
author="Bob Bobbity",
author_email="[email protected]",
)
which would become:
Author: Bob Bobbity
Author-email: [email protected]
In the first case, you would expect the name in Author to be listed separately from the email in Author-email, meanwhile you would want the name in the second case to be combined with the email in Author-email. But there is no way to tell the two cases apart based on the core metadata alone.
The gap between PEP 621 and the core metadata specification can be closed in two ways:
- Restrict the freedom of the authors field in PEP 621 (bringing it in line with the core metadata spec)
- Address the shortcomings of the core metadata specification (bringing it in line with PEP 621) and update the mapping from PEP 621 to core metadata
Some thoughts on how you could add new fields Authors and Maintainers to core metadata to support the data model of PEP 621
EDIT 2: I no longer think this is the best solution.
Possible solutions
If I were to come up with a "dream" solution, I would try to expand the core metadata specification with new fields, Authors and Maintainers. Note that they are plural, while the existing fields are singular. They would work exactly like Author-email and Maintainer-email, except you would be permitted to specify a name with no email address by using the same form as an email address with a name, but with the email address specified as an empty string. For instance: Alice <>.
To keep backwards compatibility with tools that don't know about the new fields, I would keep the algorithm described in PEP 621. However, tools that do know about the new fields should always disregard the old fields (Author and Author-email, or Maintainer and Maintainer-email) if the corresponding new field is present (Authors, or Maintainers). So the information in the authors and maintainers fields of pyproject.toml would be repeated twice in the core metadata: Once in the new field, and once in one of the old ones.
Here's what the example in PEP 621 would look like:
The following definition in pyproject.toml:
[project]
authors = [
{name = "Pradyun Gedam", email = "[email protected]"},
{name = "Tzu-Ping Chung", email = "[email protected]"},
{name = "Another person"},
{email = "[email protected]"},
]
maintainers = [
{name = "Brett Cannon", email = "[email protected]"}
]
would be converted to the following core metadata:
Authors: "Pradyun Gedam" <[email protected]>, "Tzu-Ping Chung" <[email protected]>, "Another person" <>, [email protected]
Maintainers: "Brett Cannon" <[email protected]>
# For backwards compatibility
Author: Another person
Author-email: "Pradyun Gedam" <[email protected]>, "Tzu-Ping Chung" <[email protected]>, [email protected]
Maintainer-email: "Brett Cannon" <[email protected]>
I see that Maintainers was redundant in this example, since there is no confusion with Maintainer-email when everyone has an email address. So it's possible that the new field should only be used when there is at least one author/maintainer without an email address? But there's something to be said about being consistent.
The advantage of this approach is that you get the freedom to mix between authors with only a name, only an email, and both a name and an email address, in a way that is straight-forward to parse on the other end.
The downsides of this approach are that the new fields are easy to confuse with the old ones (since there's only a trailing s separating the two), and that information is repeated twice in the core metadata.
Alternatively, you could modify the definition of Author-email and Maintainer-email so that they may accept authors/maintainers without an email address, and use them for every author and maintainer when converting from PEP 621 (leaving out Author and Maintainer). But it feels a bit silly to put authors and maintainers without an email address inside Author-email or Maintainer-email. And tools out there may crash or behave weird if they were served Author-email: Alice <>?
EDIT: Put the "possible solutions" behind an accordion
EDIT 2: I no longer think the solution above would be the best one, there are simpler solutions.
I don't have a particular solution other than I think it would be great for someone to write a PEP that made this bit of metadata better :) There was even a recent thread on discuss.python.org where someone else had a related issue.
I just realised that this GitHub issue should probably be split into multiple ones.
The original issue description from @domdfcoding, and the use case from @matthewfeickert, are about the case where only Author-email (and/or Maintainer-email) is supplied. There is no confusion about what name goes with what email address in that case. According to the issue reporters, Pypi.org does not handle this properly. I would think it is possible to fix this so that all listed authors or maintainers are shown, using their names as the label and falling back to displaying their email address when no name is given. This would only require changes in warehouse.
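A rough sketch of what that warehouse-only fix could look like for the Author-email-only case. This is an assumption about one possible approach, not Warehouse's real template logic: `render_people` is a hypothetical helper, the `example.com` addresses are placeholders, and entries without an `@` are simply skipped.

```python
from email.utils import getaddresses
from html import escape


def render_people(header_value):
    """Hypothetical helper: render an Author-email / Maintainer-email value
    as comma-separated mailto links, one per person, in header order."""
    links = []
    for name, addr in getaddresses([header_value]):
        if "@" not in addr:
            continue  # entry did not parse to a usable address
        # Use the name as the label, falling back to the address itself.
        label = name or addr
        links.append(f'<a href="mailto:{escape(addr)}">{escape(label)}</a>')
    return ", ".join(links)


print(render_people("Lukas Heinrich <lukas@example.com>, matthew@example.com"))
```

Each person gets their own link, and the display name never ends up inside the mailto: target.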
The case where you are specifying multiple authors and mixing between Author and Author-email would be left unsupported and broken by design, just like today, in other words. If we wish to guide users towards the supported use case, we can add some guidance to the description in PEP 621 so that it recommends either including an email address for every author, or including no email addresses at all.
The issue of supporting a mix between email and non-email authors should be a different issue, I think. It would include the use cases reported by @lwasser and @pradyunsg, and me, (EDIT: and backwards compatibility with the existing usage brought up by @di) and would likely involve changes to the core metadata spec, PEP 621, warehouse and the build module.
I imagine this would take a while, so it makes sense to fix the simpler issue first and handle this more complex issue separately.


