autocomplete attribute for street-address details

Open battre opened this issue 5 years ago • 33 comments

Currently the HTML Standard seems a bit US-centric:

The autocomplete attribute supports the street-address type for an unstructured street address or address-line1, address-line2 and address-line3 for the respective lines.

In Germany and several other countries, websites typically ask for the street name and a house number. In Spain, websites ask for the floor number and door number (I am not a Spanish speaker, I have seen "Num.", "Piso", "Letra", "Esc." on simyo.es for example). Some Brazilian sites ask for a neighborhood (which we might express with an address-levelX but that might be a bit underspecified because I am not aware of a canonical mapping of entities in specific countries).

Should we start a discussion on defining more entities around addresses? I think that just by supporting a street-name, house-number and apartment-number, we could cover a lot of ground.

battre avatar Oct 08 '19 08:10 battre

cc @mnoorenberghe

annevk avatar Oct 08 '19 09:10 annevk

Chrome seems to somehow fill it correctly sometimes, see https://i.imgur.com/QRNxfXL.png (segmueller.de).

This is very rare though and probably depends on the IDs given. Usually, I'd say with a 90% probability, the "street" field from forms gets filled with the street name and the house number. The house number field is then empty and I have to manually edit it, which is unfortunate and could be fixed via the spec.
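To make the mismatch concrete, here is a minimal TypeScript sketch of the split that a form with separate street and house number fields expects. The `splitStreetLine` helper and its regular expression are hypothetical and assume the house number is the last token of the line; nothing here reflects how Chrome or any other browser actually fills fields.

```typescript
// Hypothetical helper, not a browser API: split "street name + house number"
// when the number is the trailing token (as in German-style addresses).
interface StreetParts {
  streetName: string;
  houseNumber: string;
}

function splitStreetLine(line: string): StreetParts | null {
  // Trailing token: digits, optional letter suffix, optional range
  // (for example "12", "25a", "8-12").
  const match = /^(.*\S)\s+(\d+[A-Za-z]?(?:-\d+[A-Za-z]?)?)$/.exec(line.trim());
  if (match === null) {
    // No trailing number found; a filler would have to leave this to the user.
    return null;
  }
  return { streetName: match[1] ?? "", houseNumber: match[2] ?? "" };
}
```

Under these assumptions, `splitStreetLine("Musterweg 12")` yields the street name `Musterweg` and the house number `12`, while a number-first line such as `12 Example Road` returns `null`, which is one reason a single heuristic cannot serve every country.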

@annevk / maintainers: Do you need any example websites to show/clarify the issue?

carstenhag avatar Oct 13 '19 07:10 carstenhag

In Japan, we don’t even have street addresses. Instead, the equivalent is a series of numbers that starts with a district number, and then a block number, and then what’s nominally a specific building number.

I say “nominally” because it’s common for several buildings to actually have the same “building number” (due, e.g., to cases where an older, larger building on a piece of land gets demolished and replaced with two or more smaller buildings on the same land).

So in Japan, in addition to a “building number”, it’s often also necessary to specify a building name.

When the autocomplete attribute was being defined, I remember we specifically discussed the Japanese-addressing case, as well as other locales/cases with addressing schemes that don’t use street names or that need something more specific than a street address. As far as I recall, the relevant set of tokens now in the spec was decided on because trying to define a richer set of tokens to reflect all the possible cases/locales would have ended up with a set that was just too unwieldy in practice.

sideshowbarker avatar Oct 14 '19 02:10 sideshowbarker

@carstenhag I think we mainly need to hear from implementers to what extent they are interested in putting work into this as this would also require UI that's aware that if you change the country the address format changes. (E.g., Firefox only supports a single address field at the moment as far as I can tell.)

annevk avatar Oct 14 '19 07:10 annevk

Thank you for sharing the historic context. I appreciate that there is a lot of complexity in international addresses that I am not aware of.

I wonder whether we can still come up with a practical solution that addresses the problem that websites put street and house number into separate fields. We have looked at a few hundred websites in Brazil, Spain, France, Germany and India (focused on non-US websites; acknowledging that this is by no means a representative sample) and noticed this challenge quite frequently in all of these countries but India.

Assuming that websites do not change their UI to support the autocomplete attribute, I wonder how we feel about adding special cases for "a lot but not all countries". I would propose that addresses are too complex to cover all countries with all nuances so we can only get closer to "correct" but never reach it 100%.

Another syntactic idea I had was to go with sub-attributes. Something like address-line1[street-name] and address-line1[house-number]. Would you feel different about this? This could also be used for https://github.com/whatwg/html/issues/4987 as a format specifier cc-exp[mm/yy] to express an expiration in format "MM/YY".
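The bracket syntax floated here is only a proposal in this thread, not part of the HTML Standard. As a rough illustration of what a consumer of such tokens would have to do, the following sketch parses a base token plus an optional bracketed qualifier. The `parseAutofillToken` function and the `ParsedToken` shape are hypothetical names introduced for this example.

```typescript
// Hypothetical parser for the proposed "base[qualifier]" token shape.
interface ParsedToken {
  base: string; // for example "address-line1" or "cc-exp"
  qualifier: string | null; // for example "street-name", or null when absent
}

function parseAutofillToken(token: string): ParsedToken | null {
  // Tokens are compared case-insensitively here, so the input is lowercased first.
  const match = /^([a-z0-9-]+)(?:\[([^\[\]]+)\])?$/.exec(token.trim().toLowerCase());
  if (match === null) {
    return null; // Malformed, e.g. an unclosed bracket.
  }
  return { base: match[1] ?? "", qualifier: match[2] ?? null };
}
```

With this sketch, `address-line1[street-name]` parses to the base `address-line1` with the qualifier `street-name`, `cc-exp[mm/yy]` to the base `cc-exp` with the qualifier `mm/yy`, and a plain `street-address` to a base with a `null` qualifier.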

@annevk Regarding implementer interest: I am working for Google on Chrome.

battre avatar Oct 14 '19 07:10 battre

@rmondello Do you know anybody who works in this space on WebKit?

battre avatar Oct 14 '19 07:10 battre

cc @whatwg/i18n

annevk avatar Oct 14 '19 07:10 annevk

FWIW, this may also affect the purpose attribute in APA's Personalization work.

xfq avatar Oct 14 '19 08:10 xfq

cc @whatwg/a11y per above comment. Does make me wonder how many attributes we need to state the same thing.

annevk avatar Oct 14 '19 08:10 annevk

@rxaviers and I were chatting about internationalized address formatting. I wonder if the JavaScript Intl API could help by giving data about the required fields for a locale. @sffc

littledan avatar Oct 14 '19 08:10 littledan

Given the role autocomplete has in WCAG SC 1.3.5: Identify Input Purpose, I am a fan of any new values that are more internationalized and can benefit more users.

From the user perspective, as long as browsers support it with auto-fill values, then all good.

From the dev perspective, to @sideshowbarker's comment above, too many options and we can expect many devs to use the wrong ones, blunting the accessibility benefit.

aardrian avatar Oct 14 '19 14:10 aardrian

Postal addresses are famously complicated to internationalize, due to the wide variation in formats between (and sometimes within) countries. In addition, there can be differences in what application authors prefer (in terms of the level of granularity they wish to store/process/validate).

There are several ways this might be addressed. On the one hand, we could try to enumerate all postal address components globally such that page authors can always specify the component they mean--door number, prefecture, administrative unit, floor number, etc. As noted by @xfq and @aardrian this could be of use to assistive technologies. On the other hand, we might try to address only specific problems with additions (at a disadvantage to users in countries that need something different).

A key thing to notice is that the language/locale of the page is not the same thing as the country of an address and the form used to collect a specific address needs to be tied to the country it ships to.

@littledan This has less to do with locale than it does with country/region (although there is some locale influence). I would suggest that postal address parts and their regional association be suggested to CLDR as an addition. Intl could then leverage this.

I'll bring this up at W3C I18N WG's next telecon, but we're off this week due to IUC43.

aphillips avatar Oct 14 '19 16:10 aphillips

speaking of WCAG's repurposing (perversion) of the attribute...there are certain situations where an input accepts two different types of information (such as "Username or email address") that currently can't be expressed I think? would it be a massive complication allowing these sorts of multiple values to be used (in order of preference, perhaps)?

patrickhlauke avatar Oct 14 '19 17:10 patrickhlauke

@patrickhlauke see #4445.

annevk avatar Oct 15 '19 05:10 annevk

Some quick comments here, although it looks like there's a lot of good discussion on the key points already...

> Another syntactic idea I had was to go with sub-attributes. Something like address-line1[street-name] and address-line1[house-number]. Would you feel different about this?

FWIW I would prefer address-line1-street-name and address-line1-house-number.

> From the user perspective, as long as browsers support it with auto-fill values, then all good.

Indeed. I think that's the key point, is whether there are actors in the ecosystem who would actually leverage this. It sounds like Chrome might take the steps @annevk describes (i.e., change their "user information" UI to have separate street-name/house-number fields when you're in the given country, so that it can successfully autofill later).

Given the precedent in https://github.com/whatwg/html/issues/3745#issuecomment-487088177, we allow fairly liberal implementer support signals for expanding the autofill vocabulary. So I think the main goal of the discussion here is to gather input from everyone (as this thread has been doing) to make sure the design is reasonable, even if only one browser has immediate plans to do the UI work necessary.

I think this also speaks to the question of how much work we want to do in creating more autofill tokens for more classes of addresses, e.g. Japanese addresses. The answer, IMO, is as much work as the ecosystem wants to put in. Not just browsers, either. E.g. if extensions, or AT vendors, or similar communicated that they could serve a good number of users by introducing such new autofill tokens, then I think the spec should be a reasonable clearinghouse for coordinating those efforts, and for having design discussions among multiple parties to shake out any cross-cutting issues.

domenic avatar Oct 15 '19 07:10 domenic

Please excuse my rambling…

Do we have any evidence that the sites that currently use separate fields for these values actually require them to be separate fields? Or is it just a local preference?

> @carstenhag I think we mainly need to hear from implementers to what extent they are interested in putting work into this as this would also require UI that's aware that if you change the country the address format changes. (E.g., Firefox only supports a single address field at the moment as far as I can tell.)

We show a single address-line textarea in preferences but can split it into up to 3 lines when filling. Likewise for the phone number field, we show one box but can split into many different combinations of <input> when necessary due to all the different tel-* tokens (we don't support all of them).
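As a rough illustration of splitting one stored multi-line value into the three `address-lineN` slots, consider the sketch below. The `toAddressLines` function is hypothetical, and folding overflow lines into the third slot is an assumption made for the example; it is not a description of Firefox's actual logic.

```typescript
// Hypothetical sketch: map one multi-line street address onto three line slots.
function toAddressLines(streetAddress: string): [string, string, string] {
  const lines = streetAddress
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const first = lines[0] ?? "";
  const second = lines[1] ?? "";
  // Assumption: anything beyond the second line is joined into the third slot.
  const third = lines.slice(2).join(", ");
  return [first, second, third];
}
```

A one-line address fills only the first slot, and a four-line address ends up with its last two lines joined in the third slot.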

> Another syntactic idea I had was to go with sub-attributes. Something like address-line1[street-name] and address-line1[house-number]. Would you feel different about this?

> FWIW I would prefer address-line1-street-name and address-line1-house-number.

I'm guessing the reason for the square bracket syntax was to allow it to work with address-lineN with N from 1 to 3. Would you want to instead add all 6 tokens since I don't think these components are always on line 1?

> Indeed. I think that's the key point, is whether there are actors in the ecosystem who would actually leverage this. It sounds like Chrome might take the steps @annevk describes (i.e., change their "user information" UI to have separate street-name/house-number fields when you're in the given country, so that it can successfully autofill later).

The UI change is the easiest part to deal with… the harder part is being able to convert in both directions between address-lineN and the subdivisions being proposed. We dealt with a similar problem with less complexity for the different tel-* tokens and it wasn't nice… I think we still don't support all of the tel-* tokens as a result of the complexity. The reason you need to convert in both directions is because the user could have first saved that address-line as one field but needs to autofill in the separate components (and vice versa). I think it will be very hard for UAs to handle that without creating duplicates but other than somehow pushing sites to move away from these fields (unless of course they need them for shipping calculations or something like that), I don't see how we can nicely solve this problem. I'm skeptical that Intl APIs could even be defined to do the transformations in both directions.
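The duplicate problem can be sketched as follows. This is a deliberately naive illustration, assuming a profile that was saved as a single line and a later form submission with separate components; `StructuredStreet`, `toLine`, and `matchesSavedLine` are hypothetical names, and the fixed street-then-number order is an assumption.

```typescript
// Hypothetical sketch of duplicate detection between a saved single-line
// address and a later structured submission.
interface StructuredStreet {
  streetName: string;
  houseNumber: string;
}

// Assumption: the line is rebuilt in "street name, then number" order.
function toLine(parts: StructuredStreet): string {
  return `${parts.streetName} ${parts.houseNumber}`.trim();
}

function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// True when the structured parts rebuild the saved line, so no new profile
// entry is needed; false means a naive store would create a duplicate.
function matchesSavedLine(savedLine: string, parts: StructuredStreet): boolean {
  return normalize(savedLine) === normalize(toLine(parts));
}
```

A saved line `Musterweg 12` matches the components `Musterweg` and `12`, but a saved line `12 Example Road` does not match `Example Road` and `12` because the rebuild order differs. Getting this right in both directions needs per-country knowledge that this sketch does not have.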

My main concern with this proposal is that it could make the use of these narrower fields more popular in the future even if not all UAs can handle them properly. Can we add them to the spec but mark it deprecated from the beginning? :P That would be similar to how we have https://compat.spec.whatwg.org/ where we standardize the web as it is, not as we want it (I realize this isn't a perfect analogy since sites probably aren't using these specific tokens now).

The spec already has the following text which is relevant but I'm not sure most people would notice it:

> Generally, authors are encouraged to use the broader fields rather than the narrower fields, as the narrower fields tend to expose Western biases. For example, while it is common in some Western cultures to have a given name and a family name, in that order (and thus often referred to as a first name and a surname), many cultures put the family name first and the given name second, and many others simply have one name (a mononym). Having a single field is therefore more flexible.

Maybe we should expand on that and explicitly annotate which autocomplete tokens should be avoided in favour of broader ones? Also, in this case I don't think "Western biases" is applicable.

I'm very interested in hearing how other UAs plan to handle the bidirectional data transformation issue… I'm also interested in hearing arguments for why we should codify this pattern rather than leave it up to UA heuristics to figure out (which is the status quo). Why do we want to pave this cow path rather than encourage change?

P.S. I wonder if authors ever handle this by listening for insertReplacementText input events on the address-lineN field and moving the appropriate sub-components to their own fields…

mnoorenberghe avatar Nov 09 '19 02:11 mnoorenberghe

What is the status of this ticket?

We as a company use autofill options for our forms. We use the postcode combined with the house number (and an optional house number extension) to autocomplete the street and city data. This is for Dutch websites. The autocompletion service we use is https://pro6pp.nl/en.

I would like separate attributes for house number and house number extension.
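The postcode-plus-house-number flow can be sketched roughly as below. The table contents, the `lookupStreet` function, and the key format are all hypothetical placeholders; this does not use or describe the pro6pp API, and the pattern only assumes the common Dutch postcode shape of four digits followed by two letters.

```typescript
// Hypothetical sketch of a postcode + house number lookup.
interface StreetAndCity {
  street: string;
  city: string;
}

// Placeholder data for illustration only; a real service holds national data.
const EXAMPLE_TABLE = new Map<string, StreetAndCity>([
  ["1234AB|10", { street: "Voorbeeldstraat", city: "Voorbeeldstad" }],
]);

function normalizePostcode(raw: string): string | null {
  const compact = raw.replace(/\s+/g, "").toUpperCase();
  return /^\d{4}[A-Z]{2}$/.test(compact) ? compact : null;
}

function lookupStreet(postcode: string, houseNumber: number): StreetAndCity | null {
  const normalized = normalizePostcode(postcode);
  if (normalized === null) {
    return null; // Not in the assumed "1234 AB" shape.
  }
  return EXAMPLE_TABLE.get(`${normalized}|${houseNumber}`) ?? null;
}
```

In a form like this, the house number and its extension are separate inputs by design, which is why dedicated autofill tokens for them would help.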

JohJohan avatar Dec 24 '20 16:12 JohJohan

It's an old and closed thread at this point but I was researching specifically address autocomplete options and came across this thread. I've also checked the linked thread and googled some more, but can't really find a better place to add my 2 cents, so I decided to add it here.

International addresses are complex in that each country or region requires their own details. However, the people living in these countries and regions generally know and expect these details.

By keeping the autocomplete options "generic", you do support the use-case where a single form has to serve a wide array of people, but you don't support the use-case where you specifically want your forms to match the expected input values for a region.

This could be to localize your form and thus appeal to the target population, it could be because your employer only works in that region and expects it to work that way or it could be because you want to validate the address according to region-specific business rules and you need structured input (as defined by that region) to do that.

We mustn't forget that forms have existed for many years and region-specific details have always existed and were never an issue, but now you are basically giving developers a choice:

  • adapt your form to a more generic version
  • skip autocomplete altogether

Neither is really the point of a specification like this. Because I can't express what I need with the specification, I'm relegated to trying to "game the system" and check browser specific implementations.

In our neck of the woods we generally ask for the street name and the street number in separate input fields. In the case of an apartment building or the like, we also need a further specification as the street number only indicates the building as a whole. The additional bit (like the floor number) can sometimes be added to the number field or it can be asked in a separate third field.

nablex avatar Aug 03 '21 18:08 nablex

We (Chrome) have reached a point where we would love to invest in extending the autocomplete attribute to better reflect the regional ways of requesting address information.

I have written up an explainer document in a personal github repository: https://github.com/battre/autocomplete-attribute-explainer/tree/main

The core points are:

  • We would like to invest in a more inclusive way of specifying address structures that meets the requirements of countries that are currently not well served by the specification.
  • We now have some data that indicates the frequency of more structured address representations. We looked into street names and house number fields (so this is a non-exhaustive analysis) and found that these are actually quite common in many countries (e.g. in Germany we estimate that roughly 1/3 of address forms to contain a street name field, in Brazil we estimate roughly half of address forms to contain a house number field). Overall, the data looks compelling to us to implement better regional support.
  • I have made a proposal for integrating more structured addresses into the autocomplete architecture. The proposal would be to ask the site author to either rely on street-address/address-lineX or on the new more fine-grained field types.
  • I have made a couple of proposals for which field types to add, based on what we have observed in the wild.
  • I believe that there are reasonable ways for browsers to ask the user for structured and unstructured representations for their addresses.

I'd love to hear your feedback:

  • Is this the right thing to build?
  • Is the presentation reasonable?
  • Is this now sufficiently inclusive? What are the gaps? I expect that my view is still biased even though we have made an effort to look into several countries.

battre avatar Jan 28 '22 15:01 battre

@TGiles

smaug---- avatar Feb 03 '22 18:02 smaug----

Hey all, sorry if this was already answered somewhere.

Do we have the list of official sources of address formats in different countries? We have a little over 200 countries on the planet, each has their official post office with their official address standards.

Something like this:

  • Canada Addressing guidelines
  • USA Postal Addressing Standards

galich avatar Mar 03 '22 18:03 galich

I found this: https://en.wikipedia.org/wiki/Address 'Format by country and area'

JohJohan avatar Mar 03 '22 18:03 JohJohan

Google's Address Data Service is still the main source of address format data for most libraries/packages.

bojanz avatar Mar 03 '22 19:03 bojanz

This link may also be useful to the discussion. It lists all countries and groups by similar formats. https://www.grcdi.nl/gsb/world%20address%20formats.html

howard-e avatar Mar 10 '22 21:03 howard-e

Thanks for all these links. I think that they are important to double-check.

On top of that, I think that a core point is to look at what websites actually request in forms and how. These guidelines don't mention that websites in some countries ask for a street name and house number in separate fields.

battre avatar Mar 14 '22 09:03 battre

You should also use care when looking at "what websites actually request in forms and how", because i think you'll find plenty of sites that tried to shoe-horn things into badly internationalised frameworks, and other sites may not necessarily be following the best design principles.

Fwiw, here are a couple of reservations i have on this topic. Essentially, i think that just because a percentage of addresses contain both a house number and a street name it doesn't necessarily follow that you need to split those apart on a form. Doing so introduces other problems.

For example, as people mentioned above, in the place i previously lived my address began with something like this: 6, MyBuilding Name, 24a MyStreet Road. It was often problematic to know whether to put the 6 (the more important of the 2 numbers, since the building was much better signposted than the house number) or the 24a in the house number field; whichever i chose, it left a bunch of stuff dangling. If i put the house number, followed by the street name, that would be completely different from the way i'd normally write my address. If i put the flat number, then building name, and put the house number + street name on another line, i'd be worried in case the house number was mistaken (as happened at least once, when we had mail sent to no. 6 on our street). It was much easier if there were a couple of lines available, which both autofilled, and i could fill them as i wanted.

There are also plenty of addresses where a house number isn't relevant. In both of these use cases, a simple general field that can be filled in (and autoprompted) by the user as they prefer makes life easier. It also makes it easier to localise the layout of form fields: for example, the house number comes after the street name in Switzerland. Separating house number and street name into different fields then causes extra work for localisation of the UI, whereas simple line fields allow the user to type in whatever order or details they need. Adding this separation may also encourage more UI designers to think that they perhaps should be separating the fields, if there's functionality behind it.
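The localisation cost of split fields can be sketched like this: once the components are stored separately, something has to know the display order per country. The `ORDER_BY_COUNTRY` table and `formatStreetLine` function are hypothetical, the table covers only four countries, and the fallback order for unknown countries is an arbitrary assumption.

```typescript
// Hypothetical sketch: rebuild a street line from split fields per country.
type Order = "number-first" | "street-first";

// Tiny illustrative table; a real implementation would need far more data.
const ORDER_BY_COUNTRY = new Map<string, Order>([
  ["CH", "street-first"],
  ["DE", "street-first"],
  ["GB", "number-first"],
  ["US", "number-first"],
]);

function formatStreetLine(country: string, streetName: string, houseNumber: string): string {
  // Assumption: unknown countries fall back to number-first, which may be wrong.
  const order = ORDER_BY_COUNTRY.get(country.toUpperCase()) ?? "number-first";
  return order === "street-first"
    ? `${streetName} ${houseNumber}`
    : `${houseNumber} ${streetName}`;
}
```

A single free-text line avoids this table entirely, since the user types the components in whatever order is natural for them.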

So my worry is that rather than being helpful, this may turn out to be overprescriptive because a better design pattern for UIs would be to not separate out the house number.

Btw, fwiw, typing in your house number seems to be increasingly uncommon in the UK these days, when ordering things online. You generally type in your postcode, and then select the house number and street from a popup list.

r12a avatar Mar 15 '22 13:03 r12a

I keep flipping back and forth between "This is a good idea" and "This is trying to solve the wrong problem". Making autocomplete better is something I want to see, but I guess I'm confused about this proposal. It sounds like street-address should be able to cover house-number, apartment-number, etc, already but is not due to forms that specify too many fields or too specific fields; please correct me if I'm wrong though. It does seem like this extension to the spec may help given the metrics in the explainer doc, but I wonder if there's a better way.

The last two bullet points of the "Implementation by browser" section in the explainer document would be something worth exploring I think. I don't think we'll be able to keep pace with all the different conventions and styles of addresses in the world, but if the browsers can learn what data should be filled or how this data should be associated...I think that's something that would be extremely beneficial. Learning how to autocomplete where the user is, that's the endgame in my opinion. Has anyone else thought of something like this?

Barring that local learning part, extending the street-address might be a good enough solution. I do want to second mnoorenberghe's concern of, "My main concern with this proposal is that it could make the use of these narrower fields more popular in the future even if not all UAs can handle them properly". I guess knowing more about why websites are using such narrow fields would be beneficial for me (and maybe I've missed that in this thread, feel free to point me in the right direction. I do see nablex's point though.) because again, it sounds like street-address already covers the house-number, apartment-number case.

TGiles avatar Mar 28 '22 20:03 TGiles

I've read through the proposal and it seems to address all my prior concerns so I would love to see it implemented!

I think it's important to stress this part:

> They choose to use these formats despite a lack of support by the autocomplete attribute, so we don’t expect that nudging website authors towards an address representation with address-line1, 2, 3, is likely to succeed

Addresses go far beyond websites and browser-based autocomplete, those local preferences are ingrained in the software already running in any given corporation, from CRMs to ERPs to custom solutions. They are reinforced every time a new application is built that needs to integrate with those existing systems. Not offering an autocomplete feature does not change this reality, it just makes it harder for developers to deal with it.

As a developer, I have spent more time actively disabling or circumventing autocomplete than using it. As a user I have been frustrated time and again as the autocomplete gets it wrong, requiring me to manually correct the form fields anyway.

I don't really see the downside to enabling more granular autocomplete options, there seem to be two possibilities:

  • User agents don't support the new fields so they will remain blank (which, as a user, is preferable to incorrect data)
  • User agents do support the new fields, at worst users need to fill in their address a couple of times in different formats before the user agent knows all the autocomplete variations

I hope this addresses the repeated concern:

> My main concern with this proposal is that it could make the use of these narrower fields more popular in the future even if not all UAs can handle them properly

The key takeaway is: they are already popular because they are necessary.

nablex avatar Apr 23 '22 08:04 nablex

Apologies for the radio silence while I was gathering more data.

To address the question "I guess knowing more about why websites are using such narrow fields would be beneficial for me" here is one answer that I got from a major German company (we reached out to some companies and await more responses):

> There are several answers and reasons for this. We'll have to keep in mind that the decision to separate these fields isn't always made consciously; it is often historically inherited.
>
> Some reasons belong to database design; where street names are CHAR, and house numbers are NUM and VARCHAR was expensive or unavailable
>
> Some reasons belong to requirements of the printer programs for printing the address labels: these required the separation described

I also learned that the metrics presented in the explainer document are an underrepresentation. We found more examples of real websites that require a more structured representation, and I think that some of these are not yet reflected in the numbers. For example:

  • For Australia we found "street name", "street number" and "unit number"
  • For Italy we found "Via/Piazza", "Numero Civico", "Informazioni aggiuntive (interno, scala...)"
  • For Poland we found "Ulica", "Numer domu", "Numer mieszkania"
  • For Brazil we found "Endereço", "Número da residencia", "Complemento (opcional) Apartamento, sala, conjunto, edificio, andar, etc."
  • For Mexico "Calle y número exterior", "Número exterior"

The very generic catch-all items ("Informazioni aggiuntive (interno, scala...)" = "Additional information (interior, staircase ...)"; "Complemento (opcional) Apartamento, sala, conjunto, edificio, andar, etc." = "Complement (optional) Apartment, room, set, building, floor, etc.") may require some further tuning in the proposal.

battre avatar May 03 '22 09:05 battre

> Some reasons belong to database design; where street names are CHAR, and house numbers are NUM and VARCHAR was expensive or unavailable

There's an example of a pattern we probably don't want to prolong the life of, since i assume that if i type in a house number such as 25a (a format i have had for more than one place i lived), things will break. Also, what about 8-12 My Street?
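A small sketch shows why a numeric-only house number column is fragile. The `fitsNumericColumn` check is a hypothetical stand-in for what a NUM-typed database column would accept; it is not taken from any system mentioned in this thread.

```typescript
// Hypothetical stand-in for a numeric-only house number column.
function fitsNumericColumn(houseNumber: string): boolean {
  return /^\d+$/.test(houseNumber.trim());
}

// The examples from the comment above, plus a plain number for comparison.
const samples = ["25", "25a", "8-12"];
const rejected = samples.filter((value) => !fitsNumericColumn(value));
```

Here `rejected` contains `25a` and `8-12`: both are ordinary house numbers that a numeric column cannot store without losing information.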

> We found more examples of real websites that require a more structured representation and I think that some of these are not reflected in the numbers, yet.

It would be good to also gather data from some non-Latin script languages and those that are not of European origin. Japanese and Chinese would be a good start. See the example at https://en.wikipedia.org/wiki/Address#Japan, where the line 東京都文京区白山4丁目3番2号 includes country name, (prefecture name), city name, ward and decreasing subdivision names. Note the absence of a street name. I think a common form format for inputting Japanese addresses is likely to ask for the user to type 4-3-2 where we would have just a house number.

That page has informal data on address formats for around 65 countries.
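As an illustration of the 4-3-2 shorthand, the sketch below pulls the district (丁目), block (番), and building (号) numbers out of the example line. The `parseBlockAddress` and `toShortForm` functions are hypothetical, and the pattern assumes ASCII digits exactly as in the example; real input may use full-width digits or kanji numerals, which this does not handle.

```typescript
// Hypothetical sketch: extract the numeric tail of a Japanese block address.
interface BlockAddress {
  chome: number; // district number (丁目)
  ban: number; // block number (番)
  go: number; // building number (号)
}

function parseBlockAddress(address: string): BlockAddress | null {
  // Assumes ASCII digits; \d does not match full-width digits here.
  const match = /(\d+)丁目(\d+)番(\d+)号/.exec(address);
  if (match === null) {
    return null;
  }
  return {
    chome: Number(match[1]),
    ban: Number(match[2]),
    go: Number(match[3]),
  };
}

function toShortForm(address: string): string | null {
  const parsed = parseBlockAddress(address);
  return parsed === null ? null : `${parsed.chome}-${parsed.ban}-${parsed.go}`;
}
```

For the line 東京都文京区白山4丁目3番2号, `toShortForm` gives `4-3-2`. None of the three numbers maps onto a street name or a house number in the European sense.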

r12a avatar May 03 '22 11:05 r12a