body-parser icon indicating copy to clipboard operation
body-parser copied to clipboard

iso-8859-1/windows-1252 support?

Open papandreou opened this issue 9 years ago • 14 comments

Hi,

A while ago I developed a web app to replace an old cgi-bin script from the nineties, which means that it had to assume that POST requests with a Content-Type of application/x-www-urlencoded are in iso-8859-1 (and also support "smart quotes" etc., aka. windows-1252). At the same time it had to be possible to switch to utf-8 mode.

I accomplished that by adding support for a defaultCharset option, which can either be utf-8 or iso-8859-1, and for explicitly switching to utf-8 mode by adding utf8=%E2%9C%93 (utf8=✓ as percent-encoded utf-8 octets) to the query string, a trick that I've seen others use as well.

Here are the diffs between my forks of body-parser and the qs module:

https://github.com/expressjs/body-parser/compare/master...papandreou:iso-8859-1 https://github.com/ljharb/qs/compare/master...papandreou:interpretNumericEntities

Would anyone be interested in this work? If so, I can clean it up and open PRs.

papandreou avatar Aug 23 '16 07:08 papandreou

@dougwilson Any update on this ?

I'm interested in this feature but as @papandreou 's branch is way behind the official master. Seems like it's going to be hard to pull...

Is there any proper other way to handle ISO-8859-1 encoded data ?

Xtrem65 avatar Jul 16 '18 13:07 Xtrem65

Hi @xtrem65 I haven't even commented on here, let alone done any work on it :) I don't have any updates.

dougwilson avatar Jul 16 '18 13:07 dougwilson

@Xtrem65, I guess the only update is that you're expressing interest, which is also a good thing :)

If @dougwilson and @ljharb think the work looks good and they want to adopt this feature, I'd be willing to bring the branches up to date.

papandreou avatar Jul 16 '18 14:07 papandreou

@Xtrem65, have you tried switching to my body-parser-papandreou fork to see if it solves your problem? It should still work with current versions of express.

papandreou avatar Jul 16 '18 14:07 papandreou

I’d be happy to review a PR.

ljharb avatar Jul 16 '18 14:07 ljharb

@ljharb, that sounds great. Before I try to rebase it on qs master, does this approach look good? https://github.com/papandreou/qs/compare/9250c4cda5102fcf72441445816e6d311fc6813d...interpretNumericEntities

papandreou avatar Jul 16 '18 15:07 papandreou

I'd need to review it when rebased :-) it looks like a good direction tho.

ljharb avatar Jul 16 '18 20:07 ljharb

Rebased branch: https://github.com/papandreou/body-parser/tree/feature/iso-8859-1/take2

Requires https://github.com/ljharb/qs/pull/268 to be npm linked in.

papandreou avatar Jul 16 '18 22:07 papandreou

Wow guys ! so much reactivity :D

I'd be happy to help if needed

Xtrem65 avatar Jul 17 '18 08:07 Xtrem65

@Xtrem65, you can help by trying it out :)

  1. Make a clone of https://github.com/papandreou/qs
  2. Check out the feature/iso8859-1 branch
  3. npm link
  4. Make a clone of https://github.com/papandreou/body-parser
  5. Check out the feature/iso-8859-1/take2 branch
  6. npm link qs
  7. npm link
  8. cd to your project
  9. npm link body-parser

... Then see if you can solve your problem. Depending on the requests that you need to support, you might have to specify the defaultCharset option.

papandreou avatar Jul 17 '18 09:07 papandreou