better api for encoded input
Ether summarized this in IRC just now and it seems perfectly fine and i'll need to do it asap.
14-01-22@02:41:35 (@ether) Mithaldu: basically, PPI needs separate interfaces for new_from_file, new_from_handle, and new_from_content -- for the first, you need a separate encoding argument; for the second, you should force the caller to apply the right layers to the $fh in advance; for new_from_content, presume characters (decoded)
Of the three of these, the only one that's really required is new_from_content (the user can always slurp the file contents himself), but the others will make a lot of things more efficient. e.g. in Dist::Zilla, sometimes we have the encoded_content cached in a string, in which case opening a filehandle to a scalarref of that would be more preferable than decoding the entire string and passing it to PPI (which may only be interested in parts of the document anyway).. and reading line by line from a handle will be more resource-friendly than slurping the entire file.
(Module::Metadata is plagued by similar issues - it has new_from_handle and new_from_file, but is ignorant of encoding issues, so currently we need to deal with wide chars by encoding the content and passing a filehandle to that encoded content with the right layers set. What a mess!)
You'd probably also want a new_from_module too, which incorporates %INC resolution logic... :)
related to https://rt.cpan.org/Ticket/Display.html?id=27121 ?
related to https://rt.cpan.org/Ticket/Display.html?id=48265
related to https://rt.cpan.org/Public/Bug/Display.html?id=86299
More elaboration:
- PPI::Node objects need an interface for fetching encoded content, and decoded content (requires awareness at the document level what encoding the content is in)
- PPI::Token::Pod objects need an override for this, which uses the value of any preceding
=encodingpod directive (so when parsing the document, we'll need to take note of the=encodingdirective and record that at the top-level pod-containing node).