perl5
incompatibility of package-block and __DATA__
When using package block syntax and __DATA__ in a single file, __DATA__ doesn't belong to the package
but to main.
__DATA__ documentation states:
Text after __DATA__ may be read via the filehandle "PACKNAME::DATA",
where "PACKNAME" is the package that was current when the __DATA__ token
was encountered.
Technically that is correct: the __DATA__ token is not inside the package block (i.e., a package block prevents usage of __DATA__ for that package).
But with code like in the example, that may be misleading.
It may also harm adoption of the syntax.
Example:
use strict;
use warnings;
package Foo {
print while <DATA>;
}
__DATA__
hello
world
Output
Name "Foo::DATA" used only once: possible typo at example.pl line 7.
readline() on unopened filehandle DATA at example.pl line 7.
All Perls v5.12 .. v5.40 are affected.
Proposal:
a) support __DATA__ inside package block (ie, it will be treated as } as well)
b) treat insignificant content following } as content still belonging to the latest package block
I do not see that warning when using __DATA__, but I do see it when using __END__
I get
hello
world
as expected.
@mauke @Tux ouch, oops, I mentioned package block and typed example without it :-( ... editing issue
That is working as expected. The whole point of the package Foo { ... } syntax is that everything outside the braces is not part of the Foo package.
@mauke that I know, I mentioned that in technically part.
I also mentioned possible confusions and fact, that there is no alternative to attach __DATA__ to package except of old package syntax
there is no alternative to attach __DATA__ to package except of old package syntax
I get that you might not like that aesthetically, but when is that a practical problem?
@Leont bad wording, likely to confuse
rest is only convenience - forcing inconsistent syntax across code base or, in case of class, weird syntax for class (especially if there is still hope of increasing usage of Perl)
On Fri, Sep 20, 2024 at 01:51:45AM -0700, Branislav Zahradník wrote:
When using package block syntax and __DATA__ in single file, __DATA__ doesn't belong to package
Example:
use strict;
use warnings;
package Foo;
print while <DATA>;
__DATA__
hello
world
Output
Name "Foo::DATA" used only once: possible typo at example.pl line 7.
readline() on unopened filehandle DATA at example.pl line 7.
That example doesn't produce the output shown. I'm guessing the example was supposed to include a package block,
package Foo {
print while <DATA>;
}
which does produce the output you show.
Proposal:
a) support __DATA__ inside package block (ie, it will be treated as } as well)
Absolutely not. __DATA__ is currently straightforward and predictable. Why on earth would you want to start including special-cased tricksy parsing? Then you end up with a whole can of worms. What about nested packages, possibly intermixed with other blocks?
b) treat insignificant content following } as content still belonging to the latest package block
Again, no, for the same reasons.
I can't see that this is a problem which needs fixing. The point of a block-scoped package declaration is that at some point in the file, you want to revert to the previous package. If you want the new package to be in scope to the end of the file, including the __DATA__ token, then just leave the block off.
In addition, I'd like to note I feel __DATA__ is too abused and overused. I recommend using here-docs or https://metacpan.org/pod/File::ShareDir rather than the global-variable-like-*DATA
@shlomif too doesn't mean that its usage isn't advocated as well
In addition, I'd like to note I feel __DATA__ is too abused and overused. I recommend using here-docs or https://metacpan.org/pod/File::ShareDir rather than the global-variable-like-*DATA
I wholeheartedly disagree. I use __DATA__ and __END__ a lot and I have never in my life used File::ShareDir nor did I ever feel the need for that.
The main reason is that scripts with __END__/__DATA__ hold that data within and are easy to copy to other hosts, whereas external files need to be copied separately, with all the possible issues.
Here-docs are fine for a single-use short piece of data, but when dealing with a list or longer data structures or text pieces to apply tests to (like: does this parse), here-docs only blur the code.
My € 2.00
Everything here is working as expected, and I don't think there's anything worth changing. __DATA__ is inherently a file-based feature, and trying to make it interact with inner lexical scopes isn't going to work well. class isn't any different from package here, and can be written without an enclosing block.
If you need a block of data inside a scope, you can use a heredoc.
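A minimal sketch of that heredoc approach, assuming the data only needs to be read as a list of lines. The helper name `lines` is hypothetical and used only for illustration:

```perl
use strict;
use warnings;

package Foo {
    # The data lives inside the block, so it is scoped with the package.
    my $text = <<'END_DATA';
hello
world
END_DATA

    # Hypothetical helper: return the data as a list of lines
    # without trailing newlines.
    sub lines { return split /\n/, $text }
}

print "$_\n" for Foo::lines();
```

Unlike __DATA__, the heredoc is parsed as part of the program text, so nothing here depends on which package is current at the end of the file.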
@haarg I will repair your sentence: "Everything here is working as implemented", because this issue is about to change "as expected".
No, I chose my words intentionally. Based on how these features work, doing something other than how they work right now would be very strange.
Yeah I don’t see the issue here. The package is not in scope outside the block so the __DATA__ doesn’t apply to it. Finding a way to make __DATA__ somehow get assigned to a package other than the one that’s in scope would be awful.
@ap issue is consistency of language. Currently you can have either package-block or __DATA__ but not both.
Look at that from point of view of newbie in language used to package-block syntax from other languages.
There is no way to fix that (inasmuch as it even needs “fixing”). __END__ and __DATA__ by definition consume the rest of the file and package NAME BLOCK by definition needs a block-end marker. There is no way to put these features together, except by force of screwing up the design of at least one of them with some sort of craziness. This is a problem only if you impose on yourself a rule to always use package NAME BLOCK.
@ap as they say "never say never" - I put two proposals into the issue
Possible implementation of proposals:
a) support __DATA__ inside package block
- allow indented __DATA__
- such data must match qr ( \} \s* $ )x, which will be removed by lexer, and lexer (edit: finish sentence) will do yyl_data_handle and return PERLY_BRACE_CLOSE before yyl_fake_eof
current perly.y
PERLY_BRACE_OPEN remember stmtseq PERLY_BRACE_CLOSE
new perly.y
PERLY_BRACE_OPEN remember stmtseq optdatablock PERLY_BRACE_CLOSE
b) treat insignificant content following } as content still belonging to the latest package block
I don't have exact implementation but idea:
- Perl_package currently sets PL_curstash
  - extend it with storing effective stash
- at the end of block, effective stash will be preserved
- yyl_data_handle currently uses current stash
  - will use effective stash instead
- yylex, when returning token, will assign current stash to effective stash
there is no alternative to attach __DATA__ to package except of old package syntax
Indeed, here's how one can tell Perl which package the __DATA__ section belongs to, all the while using package blocks:
use strict;
use warnings;
package Foo {
print while <DATA>;
}
package Bar {
# There's only one DATA handle per file,
# and it's in the other package,
# as we'll soon discover...
}
package Foo; # everything that follows belongs to package Foo
__DATA__
hello
world
Output:
hello
world
Currently you can have either package-block or __DATA__ but not both.
I think the above showed an example having both a package-block and a __DATA__ section.
package BLOCK declares the package of the block that follows, a package line declares the package of the lines that follow.
__DATA__ is inherently a line-based structure ("from this line on..."), like the old package statement and several others (__END__, # line...). So it doesn't feel very inconsistent to me to have to use some line-based syntax to ensure which package applies to it.
Look at that from point of view of newbie in language used to package-block syntax from other languages.
I don't know of other languages that have an equivalent of the __DATA__ feature. (That is likely a sign of my own ignorance; I'd be happy to learn about them!)
removed as still out of point of issue - issue is not about how it works now, but how it should work better
Look at that from point of view of newbie in language used to package-block syntax from other languages.
I don't know of other languages that have an equivalent of the __DATA__ feature. (That is likely a sign of my own ignorance; I'd be happy to learn about them!)
re other language with __DATA__ - for example Ruby has __END__, PHP has __halt_compiler
re newbie (explained):
- Perl has strong concept 1 package 1 file (yes, you can mess ...)
- pattern symbol-space BLOCK is more common across programming languages:
  - typescript: namespace Foo { }
  - java: class Foo { }
  - ruby: module Foo ... end
  - C++: namespace Foo { }; class Foo { }
as they say "never say never" - I put two proposals into the issue
“Never” still applies. You didn’t put the features together, you proposed two possible ways to screw up the design of either one of them with some sort of craziness… just like I said.
The whole point of __DATA__ is that it gives you a file handle. If you make the feature do some kind of post-processing of the content so that the file handle no longer actually directly accesses the file on disk, then the feature is pointless and anything you would want to do with it can be done with just a heredoc. We have had indentable heredocs for a while. If that is what you are asking for then your answer is “use a heredoc for that”.
And “let’s break the basic concept of how scopes work just for this one obscure super-special case” is not a serious proposal. Your list of languages is nice and handy here, go ahead and show me a construct in any of them where a scope consumes things that come after the end of the scope.
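For the case where a handle-like interface is still wanted, here is a rough sketch that combines an indented heredoc (`<<~`, available since Perl 5.26) with an in-memory file handle opened on the string. The helper name `data_fh` is hypothetical, and this is only one way to arrange it:

```perl
use v5.26;
use warnings;

package Foo {
    # Indented heredoc: the leading whitespace shared by the body lines
    # and the terminator is stripped from the resulting string.
    my $text = <<~'END_DATA';
        hello
        world
        END_DATA

    # Hypothetical helper: open a fresh read-only in-memory handle
    # on the string each time it is called.
    sub data_fh {
        open my $fh, '<', \$text
            or die "cannot open in-memory handle: $!";
        return $fh;
    }
}

my $fh = Foo::data_fh();
print while <$fh>;
```

The handle reads from a string in memory rather than from the script file on disk, which is exactly the distinction drawn above.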
You have a problem with the language because of a self-imposed stylistic preference. The solution is not to change the language but to fix your stylistic preference.
re other language with __DATA__ - for example Ruby has __END__
Not a useful example. Ruby only has a single global DATA handle, nothing like Perl’s per-package handles. If you have __END__ in multiple files, only one of those will actually be accessible. It certainly doesn’t try to allow __END__ inside a scope.
PHP has __halt_compiler
This is even more primitive. It just causes a constant __COMPILER_HALT_OFFSET__ to be defined, and you get to open the file and seek to that offset yourself. (Though this design does make it even more explicit that the file handle is the entire point of the exercise.) I think this constant is per-file but I’m not sure. Anyway PHP doesn’t do anything special to address putting this directive inside a scope, either.
re other language with __DATA__ - for example Ruby has __END__
Not a useful example. Ruby only has a single global DATA handle, nothing like Perl’s per-package handles.
doesn't what you say applies for combination of package-block and __DATA__ ? then it is always single global main::DATA ... and that's point raised by this issue.
it is always single global main::DATA
There's as many DATA handles as there are packages, and files to hold them.
it is always single global main::DATA
There's as many DATA handles as there are packages, and files to hold them.
I admit I'm not a native speaker, but is this pair of sentences really so badly formulated? (if so, I'd ask someone to play devil's advocate and re-write my comments a little bit better)
doesn't what you say applies for combination of package-block and __DATA__ ? then it is always single global main::DATA ... and that's point raised by this issue.
Yes, if you write your multiple Perl code files in such a way that their __END__ markers are all in package main, then only one of them will become main::DATA.
So don’t do that.
Since you can't stick a __DATA__ inside curlies, how about a namespaced __DATA__ tokens?
package Foo {
sub lines { <DATA> }
}
Foo::__DATA__
first line, index 0
second line, index 1
Since you can't stick a __DATA__ inside curlies, how about a namespaced __DATA__ tokens?
That is interesting idea as well.
there is also the possibility of a new handle, e.g. FILE_DATA
or even better, the capability to turn DATA into a lexical (... my earlier poc will become handy, https://github.com/Perl/perl5/pull/22850)
On Mon, Apr 7, 2025 at 10:57 AM guest20 @.***> wrote:
Since you can't stick a __DATA__ inside curlies, how about a namespaced __DATA__ tokens?
package Foo {
sub lines { <DATA> }
}
Foo::__DATA__
first line, index 0
second line, index 1
As noted earlier in the thread, this can already be done by prefixing the line with package Foo;
-Dan
- package NAMESPACE BLOCK syntax reverts to the previous package after the block ends. The BLOCK must be properly terminated.
- __DATA__ stops parsing the file and leaves the file handle being read from open and stores it in *DATA in the currently active package. Syntactically, it is the equivalent to the end of the file.
Neither of these things are going to change, even if it means these two features don't usefully interact with each other.
Something involving a lexical file handle would have to work significantly differently from __DATA__. You are welcome to propose something like that if you have a design for it.