
incompatibility of package-block and __DATA__

Open happy-barney opened this issue 1 year ago • 14 comments

When using package block syntax and __DATA__ in single file, __DATA__ doesn't belong to package but to the main.

__DATA__ documentation states:

Text after __DATA__ may be read via the filehandle "PACKNAME::DATA",
    where "PACKNAME" is the package that was current when the __DATA__ token
    was encountered.

Technically this is correct: the __DATA__ token does not appear inside the package block (i.e., the package block syntax prevents __DATA__ from being attached to the package). But with code like the example below, the behaviour may be misleading. It may also harm adoption of the block syntax.

Example:

use strict;
use warnings;

package Foo {
    print while <DATA>;
}

__DATA__
hello
world

Output

Name "Foo::DATA" used only once: possible typo at example.pl line 7.
readline() on unopened filehandle DATA at example.pl line 7.

All Perls v5.12 .. v5.40 are affected.

Proposal:

a) support __DATA__ inside package block (ie, it will be treated as } as well)

b) treat insignificant content following } as content still belonging to the latest package block

happy-barney avatar Sep 20 '24 08:09 happy-barney

I do not see that warning when using __DATA__, but I do see it when using __END__

Tux avatar Sep 20 '24 08:09 Tux

I get

hello
world

as expected.

mauke avatar Sep 20 '24 10:09 mauke

@mauke @Tux ouch, oops, I mentioned package block and typed example without it :-( ... editing issue

happy-barney avatar Sep 20 '24 10:09 happy-barney

That is working as expected. The whole point of the package Foo { ... } syntax is that everything outside the braces is not part of the Foo package.
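A minimal sketch of that scoping rule, not part of the original comment: it records __PACKAGE__ before, inside, and after nested package blocks. The array name @seen and the nested package Bar are illustrative only.

```perl
use strict;
use warnings;

my @seen;                          # file-scoped lexical, visible in every block

push @seen, __PACKAGE__;           # before any block: main

package Foo {
    push @seen, __PACKAGE__;       # inside the block: Foo

    package Bar {
        push @seen, __PACKAGE__;   # nested block: Bar
    }

    push @seen, __PACKAGE__;       # the nested block ended: Foo again
}

push @seen, __PACKAGE__;           # the outer block ended: main again

print join(' ', @seen), "\n";      # prints "main Foo Bar Foo main"
```

Anything that follows the closing brace, including a __DATA__ token, is compiled in whichever package was current before the block opened.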

mauke avatar Sep 20 '24 10:09 mauke

@mauke that I know, I mentioned that in technically part.

I also mentioned possible confusions and fact, that there is no alternative to attach __DATA__ to package except of old package syntax

happy-barney avatar Sep 20 '24 10:09 happy-barney

there is no alternative to attach __DATA__ to package except of old package syntax

I get that you might not like that aesthetically, but when is that a practical problem?

Leont avatar Sep 20 '24 11:09 Leont

@Leont bad wording, likely to confuse

rest is only convenience - forcing inconsistent syntax across code base or, in case of class, weird syntax for class (especially if there is still hope of increasing usage of Perl)

happy-barney avatar Sep 20 '24 11:09 happy-barney

On Fri, Sep 20, 2024 at 01:51:45AM -0700, Branislav Zahradník wrote:

When using package block syntax and __DATA__ in single file, __DATA__ doesn't belong to package

Example:

use strict;
use warnings;

package Foo;

print while <DATA>;

__DATA__
hello
world

Output

Name "Foo::DATA" used only once: possible typo at example.pl line 7.
readline() on unopened filehandle DATA at example.pl line 7.

That example doesn't produce the output shown. I'm guessing the example was supposed to include a package block,

package Foo {
    print while <DATA>;
}

which does produce the output you show.

Proposal:

a) support __DATA__ inside package block (ie, it will be treated as } as well

Absolutely not. DATA is currently straightforward and predictable. Why on earth would you want to start including special-cased tricksy parsing? Then you end up with a whole can of worms. What about nested packages, possibly intermixed with other blocks?

b) treat insignificant content following } as content still belonging to the latest package block

Again, no, for the same reasons.

I can't see that this is a problem which needs fixing. The point of a block-scoped package declaration is that at some point in the file, you want to revert to the previous package. If you want the new package to be in scope to the end of the file, including the DATA token, then just leave the block off.

-- Wesley Crusher gets beaten up by his classmates for being a smarmy git, and consequently has a go at making some friends of his own age for a change. -- Things That Never Happen in "Star Trek" #18

iabyn avatar Sep 22 '24 12:09 iabyn

In addition, I'd like to note I feel __DATA__ is too abused and overused. I recommend using here-docs or https://metacpan.org/pod/File::ShareDir rather than the global-variable-like-*DATA

shlomif avatar Sep 22 '24 15:09 shlomif

@shlomif too doesn't mean that its usage isn't advocated as well

happy-barney avatar Sep 22 '24 15:09 happy-barney

In addition, I'd like to note I feel __DATA__ is too abused and overused. I recommend using here-docs or https://metacpan.org/pod/File::ShareDir rather than the global-variable-like-*DATA

I wholeheartedly disagree. I use __DATA__ and __END__ a lot and I have never in my life used File::ShareDir nor did I ever feel the need for that.

Main reason is that scripts with __END__/__DATA__ hold that data within and are easy to copy to other hosts, whereas external files need to be copied separately with all possible issues.

Here-docs are fine for a single-use short piece of data, but when dealing with a list or longer data structures or text pieces to apply tests to (like: does this parse), here-docs only blur the code

My € 2.00

Tux avatar Sep 23 '24 11:09 Tux

Everything here is working as expected, and I don't think there's anything worth changing. __DATA__ is inherently a file-based feature, and trying to make it interact with inner lexical scopes isn't going to work well. class isn't any different from package here, and can be written without an enclosing block.

If you need a block of data inside a scope, you can use a heredoc.
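As an illustration of the heredoc suggestion (a sketch under stated assumptions, not code from the original comment): the data lives in an indented heredoc inside the package block and is read through an in-memory filehandle, so the reading code looks much like it would with DATA. The sub name lines and the variable $data are hypothetical, and the <<~ indented heredoc form assumes Perl 5.26 or later.

```perl
use strict;
use warnings;

package Foo {
    # Indented heredoc (Perl 5.26+); it plays the role __DATA__ would
    # otherwise play, but stays inside the package block.
    my $data = <<~'END';
        hello
        world
        END

    sub lines {
        # Open a read-only in-memory filehandle on the string.
        open my $fh, '<', \$data
            or die "Cannot open in-memory handle: $!";
        my @lines = <$fh>;
        close $fh;
        return @lines;
    }
}

print Foo::lines();    # prints "hello" and "world" on separate lines
```

Unlike DATA, this handle reads a string in memory rather than the script file itself, and a fresh handle is opened on every call.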

haarg avatar Oct 02 '24 14:10 haarg

@haarg I will repair your sentence: "Everything here is working as implemented", because this issue is about changing what "as expected" means.

happy-barney avatar Oct 02 '24 14:10 happy-barney

No, I chose my words intentionally. Based on how these features work, doing something other than how they work right now would be very strange.

haarg avatar Oct 02 '24 17:10 haarg

Yeah I don’t see the issue here. The package is not in scope outside the block so the __DATA__ doesn’t apply to it. Finding a way to make __DATA__ somehow get assigned to a package other than the one that’s in scope would be awful.

ap avatar Apr 03 '25 16:04 ap

@ap issue is consistency of language. Currently you can have either package-block or __DATA__ but not both.

Look at that from point of view of newbie in language used to package-block syntax from other languages.

happy-barney avatar Apr 03 '25 16:04 happy-barney

There is no way to fix that (inasmuch as it even needs “fixing”). __END__ and __DATA__ by definition consume the rest of the file and package NAME BLOCK by definition needs a block-end marker. There is no way to put these features together, except by force of screwing up the design of at least one of them with some sort of craziness. This is a problem only if you impose on yourself a rule to always use package NAME BLOCK.

ap avatar Apr 03 '25 16:04 ap

@ap as they say "never say never" - I put two proposal into issue

Possible implementation of proposals:

a) support __DATA__ inside package block

  • allow indented __DATA__
  • such data must match qr ( \} \s* $ )x; the matched part will be removed by the lexer, which will then call yyl_data_handle and return PERLY_BRACE_CLOSE before yyl_fake_eof

current perly.y

PERLY_BRACE_OPEN remember stmtseq PERLY_BRACE_CLOSE

new perly.y

PERLY_BRACE_OPEN remember stmtseq optdatablock PERLY_BRACE_CLOSE

b) treat insignificant content following } as content still belonging to the latest package block

I don't have exact implementation but idea:

  • Perl_package currently sets PL_curstash
    • extend it with storing effective stash
  • at the end of block, effective stash will be preserved
  • yyl_data_handle currently uses current stash - will use effective stash instead
  • yylex, when returning token, will assign current stash to effective stash

happy-barney avatar Apr 03 '25 16:04 happy-barney

there is no alternative to attach __DATA__ to package except of old package syntax

Indeed, here's how one can tell Perl which package the __DATA__ section belongs to, all the while using package blocks:

use strict;
use warnings;
 
package Foo {
    print while <DATA>;
}

package Bar {
    # There's only one DATA handle per file,
    # and it's in the other package,
    # as we'll soon discover...
}

package Foo; # everything that follows belongs to package Foo
__DATA__
hello
world

Output:

hello
world

Currently you can have either package-block or __DATA__ but not both.

I think the above showed an example having both a package-block and a __DATA__ section.

package BLOCK declares the package of the block that follows, a package line declares the package of the lines that follow.

__DATA__ is inherently a line-based structure ("from this line on..."), like the old package statement and several others (__END__, # line...). So it doesn't feel very inconsistent to me to have to use some line-based syntax to ensure which package applies to it.

Look at that from point of view of newbie in language used to package-block syntax from other languages.

I don't know of other languages that have an equivalent of the __DATA__ feature. (That is likely a sign of my own ignorance; I'd be happy to learn about them!)

book avatar Apr 07 '25 08:04 book

removed as still out of point of issue - issue is not about how it works now, but how it should work better

Look at that from point of view of newbie in language used to package-block syntax from other languages.

I don't know of other languages that have an equivalent of the __DATA__ feature. (That is likely a sign of my own ignorance; I'd be happy to learn about them!)

re other language with __DATA__ - for example Ruby has __END__, PHP has __halt_compiler

re newbie (explained):

  • Perl has strong concept 1 package 1 file (yes, you can mess ...)
  • pattern symbol-space BLOCK is more common across programming languages:
    • typescript: namespace Foo { }
    • java: class Foo { }
    • ruby: module Foo ... end
    • C++: namespace Foo { }; class Foo { }

happy-barney avatar Apr 07 '25 08:04 happy-barney

as they say "never say never" - I put two proposal into issue

“Never” still applies. You didn’t put the features together, you proposed two possible ways to screw up the design of either one of them with some sort of craziness, just like I said.

The whole point of __DATA__ is that it gives you a file handle. If you make the feature do some kind of post-processing of the content so that the file handle no longer actually directly accesses the file on disk, then the feature is pointless and anything you would want to do with it can be done with just a heredoc. We have had indentable heredocs for a while. If that is what you are asking for then your answer is “use a heredoc for that”.

And “let’s break the basic concept of how scopes work just for this one obscure super-special case” is not a serious proposal. Your list of languages is nice and handy here, go ahead and show me a construct in any of them where a scope consumes things that come after the end of the scope.

You have a problem with the language because of a self-imposed stylistic preference. The solution is not to change the language but to fix your stylistic preference.

ap avatar Apr 07 '25 09:04 ap

re other language with __DATA__ - for example Ruby has __END__

Not a useful example. Ruby only has a single global DATA handle, nothing like Perl’s per-package handles. If you have __END__ in multiple files, only one of those will actually be accessible. It certainly doesn’t try to allow __END__ inside a scope.

PHP has __halt_compiler

This is even more primitive. It just causes a constant __COMPILER_HALT_OFFSET__ to be defined, and you get to open the file and seek to that offset yourself. (Though this design does make it even more explicit that the file handle is the entire point of the exercise.) I think this constant is per-file but I’m not sure. Anyway PHP doesn’t do anything special to address putting this directive inside a scope, either.

ap avatar Apr 07 '25 09:04 ap

re other language with __DATA__ - for example Ruby has __END__

Not a useful example. Ruby only has a single global DATA handle, nothing like Perl’s per-package handles.

doesn't what you say applies for combination of package-block and __DATA__ ? then it is always single global main::DATA ... and that's point raised by this issue.

happy-barney avatar Apr 07 '25 11:04 happy-barney

it is always single global main::DATA

There's as many DATA handles as there are packages, and files to hold them.
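A hedged sketch of that point, not taken from the original comment: two throwaway module files, each ending in its own __DATA__ section, are written to a temporary directory and loaded, after which each package has its own DATA handle. The helper data_of, the package names, and the file contents are all made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write two tiny modules, each with its own __DATA__ section.
my $dir = tempdir(CLEANUP => 1);
for my $pkg (qw(Foo Bar)) {
    open my $out, '>', "$dir/$pkg.pm" or die "Cannot write $pkg.pm: $!";
    print {$out} "package $pkg;\n1;\n__DATA__\ndata for $pkg\n";
    close $out;
}
unshift @INC, $dir;

# Hypothetical helper: read everything from a package's DATA handle.
sub data_of {
    my ($pkg) = @_;
    no strict 'refs';                 # needed for the symbolic glob lookup
    my $fh = \*{"${pkg}::DATA"};
    return <$fh>;                     # all remaining lines in list context
}

my %data;
for my $pkg (qw(Foo Bar)) {
    require "$pkg.pm";                # each file leaves its own DATA handle open
    $data{$pkg} = join '', data_of($pkg);
}

print "$_: $data{$_}" for sort keys %data;
# prints:
# Bar: data for Bar
# Foo: data for Foo
```

Each handle exists only because its file's __DATA__ token was reached while that file's package was current; if both files had ended in package main, they would have competed for the single main::DATA.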

book avatar Apr 07 '25 11:04 book

it is always single global main::DATA

There's as many DATA handles as there are packages, and files to hold them.

I admit I'm not a native speaker, but is this pair of sentences really so badly formulated? (If so, I'd ask someone to play devil's advocate and rewrite my comments a little better.)

happy-barney avatar Apr 07 '25 11:04 happy-barney

doesn't what you say applies for combination of package-block and __DATA__ ? then it is always single global main::DATA ... and that's point raised by this issue.

Yes, if you write your multiple Perl code files in such a way that their __END__ markers are all in package main, then only one of them will become main::DATA.

So don’t do that.

ap avatar Apr 07 '25 14:04 ap

Since you can't stick a __DATA__ inside curlies, how about a namespaced __DATA__ tokens?

package Foo {
  sub lines { <DATA> }
}

Foo::__DATA__
first line, index 0
second line, index 1 

guest20 avatar Apr 07 '25 14:04 guest20

Since you can't stick a __DATA__ inside curlies, how about a namespaced __DATA__ tokens?

That is an interesting idea as well.

There is also the possibility of a new handle, e.g. FILE_DATA, or even better the capability to turn DATA into a lexical (... my earlier poc will become handy, https://github.com/Perl/perl5/pull/22850)

happy-barney avatar Apr 07 '25 15:04 happy-barney

On Mon, Apr 7, 2025 at 10:57 AM guest20 @.***> wrote:

Since you can't stick a __DATA__ inside curlies, how about a namespaced __DATA__ tokens?

package Foo {
  sub lines { <DATA> }
}

Foo::__DATA__
first line, index 0
second line, index 1

As noted earlier in the thread, this can already be done by prefixing the line with package Foo;

-Dan

Grinnz avatar Apr 07 '25 16:04 Grinnz

  • package NAMESPACE BLOCK syntax reverts to the previous package after the block ends. The BLOCK must be properly terminated.
  • __DATA__ stops parsing the file and leaves the file handle being read from open and stores it in *DATA in the currently active package. Syntactically, it is the equivalent to the end of the file.

Neither of these things are going to change, even if it means these two features don't usefully interact with each other.

Something involving a lexical file handle would have to work significantly differently from __DATA__. You are welcome to propose something like that if you have a design for it.

haarg avatar Apr 07 '25 19:04 haarg