strict names

Consistent identifier parsing rules

perl5, and cperl older than 5.27.0, accept any string as a valid identifier name when it is created at run-time under no strict 'refs', even though most such names are illegal and cannot be handled by most external modules. Even invalid unicode is allowed.

cperl 5.26 made embedded NULs and invalid unicode identifiers illegal, and normalizes unicode identifiers in the parser.

Since cperl 5.27.1, dynamically created names are treated the same way as parsed names. This means that illegal utf8 names are rejected, unicode names created via ${"string"} are now normalized at run-time in the rv2sv OP, and mixed unicode scripts are also checked.
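The effect of normalizing a name can be reproduced in plain Perl. The sketch below assumes the normalization form is NFC, which the composed U+00E9 in the examples on this page suggests, and uses the core Unicode::Normalize module rather than anything cperl-specific:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Decomposed spelling: "e" followed by a combining acute accent.
my $decomposed = "cafe\x{301}";      # <c, a, f, e, U+0301>

# NFC composes the pair into one precomposed character.
my $composed = NFC($decomposed);     # <c, a, f, U+00E9>

printf "%d -> %d characters\n", length $decomposed, length $composed;
print "matches the precomposed spelling\n" if $composed eq "caf\x{e9}";
```

Two names that differ only in composed versus decomposed spelling end up as the same string after this step, which is why they resolve to the same symbol once names are normalized.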

strict names

strict ‘names’ is now implemented, included in the default strictures, and enabled since cperl 5.27.1. It checks that identifiers created from strings at run-time under no strict 'refs' follow the same rules as identifiers created at compile-time by the parser. This helps in fighting invalid identifiers, which cannot be handled by the rest of perl. Previously there was still room to create invalid and potentially harmful utf8 or binary names at run-time via no strict 'refs'. strict names ensures that no illegal name gets created.
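Code that has to run on perls without strict names can pre-check a string before using it as a symbolic reference. The following is a rough approximation only, not cperl's actual check: looks_like_identifier is a hypothetical helper, it assumes the rule "XID_Start or underscore, then XID_Continue, with optional ::-separated parts", and it ignores special variables and mixed-script checks.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of cperl or perl5.
# Takes a byte string and returns 1 if it looks like a parser-legal name.
sub looks_like_identifier {
    my ($bytes) = @_;
    return 0 unless defined $bytes && length $bytes;

    # Reject malformed UTF-8. decode() may modify its input, so use a copy.
    my $copy = $bytes;
    my $name = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return 0 unless defined $name;

    # One name part: a start character, then continuation characters.
    my $part = qr/[\p{XID_Start}_]\p{XID_Continue}*/;
    return $name =~ /\A (?: :: )? $part (?: :: $part )* \z/x ? 1 : 0;
}

for my $name ("Foo::bar", "ENCODING_UTF-8", "\xc3\x28", "\cTAINT") {
    # Show non-printable bytes as \xNN escapes.
    (my $shown = $name) =~ s/([^\x20-\x7e])/sprintf '\\x%02x', ord $1/ge;
    printf "%-16s %s\n", $shown,
        looks_like_identifier($name) ? "ok" : "rejected";
}
```

The helper is deliberately stricter than perl in places (it rejects a trailing "::", for instance), so it is suited to filtering untrusted input rather than to mirroring the parser exactly.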

Note that p5p insists that illegal identifiers remain legal to create at run-time. In perl5 only identifiers that are illegal at compile-time are rejected.

Currently it clashes with a reserved VMS hint. That means on VMS strict names will be implemented in a slower way, via a hints hash key, not a hints scalar bit.

Examples

  • This was legal before and is now illegal:
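    use strict; no strict 'refs';

    ${"\xc3\x28"};            # invalid UTF-8 sequence

    my $s = "\xe2\x28\xa1";   # invalid UTF-8 sequence
    ${$s};

    ${"$s\::xx"};             # invalid name as package part

    ${"\cTAINT"};             # control character in the name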

=> Invalid identifier “\24AINT” while “strict names” in use

  • Since 5.26 this symbol is normalized; previously it was not.
    use strict; no strict "refs";
    my $café = "café";   # <c, a, f, e, U+0301, U+0301>
    print $café;         # <c, a, f, U+00E9>

Before:

Empty

Now:

café

  • And the illegal UTF-8 variant:
    use strict; no strict 'refs';
    my $café = "café"; # <c, a, f, e, U+0301, U+0301>
    print ${$café};    # <c, a, f, U+00E9>

Before:

Global symbol "$café" requires explicit package name (did you forget to declare "my $café"?) at -e line 3.

Now:

Malformed UTF-8 character: \x81 (unexpected continuation byte 0x81, with no preceding start byte) in scalar dereference at -e line 3.
Malformed UTF-8 character (fatal) at -e line 3.

CPAN Impact

Not many CPAN modules are affected by strict names being on by default. This is expected as strict names mostly protects against run-time security attacks.

cperl caught the wrong leading $ here.

  • Scalar-List-Utils: tests for binary names without no strict.

  • PathTools: File::Spec::Unix

    my $taint = do { no strict; ${"\cTAINT"} };

The default package %main:: is not detected yet with valid_ident(), so this fails under strict names, but would pass with ${"::\cTAINT"}.
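Where only the taint flag is needed and very old perls are not a concern, the variable can also be spelled with the parser-level ${^NAME} syntax, which involves no symbolic reference at all. A minimal sketch; this is an alternative spelling, not what File::Spec::Unix does:

```perl
use strict;
use warnings;

# ${^TAINT} is the compile-time spelling of the variable named "\cTAINT".
# No string is turned into a name at run-time, so nothing has to be relaxed.
my $taint = ${^TAINT};

print "taint mode is ", ($taint ? "on" : "off"), "\n";
```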

  • EUMM: ExtUtils::MakeMaker::Locale

    Encode::Alias::define_alias(sub {
        no strict; # no strict names: "-" is an invalid IDCont
        no warnings 'once';
        return ${"ENCODING_" . uc(shift)};
    }, "locale");

$ENCODING_UTF-8 is an invalid identifier, so this code has to run without strict names.
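When the looked-up names are under the author's control, a hash keyed by the name avoids both the symbolic reference and the identifier rules, since hash keys are arbitrary strings. A minimal sketch; %encoding_for, its contents and lookup_encoding are hypothetical and not taken from ExtUtils::MakeMaker::Locale:

```perl
use strict;
use warnings;

# Hypothetical table standing in for package variables such as $ENCODING_...
my %encoding_for = (
    'UTF-8'      => 'utf-8-strict',
    'ISO-8859-1' => 'iso-8859-1',
);

# Hash keys are plain strings, so "-" needs no special treatment.
sub lookup_encoding {
    my ($name) = @_;
    return $encoding_for{ uc $name };
}

print lookup_encoding('utf-8'), "\n";
```

An unknown name simply yields undef here, with no stash entry created as a side effect.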
