NAME

perltypes - Perl type support

DESCRIPTION

Dynamic types

Perl is a dynamic language with dynamic data types. The actual type of a value is typically determined at run-time and depends on context. Usually the operator triggers an automatic type conversion, and the converted value is cached in the data.
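
As a rough illustration of a conversion being cached in the data, the following sketch for stock Perl 5 uses the core B module to look at a scalar's flags before and after it is used as a number. The helper name flags_of is made up for this example, and the exact flag combinations are an implementation detail that may differ between perl versions.

```perl
use strict;
use warnings;
use B qw(svref_2object SVf_IOK SVf_POK);

# Hypothetical helper: list the public string/integer flags set on a scalar.
sub flags_of {
    my ($ref) = @_;
    my $flags = svref_2object($ref)->FLAGS;
    my @set;
    push @set, 'POK' if $flags & SVf_POK;   # holds a valid string
    push @set, 'IOK' if $flags & SVf_IOK;   # holds a valid integer
    return join ',', @set;
}

my $s      = "42";
my $before = flags_of(\$s);   # flags right after the string assignment
my $sum    = $s + 0;          # numeric use triggers the conversion
my $after  = flags_of(\$s);   # the integer value is now cached in $s as well

print "before: $before\n";
print "after:  $after\n";
```

The scalar itself was never assigned to between the two calls; only reading it in numeric context changed what it stores.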

Static types

Perl has always allowed the optional declaration of a static type, i.e. an existing package name, for lexical variables, and cperl allows it for function signatures as well. Lexicals are stored in pads; the package name, i.e. the type, is stored in the stash part of the comppad_name slot for each pad. Perl itself does not use these types for checks or optimizations; only external modules and cperl do.

Remember: A type is implemented as a package, later also as a class. A valid type must be an existing and loaded package. Type expressions, such as parametric types or unions, cannot be stored in pad stashes yet.
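
The one place where stock Perl 5 does use a declared lexical type is the fields pragma, which verifies hash keys at compile-time. The sketch below assumes a made-up class Point with fields x and y; the two snippets are compiled with a string eval only so that the compile-time error can be caught and shown.

```perl
use strict;
use warnings;

package Point;
use fields qw(x y);

sub new {
    my Point $self = shift;
    $self = fields::new($self) unless ref $self;
    $self->{x} = 0;
    $self->{y} = 0;
    return $self;
}

package main;

# A declared field: compiles and runs.
my $ok = eval q{
    my Point $p = Point->new;
    $p->{x} = 3;
    1;
};

# An undeclared field: rejected while compiling, before any code runs.
my $bad = eval q{
    my Point $p = Point->new;
    $p->{z} = 3;
    1;
};
my $error = $@;

print "declared field:   ", ($ok  ? "accepted" : "rejected"), "\n";
print "undeclared field: ", ($bad ? "accepted" : "rejected"), "\n";
print "error: $error";
```

Note that the type Point has to exist as a package before my Point $p is compiled, which matches the rule above that a valid type must be an existing and loaded package.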

Global variables cannot be declared with a type; only lexicals, signatures and function return values can. Constants are implicitly typed.

cperl adds support for builtin coretypes and ffi types, has type declarations for most internal ops, and can optimize based on type inference or declared types.

Why types? Types are unperlish

No, they are not. Types for lexical variables have been permitted since 2001, especially with use fields, and are used in a few CPAN modules, e.g. Net::DNS and IPC::Run. Perl6 uses types all over. Every perl5 value is typed at run-time. It is just that all the normal ops are generic and are allowed to change the type of their arguments and their result at will.

Checking types at compile-time allows dramatic performance and size optimizations, leads to better documented code, and leads to earlier compile-time errors. This reduces the need to test all possible run-time types with extensive test suites, which rarely cover all type cases.

Natively typed arrays are 4x smaller and faster, typed loops can lead to static loop optimizations, and array elements do not need to be checked for out-of-bounds access at run-time. There's no need to check for tied methods or other magic on typed variables. Most of the run-time magic, i.e. checking for extraordinary conditions, can be bypassed with typed variants. And with unboxed native types all the arithmetic ops are 2-4x faster.

Without types there will be no multi-dispatch, no FFI and no proper object system, and smartmatch is almost impossible. With types there's no need to overload internal methods anymore: the compiler and run-time can dispatch on the types of the arguments.

Without types there is no builtin FFI (foreign function interface). You can still call an external FFI module or write an XS function, but you lose all the benefits of builtin types.

But even without explicit type declarations the compiler can internally handle argument and result types much better, as e.g. the JavaScript V8 engine does by observing the run-time types and optimizing dynamically. You don't need Dart to run fast JavaScript, but it helps. Every other dynamic language has switched to optional typing in the meantime. No other dynamic language forced type optimizations and checkers out of core; only p5p did.

Type tree

  Scalar -> Numeric -> Int|Num -> int|num
                       Uint       uint
         -> Str -> str

         -> Object -> ...

The tree runs from the most generic type on the left to the more specific types on the right. Contravariance enables you to use a more generic (less derived, further left) type than originally specified.
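
The tree can be modelled in plain Perl 5 as a parent table. This is only a sketch of the relationships, not of how cperl stores them: the table %parent_of and the helpers is_subtype and accepts are made up for this example, the placement of Uint under Numeric is an assumed reading of the diagram, and accepts follows the rule stated further down that Int is accepted as Numeric but Numeric is not accepted as Int.

```perl
use strict;
use warnings;

# Assumed reading of the tree above: each type maps to its more generic parent.
my %parent_of = (
    'Numeric' => 'Scalar',
    'Str'     => 'Scalar',
    'Object'  => 'Scalar',
    'Int'     => 'Numeric',
    'Num'     => 'Numeric',
    'Uint'    => 'Numeric',
    'int'     => 'Int',
    'num'     => 'Num',
    'uint'    => 'Uint',
    'str'     => 'Str',
);

# Hypothetical helper: true if $type is $ancestor or lies to the right of it.
sub is_subtype {
    my ($type, $ancestor) = @_;
    while (defined $type) {
        return 1 if $type eq $ancestor;
        $type = $parent_of{$type};     # step one level to the left
    }
    return 0;
}

# Hypothetical helper: may a value of type $given be used where
# $declared was specified?
sub accepts {
    my ($declared, $given) = @_;
    return is_subtype($given, $declared);
}

for my $pair (['Numeric', 'Int'], ['Int', 'Numeric'], ['Scalar', 'str']) {
    my ($declared, $given) = @$pair;
    printf "%-7s accepts %-7s: %s\n", $declared, $given,
        accepts($declared, $given) ? 'yes' : 'no';
}
```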

coretypes

The builtin coretypes module provides efficient implementations of the types Int, UInt, Num and Str. They can be applied to scalars, arrays, hashes and functions, in lexical variable declarations and in signature declarations, either before the variable name or afterwards as an attribute.

Any type change for typed variables will be detected at compile-time or, if that is not possible, at run-time. The type Int is interpreted as IV, Num as NV and Str as PV only. There is no stringification for IV and NV, no magic such as "tie" in perlfunc, and setting to undef is not allowed.

    my Int @a = (0..9);
    tie @a, 'Tie::Array';
    => compile-time error: Invalid tie for typed array

We provide fast ops variants for these types to omit type checks and magic calls at run-time.

Operations on typed Int will not promote to double on overflow, just as under use integer, whereas arithmetic on untyped IVs will promote to doubles.
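
The difference can be observed in stock Perl 5 with the integer pragma, which is the behaviour typed Int is compared to above. This is a sketch only: it uses no cperl types, and the exact wrapped value depends on the integer size and platform, so none is shown.

```perl
use strict;
use warnings;

my $max = ~0 >> 1;            # largest signed IV on this perl

# Default arithmetic: the product no longer fits an integer,
# so perl falls back to a floating-point NV.
my $promoted = $max * 4;

my $wrapped;
{
    use integer;              # integer-only ops: no promotion, the value wraps
    $wrapped = $max * 4;
}

print "max:      $max\n";
print "promoted: $promoted\n";
print "wrapped:  $wrapped\n";
```

The promoted result keeps its magnitude but loses integer precision, while the wrapped result stays an integer but no longer has the mathematically expected value.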

The @ISA of the coretypes are guaranteed to be empty. Thus there will be no parent of all, such as a generic type Object, Scalar or Dynamic, to override coretypes, but there will be subtyped children of coretypes to allow stringification and undef. So it is safe to optimize coretypes to their native operations at compile-time.

coretypes are less strict than user-types.

Implicitly typed coretypes, such as constant literals, are not treated as strictly and are type-cast as before. With use warnings "types", however, some warnings will be shown.

The type checkers will not throw errors on coretype violations with existing ops. They only emit warnings under use types or use warnings "types", which can be made fatal. The new signature op and feature, however, is stricter and throws type violation errors at compile-time on mismatching coretypes. Other basic non-coretypes, i.e. user-types and Scalar or Numeric, will always throw type violation errors. Any type can be promoted to int, num and str. Most constants, such as literal strings, numbers and integers, are typed as coretypes.

See "Type checker" below.

See also "coretypes" in perldata for examples.

native types

Types for perl objects should have class names starting with an uppercase letter, with several names reserved for core types, see above.

The lower case variants int, uint, num and str are used to handle native unboxed values directly, without reference counting, and are permitted only in rare cases: in special sequences of ops which understand them. They are also mandatory for the ffi ("foreign function interface") in core.

The compiler handles boxing and unboxing automatically for the parts where unboxed values are not permitted on the stack. Thus you are allowed to use native types instead of coretypes overall, and the compiler uses the boxed variants instead as it sees fit.

Note that it is safe to declare native types for all your lexicals, even if you only want them to be Int, Num or Str, i.e. boxed. A type int is a hint that the SV may be treated as a native int, but initially every int is treated as Int type, a normal IV.

This reduces memory use by a factor of four to ten per scalar, and speeds up combinations of pure arithmetic code and natively typed arrays.

Operations on native type int will not promote to double on overflow, arithmetic on untyped IV will promote to doubles.

class :native declarations use mandatory types on all fields and guarantee C-like packing of the fields when they are used as arguments or return types for extern sub calls. The fields resemble C structs which can be passed to extern subs. This is similar to the Perl6 is repr('CStruct') feature, but has fewer restrictions.

Internally:

The native op variants start with int_, uint_, num_, str_. There are also some more ffi-specific native types. Any sequence of natively typed ops might need to start with an unbox_ op to convert the values on the stack from boxed to unboxed, and end with either a box_ op or the OPpBOXRET private flag in the op. This unboxing and boxing adds some run-time cost, so the compiler is free to omit such type promotions as it sees fit. However, with the new FFI the native type declaration is guaranteed to be observed by the compiler for every ffi call, and the unbox and box ops are added automatically.

Unboxed native values can appear on the stack, on pads and in const ops.

Type checker

The type checker is totally unsound. Only a few ops are type-checked, currently only signature subroutine calls and scalar assignments.

Compile-time type checks for user-defined types need to be enabled with use types. Type warnings are enabled with use warnings 'types' and appear at compile-time. Note that currently use types just enables the types warnings category.

Type warnings can of course be fatalized, e.g. with use types 'strict'. Signature type checks do not need use types and are always fatal, at compile-time.

    my int @a;
    my MyClass $c = bless [], 'MyClass';
    $a[0] = $c;
    # compile-time types warning: Inserting type cast int to MyClass

    my MyClass $a;
    $a[0] = 0;
    # no warning: objects are accepted and converted to coretypes.

    use warnings;
    my int @a;
    $a[0] = "";
    # compile-time warning: Type of assignment to @a must be int (not Str)

    use warnings FATAL => 'types';
    my int @a;
    $a[0] = "";
    # fatal compile-time warning: Type of scalar assignment to @a must be int (not Str)

use base and use fields were changed in cperl to behave as proper cperl classes, i.e. their @ISA is closed at compile-time, which enables proper inheritance checks at compile-time.

use types 'strict'

The strict argument to the types pragma enables strict types warnings, so old-style code behaves like modern perl, dying on type violations. use types without strict just warns.

With strict types even user-type violations will error, without only signature type violations are strict.

Type inference

The inferencer runs automatically on some very limited syntax and can currently only infer Int on array indices, ranges and Str on hash keys, but has to give up on magic, dualvars, and no strict 'refs'. But the current type inference is very fast.

In future versions with the help of added declarations and type checks, as e.g. in if, smartmatch or given/when with type support it will be able to infer much more.

    if (type $a == :int) { ... }   # => $a is an int in this scope

Typed lexicals and signatures typically make code about 2x faster. You also get compile-time type warnings, a business-friendly coding environment, and the possibility to display inferred types and put them into your code automatically with a cooperating editor, e.g.

    my $n = 1000;
    for (my $i=0; $i<$n; $i++) { }
    =>
    my int $n :const = 1000;
    for (my int $i=0; $i<$n; $i++) { }

Note: When in doubt, leave out types. If the inferencer cannot find a type, it might not be worth the trouble. But for hot code always use types, as compile-time types avoid costly run-time checks for types and magic hooks.

FUNCTIONS

typedef (NY)

    typedef newtype type-expr;

typedef stores a type expression, such as a union of types (int | uint) or a type restriction as in perl6 (int where int>0), under a new type name.

This is similar to the subset operator in perl6, but perl6 already stores types as objects, while cperl still has to store types as classnames.

typeof (NY)

    typeof expr

typeof returns the compile-time declared or inferred type of the expression. This may be different from the run-time class name, obtained with "ref" in perlfunc or "reftype" in Scalar::Util.
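
Since typeof is not yet implemented, only the run-time side can be shown. The following stock Perl 5 sketch, with a made-up class MyClass, shows what ref and reftype report for a typed lexical; the declared type in front of $obj is not consulted by either of them.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

package MyClass {
    sub new { my $class = shift; return bless {}, $class }
}

my MyClass $obj = MyClass->new;

my $class = ref $obj;         # run-time class name the object was blessed into
my $kind  = reftype $obj;     # underlying data structure, ignoring the class

print "ref:     $class\n";    # MyClass
print "reftype: $kind\n";     # HASH
```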

Internals::HvCLASS

Get or set the status of HvCLASS of stashes.

Needed by the cperl variants of base and fields to make the @ISA inheritance readonly at compile-time, to enable compile-time checks of classes using base or fields. Those classes are essentially the same as the upcoming class keyword, which is fast sugar over the old base/fields object system with pseudohashes.

Without a const @ISA we cannot do compile-time type checks, and we had better not do them at run-time. Only dynamic non-HvCLASS packages are type-checked at run-time.

More type terminology

nominal type system

cperl implements a simple nominal type system, as in perl6 and most dynamic languages, in contrast to a structural type system such as in static languages like C or C++. The name of the class or type and its subtypes specify correctness, not the list of object fields and methods.

The author has to check proper subtyping rules manually, which are not enforced by type syntax.

subtyping vs subclassing

Linear inheritance and even more multiple inheritance as implemented with perl5 does not necessarily guarantee proper subtyping. But cperl assumes proper subtyping of all subclasses in the type checker.

This is done with the following restrictions to satisfy the "Liskov substitution principle" (LSP):

Any subclass, a class which is derived from one or more parent classes and is used as type parameter, must guarantee the following restrictions, a cperl design by contract.

For method signatures:

Contravariance of method arguments in the subtype

Typed arguments accept the same type or more generic arguments. Int is accepted as Numeric, but Numeric is not accepted as Int.

Contravariance is the most used variance: it enables you to use a more generic (less derived) type than originally specified. It is used for all signature and assignment type matching.

Covariance of return types in the subtype.

Typed return values accept the same type or more specific types. Numeric is accepted as Int, but Int is not accepted as Numeric. This is only used for declared function return types. Currently not implemented.

No new exceptions should be thrown by methods of the subtype

Except where those exceptions are themselves subtypes of exceptions thrown by the methods of the supertype. E.g. it is required to throw a read-only exception if the subtype restricts an attribute to be read-only.

For the subclass/subtype definition:

Preconditions cannot be strengthened in a subtype.

Postconditions cannot be weakened in a subtype.

Invariants of the supertype must be preserved in a subtype.

History constraint (the "history rule").

Objects are regarded as being modifiable only through their methods (encapsulation). Since subtypes may introduce methods that are not present in the supertype, the introduction of these methods may allow state changes in the subtype that are not permissible in the supertype. The history constraint prohibits this.

A violation of this constraint can be exemplified by defining a mutable point as a subtype of an immutable point. This is a violation of the history constraint, because in the history of the immutable point, the state is always the same after creation, so it cannot include the history of a mutable point in general. Fields added to the subtype may however be safely modified because they are not observable through the supertype methods. Thus, one can derive a circle with fixed center but mutable radius from immutable point without violating LSP.

For example when you change the type of an attribute in the subclass from writable to read-only you need to change the setter method in the subclass to throw an error, and you are disallowed to use the supertype as valid type in an argument declaration.

Avoiding the diamond problem:

Subtypes need proper linearization of the inheritance tree.

Therefore cperl switched from the old default mro dfs to the better but stricter c3 inheritance with v5.26.c. @ISA violations throw errors with c3. Many old perl5 packages have broken ISA trees, and either need to use mro 'dfs' or have their @ISA fixed.
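
The difference between the two method resolution orders, and the kind of @ISA tree that c3 rejects, can be reproduced in stock Perl 5 with the core mro module. The package names Top, Left, Right, Bottom and Broken are made up for this sketch.

```perl
use strict;
use warnings;
use mro;

# Diamond: Bottom inherits from Left and Right, which both inherit from Top.
@Top::ISA    = ();
@Left::ISA   = ('Top');
@Right::ISA  = ('Top');
@Bottom::ISA = ('Left', 'Right');

my $dfs = join ' ', @{ mro::get_linear_isa('Bottom', 'dfs') };
my $c3  = join ' ', @{ mro::get_linear_isa('Bottom', 'c3') };

# An inconsistent tree: Broken asks for Top before Left,
# but Left itself requires Left to come before Top.
@Broken::ISA = ('Top', 'Left');
my $broken_dfs = join ' ', @{ mro::get_linear_isa('Broken', 'dfs') };
my $c3_ok      = eval { mro::get_linear_isa('Broken', 'c3'); 1 };
my $c3_error   = $c3_ok ? '' : $@;

print "dfs: $dfs\n";   # Bottom Left Top Right
print "c3:  $c3\n";    # Bottom Left Right Top
print "Broken with dfs: $broken_dfs\n";
print "Broken with c3:  ", ($c3_ok ? "accepted" : "rejected"), "\n";
```

With dfs the shared parent Top is searched before Right, so a method in Right can never override one in Top. c3 searches Top last, and refuses trees for which no consistent order exists.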

A typical violation:

A typical example that violates the Liskov substitution principle is a Square class that derives from a Rectangle class, assuming getter and setter methods exist for both width and height. The Square class always assumes that the width is equal to the height. If a Square object is used in a context where a Rectangle is expected, unexpected behavior may occur because the dimensions of a Square cannot (or rather should not) be modified independently. This problem cannot be easily fixed: if we modify the setter methods in the Square class so that they preserve the Square invariant (i.e., keep the dimensions equal), then these methods will weaken (violate) the postconditions for the Rectangle setters, which state that dimensions can be modified independently. Violations of LSP, like this one, may or may not be a problem in practice, depending on the postconditions or invariants that are actually expected by the code that uses classes violating LSP. Mutability is a key issue here. If Square and Rectangle had only getter methods (i.e., they were immutable objects), then no violation of LSP could occur.
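
The violation described above can be written down in a few lines of plain Perl 5. The class layout, the method names set_width, set_height and area, and the function resize_and_measure are chosen for this sketch only.

```perl
use strict;
use warnings;

package Rectangle {
    sub new {
        my ($class, %args) = @_;
        return bless { width => $args{width}, height => $args{height} }, $class;
    }
    sub set_width  { my ($self, $w) = @_; $self->{width}  = $w }
    sub set_height { my ($self, $h) = @_; $self->{height} = $h }
    sub area       { my ($self) = @_; $self->{width} * $self->{height} }
}

package Square {
    our @ISA = ('Rectangle');
    # Preserve the Square invariant by always changing both sides at once.
    sub set_width  { my ($self, $s) = @_; $self->{width} = $self->{height} = $s }
    sub set_height { my ($self, $s) = @_; $self->{width} = $self->{height} = $s }
}

# Written against Rectangle: relies on the postcondition that
# width and height can be modified independently.
sub resize_and_measure {
    my ($rect) = @_;
    $rect->set_width(4);
    $rect->set_height(5);
    return $rect->area;
}

my $rect_area   = resize_and_measure(Rectangle->new(width => 1, height => 1));
my $square_area = resize_and_measure(Square->new(width => 1, height => 1));

print "Rectangle: $rect_area\n";    # 20
print "Square:    $square_area\n";  # 25
```

Both calls type-check nominally, since Square is a subclass of Rectangle, yet the caller's expectation of 4 * 5 only holds for the supertype. This is the kind of subtyping rule that the author has to check manually.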

For more see https://en.wikipedia.org/wiki/Liskov_substitution_principle and https://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science).

Perl6 subset:

In Perl6 you don't have these type checking guarantees, and therefore no proper type optimizations and no type checking soundness. Using subtypes only makes the run-time slower, and helps to improve compilation in very few cases.

In Perl6 a subtype is not a subclass. Subclasses add capabilities, whereas a subtype adds constraints (takes away capabilities). A perl6 subtype is primarily a handy way of sneaking smartmatching into multiple dispatch. Just as a role allows you to specify something more general than a class, a subtype allows you to specify something more specific than a class. A perl6 subtype specifies a subset of the values that the original type specified, which is why perl6 uses the subset keyword for it.

While subsets are primarily intended for restricting parameter types for multiple dispatch, they also let you impose preconditions on assignment. If you declare any container with a subset, Perl will check the constraint against any value you might try to bind or assign to the container.

Perl6-like subsets require type-objects instead of type-classes. Perl6 subclasses don't guarantee subtyping.

See https://design.perl6.org/S12.html#Types_and_Subtypes

Compile-time type optimizations

Since Perl 5 core does not deal with types stored in comppad_name per se, type checks and optimizations were usually deferred to the modules which implement respective types checks and optimizations, and all those modules were broken with 5.10.

The only type optimizations currently in effect in Perl 5 are constant folding and use integer.

cperl has type declarations for most internal ops, and can optimize these ops depending on the argument types. opnames.h stores PL_op_type_variants, all possible type promotions and demotions for each op. opcode.h stores PL_op_type with the type declarations of all ops.

cperl is able to change compile-time static method calls, determined either by name or by the type and const-ness of all searched packages, into static function calls. Thus the dynamic method search, i.e. finding the package in which the method is implemented, is avoided. This is about 10% faster.

Constant folding

Right-hand-side expressions, :const function bodies, or function bodies with an empty prototype () which resolve at compile-time to constant literals may be optimized to a CONST value, and left-hand-side numeric ops may be optimized to use their optimized i_ or even int_ counterparts. Note that i_ ops do not overflow; the integer values just wrap around. So the type and data range must be determined in advance, and if that is not possible, promotion to an i_ op is forbidden.

    my $c = $a + (1 << 8);
    => my $c = $a + 256;  # add $a CONST(IV 256)

    use coretypes;
    my int $a;
    my $c = $a + (1 << 8);
    => my $c = $a + 256;  # i_add $a CONST(IV 256)

    { use integer;
      my $a = 1;
      my $c = $a + (1 << 8);
    }
    => my $c = $a + 256;   # padsv($a) CONST(IV 1); i_add $a CONST(IV 256)

    { use integer;
      my $c = 1 + (1 << 8);
    }
    => my $c = 257;       # CONST(IV 257)

    my $a :const = 1;
    my $c = $a + (1 << 8);
    => my $c = 257;       # CONST(IV 257)

Unlike perl5, cperl does constant folding of function bodies even without an empty prototype.

    sub PI { 3.1415 }

which is the same as the old syntax sub PI () { 3.1415 }
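
In stock Perl 5, where the empty prototype is still required, the effect of constant folding can be made visible with the core B::Deparse module. This is a sketch; the exact layout of the deparsed text varies between perl versions, so only its relevant part is described.

```perl
use strict;
use warnings;
use B::Deparse;

sub PI () { 3.1415 }          # empty prototype: eligible for inlining

# The call to PI is inlined and 2 * PI is folded while compiling this sub.
my $code = sub { return 2 * PI };

my $text = B::Deparse->new->coderef2text($code);
print $text, "\n";
```

The deparsed body should contain the folded literal 6.283 and no longer mention PI, which shows that no sub call and no multiplication remain at run-time.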

:const packages

    package MyBase 0.01 :const {
      our @ISA = ();
      sub new { my $class = shift; bless { @_ }, $class }
    }
    package MyChild 0.01 :const {
      our @ISA :const = ('MyBase');
    }

    my $obj = MyChild->new;
    => MyBase::new()

When the method search goes only through const packages and their const @ISA, it is not possible to inject another package into the method search at run-time, thus the method call can be short-cut. These classes can be finalized, and all those method calls can be resolved at compile-time to static function calls, which can be inlined and therefore optimized even further.

Note that the package MyBase must be constant here. Otherwise &MyBase::new could be deleted and @MyBase::ISA changed to load a different parent at run-time.

base classes with optional fields just close the @ISA, but not the method hash.
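
Stock Perl 5 has no :const packages, but the idea of a closed @ISA can be approximated by marking the array read-only with Internals::SvREADONLY. This is only a rough stand-in, assumed here for illustration: it closes the inheritance list of one package at run-time, does not close the method hash, and triggers none of the compile-time optimizations described above. The package name Intruder is made up.

```perl
use strict;
use warnings;

package MyBase {
    sub new { my $class = shift; return bless { @_ }, $class }
}
package MyChild {
    our @ISA = ('MyBase');
}

# Approximation of a const @ISA: mark the array read-only.
Internals::SvREADONLY(@MyChild::ISA, 1);

# Injecting another parent at run-time now fails.
my $injected = eval { push @MyChild::ISA, 'Intruder'; 1 };
my $error    = $injected ? '' : $@;

# The method search can therefore only end up in MyBase.
my $resolved = MyChild->can('new');

print "injection: ", ($injected ? "succeeded" : "failed"), "\n";
print "new resolves to MyBase::new: ",
    ($resolved == \&MyBase::new ? "yes" : "no"), "\n";
```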

Types and const inheritance

    package MyBase 0.01 {
      our @ISA = ();
      sub new { my $class = shift; bless { @_ }, $class }
    }
    package MyChild 0.01 {
      our @ISA = ('MyBase');
    }

    # closed call.
    my MyChild $obj = MyChild->new;
    => MyBase::new()

When the left-hand side of a method call is typed, the result of the method call must be of this type or any dependent type. $obj is already declared of type MyChild, thus it cannot be of any other run-time injected package.

    package MyBase 0.01 {
      our @ISA = ();
      sub new { my $class = shift; bless { @_ }, $class }
    }
    package MyChild 0.01 :const {
      our @ISA :const = ('MyBase');
    }

    # open call. MyChild is of type MyBase
    my MyBase $obj = MyChild->new;
    => MyBase::new()

When the left-hand side of a method call is typed, the result of the method call must be of this type or any dependent type (i.e., MyBase or MyChild). Since MyChild is constant, i.e. no &MyChild::new method can be added at run-time, and @MyChild::ISA is also constant, it can only be &MyBase::new, even if MyBase itself is not constant.

Lexical subs NYI - move to perlsub.pod

Lexically defined subs in classes or package blocks are private methods, invisible and unchangeable from outside.

    package MyClass 0.01 {
      our @ISA = ();
      my sub _new { my $class = shift; bless { @_ }, $class }
      sub new (...) { $_[0]->_new(...) }
      my $private; # pad in maincv
      our $open;   # in stash
    }

    ...
    package main;
    my $obj = new MyClass; # i.e MyClass->new is valid and optimized.
                           # Indirect method call syntax helps.

    my $obj = MyClass::_new(); # invalid
    => Undefined subroutine &MyClass::_new called

Since &MyClass::_new is lexically defined in the package scope, the compiler may statically optimize all method calls to &MyClass::_new to an improved entersub (the CV being on a pad) without namespace lookup and dynamic method resolution, without having to const %MyClass:: and const @MyClass::ISA.

Lexical definition guarantees compile-time definition, which is not overridable dynamically at run-time.
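
Stock Perl 5 (v5.26 and later) already supports my sub, although only as a plain function call and not with the method call syntax proposed above. The following sketch therefore calls the lexical sub as _new($class, ...); the class and its argument are made up for the example.

```perl
use v5.26;
use warnings;

package MyClass {
    # Private: lives on a pad, not in the %MyClass:: stash.
    my sub _new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }

    sub new {
        my ($class, %args) = @_;
        return _new($class, %args);   # bound lexically at compile-time
    }
}

my $obj = MyClass->new(name => 'demo');

# The private sub is neither a method nor a package function.
my $visible = MyClass->can('_new') ? 1 : 0;
my $called  = eval { MyClass::_new('MyClass'); 1 };
my $error   = $called ? '' : $@;

say "object class:  ", ref $obj;
say "can('_new'):   ", $visible ? "yes" : "no";
say "outside call:  ", $called ? "succeeded" : "failed";
```

Because _new never enters the stash, code outside the block cannot redefine or delete it at run-time.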

External type modules

External modules, such as types, typesafety or Moose, implement their type checks or optimizations for the types they declare or allow declaration for during execution of "CHECK" in perlmod blocks. They are very limited and slow in what they do.

XXX __PACKAGE__ types questionable

As a convenience for module authors it was requested to allow declarations like

    package MyClass;
    my __PACKAGE__ $obj = __PACKAGE__->new;

Currently only the right-hand side is valid Perl.

This looks awful though. Refactoring of the package name should really refactor the internal types also, besides the type of all library users. But it would be consistent. See http://www.perl.com/pub/2000/06/p5pdigest/THISWEEK-20000625.html#my___PACKAGE___obj_