
Basic / value types

Main choices:
- Most things can be done with int and float.  Others are supported for
  reducing data space, performance and/or convenience.
- All the basic types are predefined.  No import needed.
- Sufficient diversity to avoid using a type that doesn't fit the need.
- Not too much diversity to avoid problems with understanding a program.
- Strict specification about precision to make programs portable.
- Basic types are passed by value for efficiency.  They are not constructed,
  unlike objects.

The most common type is an integer number.  Experience from other languages
shows that 32 bits is not always sufficient.  Most operating systems now
support files larger than 2 Gbytes, and such a size cannot be kept in a 32 bit
signed integer.  Therefore using 64 bits in general avoids mistakes of trying
to put a large number in 32 bits.
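
As a rough illustration of the limit (a Python sketch, not Zimbu; the helper
name fits_in_signed and the 3 Gbyte file size are made up for this example):

```python
def fits_in_signed(value, bits):
    """Return True if value is representable as a signed integer of that width."""
    low = -(2 ** (bits - 1))
    high = 2 ** (bits - 1) - 1
    return low <= value <= high


# Hypothetical file of 3 Gbytes, larger than the 2 Gbyte signed 32 bit limit.
file_size = 3 * 1024**3

print(fits_in_signed(file_size, 32))  # False
print(fits_in_signed(file_size, 64))  # True
```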

Unsigned numbers are sometimes used when a negative value is not possible.
However, mistakes are often made when subtracting two unsigned values.
Example:  "if (one - two < 0)".  With unsigned values the result wraps around
instead of becoming negative, so the condition is never true.  This can be
avoided with "if (one < two)", but that requires the programmer to know the
type of the variable.  Therefore unsigned values are generally to be avoided.
This matters especially for getting the size of an item, e.g. the length of a
string.
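
The pitfall can be sketched as follows.  Python has no unsigned type, so this
example assumes a 32 bit unsigned integer and imitates it with a mask; the
helper sub_u32 is invented for the illustration:

```python
MASK32 = 0xFFFFFFFF  # assumption: the unsigned type is 32 bits wide


def sub_u32(a, b):
    """Subtract the way a 32 bit unsigned type would: wrap modulo 2**32."""
    return (a - b) & MASK32


one, two = 3, 5
diff = sub_u32(one, two)

print(diff)       # 4294967294
print(diff < 0)   # False: the "negative" test never fires
print(one < two)  # True: comparing directly gives the intended answer
```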

In rare cases where efficiency is more important and large numbers are not
expected (e.g., on mobile devices), Int can be redefined to be 32 bits.

The list of supported types is given in the language specification page
about basic types.

Complex type

Should we support a complex type in the language?  Its use is rather
limited; I can't remember writing a program that used one.  If we don't
support complex numbers they can easily be represented with an array of two
Floats.  So let's not make the language and libraries complex.


Time type

Traditionally Unix uses a time_t, which counts seconds.  However, very often a
second is not accurate enough.  A composite type can be used, but that makes
computations more complicated: one needs to call a function instead of doing
a simple subtraction.

An alternative would be to use a float.  However, this requires many conversions,
since most system functions use an integer.  Also, round-off errors are likely
to occur and result in printing 9.999999 seconds instead of 10.
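
A small sketch of the round-off problem, in Python rather than Zimbu.  The
function names and the 0.1 second step are chosen only for the illustration:

```python
def add_steps_float(step_seconds, count):
    """Accumulate a time in floating point seconds."""
    total = 0.0
    for _ in range(count):
        total += step_seconds
    return total


def add_steps_int(step_usec, count):
    """Accumulate the same time as an integer number of microseconds."""
    total = 0
    for _ in range(count):
        total += step_usec
    return total


# Ten steps of 0.1 second should add up to exactly one second.
print(add_steps_float(0.1, 10) == 1.0)        # False
print(add_steps_int(100_000, 10) == 1_000_000)  # True
```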

When using 64 bits for time, the accuracy is a trade-off with the range:

 Accuracy        Range
 1 second        292 billion years
 1 millisecond   292 million years
 1 microsecond   292 thousand years
 1 nanosecond    292 years

Let's assume we want to be able to represent years in history, including the
Roman Empire.  The 292 years of a nanosecond clock are not sufficient then.
A microsecond is not accurate enough to measure the time of an instruction in
modern processors, but we hardly ever need this.  292 thousand years is not
sufficient to represent astronomical dates, but that is rarely needed either.
Thus a microsecond appears to be a nice compromise for daily use.
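
The ranges in the table can be checked with a short calculation.  This Python
sketch assumes a signed 64 bit counter and a year of 365.25 days; the function
name range_in_years is made up here:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year
MAX_INT64 = 2**63 - 1                  # largest signed 64 bit value


def range_in_years(ticks_per_second):
    """Years that fit in a signed 64 bit tick counter at this resolution."""
    return MAX_INT64 / ticks_per_second / SECONDS_PER_YEAR


for name, ticks in [
    ("1 second", 1),
    ("1 millisecond", 10**3),
    ("1 microsecond", 10**6),
    ("1 nanosecond", 10**9),
]:
    print(f"{name:15} {range_in_years(ticks):,.0f} years")
```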

Define a special type for this microsecond time?  There is no good reason for
this.  Let's just use Int.

For more accurate time with a large range use a composite type:
        CLASS Time
           int $sec
           int $fsec  # femtoseconds: 1E-15 seconds
        }

Note that most system timers are only accurate up to a microsecond (1E-6)
or a nanosecond (1E-9).
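
To show why a composite type makes computations more involved, here is a
sketch in Python (not Zimbu).  The Time class mirrors the fields above, while
the time_diff helper and the sample values are assumptions of this example:

```python
from dataclasses import dataclass

FSEC_PER_SEC = 10**15  # femtoseconds in one second


@dataclass(frozen=True)
class Time:
    sec: int
    fsec: int  # kept in the range 0 <= fsec < FSEC_PER_SEC


def time_diff(a, b):
    """Return a - b, carrying between the two fields."""
    total = (a.sec - b.sec) * FSEC_PER_SEC + (a.fsec - b.fsec)
    sec, fsec = divmod(total, FSEC_PER_SEC)
    return Time(sec, fsec)


# Composite type: a function call is needed to borrow from the seconds.
print(time_diff(Time(10, 2 * 10**14), Time(9, 7 * 10**14)))

# Plain integer microseconds: the same difference is a simple subtraction.
print(10_200_000 - 9_700_000)  # 500000
```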


ENUMs

Enums give a symbolic name to each possible value.
An ENUM works like a typedef at the same time: the names
can only be used in the context of the type.

        ENUM Ecolor
          none
          red
          blue
          green
        }
        Ecolor mycolor = Ecolor.red

Uninitialized variables use zero, which equals the first entry.  The value of
the other names is 1, 2, etc., as if it were a list.  But this value can't be
used directly.  It is possible to get the name of the value with name():

         IO.writeLine("color: " + mycolor.name())

Looping over the values:
         FOR v IN Ecolor
           IO.writeLine("color: " + v.name())
         }

SIZE() returns the number of values.
An ENUM can be used in a SWITCH.
The order of values can be used, e.g. for sorting.
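
For comparison, roughly the same behavior can be sketched with Python's enum
module.  This is only an analogy: unlike the ENUM described above, a Python
IntEnum value can be used directly as a number.

```python
from enum import IntEnum


class Ecolor(IntEnum):
    none = 0   # first entry, matches the zero of an uninitialized variable
    red = 1
    blue = 2
    green = 3


mycolor = Ecolor.red
print("color: " + mycolor.name)       # the symbolic name of the value

for v in Ecolor:                      # looping over the values
    print("color: " + v.name)

print(len(Ecolor))                    # number of values
print(sorted([Ecolor.green, Ecolor.red]))  # declaration order used for sorting
```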


BITS

BITS is an integer where each bit or group of bits is given a specific
meaning.  It's an efficient way to store several flags and small values in one
value.

One might argue that BITS is actually a composite type, since it stores
multiple items.  However, since it is passed by value it behaves much
like an Int.  That a BITS value does not need to be constructed is what makes
it different from usual composite types.  This also makes it different from
the bitfield type used in C and C++.

BITS can replace a number of function arguments and make the function
call more readable.  For example, a function with four arguments:

        MyFunc("hello", TRUE, FALSE, 2)

Turns into one with only two arguments:

        MyFunc("hello", echo + repeat=2)

Now you can see what the flags and the value "2" mean, without making the
text longer.  It's also more efficient to pass one argument instead of three.

All fields can be cleared by assigning zero to a variable of BITS type.
BITS are passed by value, like an Int.

        BITS WriteFlags
          bool  $echo       # uses 1 bit
          bool  $flush      # uses 1 bit
          Where $where      # three values fit in 2 bits
          nat5  $repeat     # 0 - 31 in 5 bits
        }
        ENUM Where
          atstart
          halfway
          atend
        }

        PROC write(string msg, WriteFlags flags)
          IF flags.echo
            FOR i IN 1 .. flags.repeat
              IO.write(msg)
            }
          }
          SWITCH flags.where
            CASE Where.atstart
                 ...
          }
          IF flags.flush
            ...
          }
        }
        ...
        write("Hello", echo + where=atstart)

For numbers one can use "Int1", "Int2", ... "Int32", "Nat1",
"Nat2", ...  "Nat32".  Outside of BITS only multiples of 8 bits
are available.
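
What storing these fields in one integer could look like is sketched below in
Python.  The bit positions, constant names and helper functions are
assumptions made for this example; the text does not specify how a compiler
lays out the fields.

```python
# Assumed layout: echo in bit 0, flush in bit 1, where in bits 2-3,
# repeat in bits 4-8.
ECHO_BIT = 0
FLUSH_BIT = 1
WHERE_SHIFT, WHERE_MASK = 2, 0b11
REPEAT_SHIFT, REPEAT_MASK = 4, 0b11111

ATSTART, HALFWAY, ATEND = 0, 1, 2  # stand-ins for the Where enum


def pack_flags(echo=False, flush=False, where=ATSTART, repeat=0):
    """Pack the four fields into a single integer."""
    if not 0 <= where <= WHERE_MASK:
        raise ValueError("where does not fit in 2 bits")
    if not 0 <= repeat <= REPEAT_MASK:
        raise ValueError("repeat does not fit in 5 bits")
    return (
        (int(echo) << ECHO_BIT)
        | (int(flush) << FLUSH_BIT)
        | (where << WHERE_SHIFT)
        | (repeat << REPEAT_SHIFT)
    )


def get_echo(flags):
    return bool((flags >> ECHO_BIT) & 1)


def get_where(flags):
    return (flags >> WHERE_SHIFT) & WHERE_MASK


def get_repeat(flags):
    return (flags >> REPEAT_SHIFT) & REPEAT_MASK


flags = pack_flags(echo=True, repeat=2)
print(flags)              # 33
print(get_echo(flags))    # True
print(get_repeat(flags))  # 2
```

All fields being zero gives the integer zero, which matches clearing a BITS
variable by assigning zero to it.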

One can also add a method, like in a class:

        BITS WriteFlags
          ... as above

          FUNC $moreThanOnce(): bool
            RETURN $repeat > 1
          }
        }

When the compiler knows that the destination is a specific BITS type, its
value can be computed easily:

        WriteFlags wf = echo + where=atstart + repeat=2
        write("Hello", &wf)  # pass by reference to allow changes


Alternative syntax:
   WriteFlags wf = NEW().echo(TRUE).where(atstart).repeat(2)

        - NEW() suggests a reference is returned, not a value
        - doesn't look like the rest of Zimbu

   WriteFlags wf = NEW(echo + where:atstart + repeat:2)

        - NEW() suggests a reference is returned, not a value
        - detecting the meaning of the ':' is problematic, it is also used for
           "cond ? expr : expr" and in a Dict initializer.

   WriteFlags wf = NEW(TRUE, FALSE, atstart, 2)

        - NEW() suggests a reference is returned, not a value
        - not clear what flag belongs to what attribute
        + not a new mechanism

   WriteFlags wf = echo + where:atstart + repeat:2

        + clear that it's not an object
        - problem with detecting the meaning of ':', as above

   WriteFlags wf
   wf.echo = TRUE
   wf.where = Where.atstart
   wf.repeat = 2
       
        + standard syntax
        - verbose, can't be used in a function call

   WriteFlags wf = echo=TRUE + where=Where.atstart + repeat=2

        + avoids confusion and mistakes
        - booleans are used a lot, using TRUE is not nice

   WriteFlags wf = WriteFlags.echo + WriteFlags.where=Where.atstart + WriteFlags.repeat=2

        + avoids shadowing variables
        - very verbose

Choice:
   WriteFlags wf = echo + where=atstart + repeat=2
   write("Hello", echo + where=atstart)
