Goals

  • Discover two ways to compile OCaml programs: bytecode and native.
  • Play with exceptions.
  • Learn to use side-effects wisely.

Requirements

  • You have completed OCaml - Lesson 5.
  • You have already compiled a program (C, Java, Ada, whatever) using a command-line compiler (gcc, javac, gnat).
  • You already know about exceptions in other languages (e.g. Ada, Java).

Mission 6

Compilation, exceptions, side effects

Compilation

  • You have to write a program that will be compiled. You can use any text editor. The file must have the extension .ml.
    • You can use ocaml-top, which can save source files.
    • gedit seems to know about ocaml keywords.
    • emacs and vim have ocaml modes. By the way, if you are a vim addict, you can consider using spacemacs instead.
      To enable OCaml-mode in emacs, add this to your .emacs (check that the paths are correct):
      (add-to-list 'load-path "/mnt/commetud/GEI/OCaml/.opam/4.08.2/share/emacs/site-lisp/")
      (load "tuareg-site-file")
      (load-file "/mnt/commetud/GEI/OCaml/.opam/4.08.2/share/emacs/site-lisp/ocp-indent.el")
      That should do it (check the ocaml path, though).
  • Write a function showdir of type unit → unit that does the following:
    • It reads the content of the current directory, using Sys.getcwd and Sys.readdir — both functions belong to module Sys.
    • Sort the entries using Array.sort whose type can be found in the Array module.
      You have at least three ways to find the type of a standard function:
      • Read the documentation. The Array module can be found in the Standard library in the Manual.
      • Launch utop and write Array.sort ;; which gives immediately its type.
      • Use man Array in a terminal.
      You see that sort expects a comparison function. You can use compare which is already defined. (What is its type?)
    • Print the name of all entries, using Printf and an iterator, to be found in the Array module.
  • Assuming the file is named foo.ml compile it with (in a terminal):
    • ocamlc -o foo1 foo.ml to build a bytecode executable file foo1.
    • ocamlopt -o foo2 foo.ml to build a native executable foo2.
    • Check the nature of both files: file foo1 and file foo2.
    • You can run directly these files: ./foo1 and ./foo2 (however, nothing should happen).
  • When run, the program evaluates all definitions.
    Your program only defines a function, so nothing happens when you run it.
  • To see something you only need to apply the function:
    let () = showdir ()
    It is not necessary to define a main. However, if you miss the good old main, you can write it like this:
    (* Define a function main. *) let main () = (* Put your stuff here... *) () (* Invoke main. *) let () = main ()
  • Recompile and try your program. It should print file names.
  • Note: bytecode programs and native programs have the same behaviour, except for a few points explained in the ocamlopt manual (see Compatibility with bytecode). Stack overflows behave differently.

Exceptions

  • Note: use the interpreter.
  • Exceptions are standard values of type exn. A few exceptions are predefined:
    let a = Not_found let b = Failure "arrh" let c = End_of_file let d = Division_by_zero Note that evaluating these values does not raise an exception.
    See predefined exceptions in the core library.
  • exn is a variant type. It is defined as:
    type exn = Not_found | Failure of string | End_of_file | Division_by_zero | (* and so on ... *)
    This is why a value such as [ Not_found ; Not_found ; Failure "boo!" ] is perfectly standard. It is a list of exn, and no exception is raised to build it.
  • Raise an exception: use raise Not_found (for example)
    • See the difference between let a = Not_found let b = raise Not_found
    • Better, see the type difference between:let a () = Not_found let b () = raise Not_found
    • Raising an exception has the polymorphic type ∀α. α. Indeed, it is a computation that never returns a value (it interrupts the flow of control), so it is sound to consider it has all the possible types.
    • You can now guess the type of raise (which is just a function).
    • Another predefined function: failwith , which is defined as: let failwith msg = raise (Failure msg)
  • exn is an extensible variant type. New values can be added after it is defined.
    To define a new exception: exception Horrible_error (with a capital, like other constructors).
    An exception can have an attribute (like Failure of string): exception Bad_bad_thing of int * string
  • Example:
    (* Function that returns (x,y) if the list given in argument starts with x and y, or fails with an exception if the list does not have two elements. *) let take_two = function | [] | [_] -> raise Not_found | x :: y :: _ -> (x,y)
  • Catch exceptions: use try...with
    We define a function similar to take_two except that it returns Some (x,y) or None when the list does not have two elements.
    let take_two_opt l = try Some (take_two l) with Not_found -> None
    • The syntax is: try expression with patterns , where patterns is a pattern-matching on the exn variant.
    • The pattern does not have to be exhaustive. Exceptions not in the patterns are simply propagated (they are not caught in this try block).
      Guess the type of:
      let test_raise x = try if x < 0 then raise Not_found else if x = 0 then failwith "Zero" else if x > 100 then raise Horrible_error else if x > 10 then raise (Bad_bad_thing (x, "Too big.")) else [ string_of_int x ] with | Not_found -> [] | Failure s -> [s]
      Notice that the expressions in the with part must have the same type than the try part.
    • Apply test_raise several times in order to see the five possible behaviours.
    • This example does not catch all possible exceptions. It is actually equivalent to
      ... with | Not_found -> [] | Failure s -> [s] | e -> raise e
StarExercise : Exceptions
  • Write a function call with type ∀αβ. (α → β) → α → β such that call f arg returns the result of f applied to arg and prints any exception that was raised by f. The exception must be re-raised again, once printed.
    To print an exception, you need the function Printexc.to_string see the module Printexc in the standard library. Note that call already exists in module Printexc, it is named print.
  • Test it. Use test_raise
  • Define a variant type: type 'a result = Ok of 'a | Error of exn
  • Write a function eval with type ∀αβ. (α → β) → α → β result such that eval f arg returns Ok x if f arg returns x, or Error ex if the exception ex was raised.
  • Write a function check_all with type ∀αβ. (α → β) → (α * β result) list → bool such that check_all f test_list returns true iff f returns the expected result for all the tests of the list.
    check_all take_two [] true
    check_all take_two [ ([1], Error Not_found) ] true
    check_all take_two [ ([1], Ok (1,1)) ] false
    check_all take_two [ ([1], Error Not_found) ; ([4;3;2;1], Ok (4,3)) ] true
    check_all take_two [ ([1], Error Not_found) ; ([4;3;2;1], Error Not_found) ] false

    (This is how your functions will be evaluated during the exam.)

Controled effects

We will study a couple of examples where side effects are carefully used.

  • First of all, define this function which is supposed to represent some heavy computation:
    let calc x = Printf.printf "Computing x*x with x = %d\n%!" x ; x * x
  • Notice that each time calc is invoked, it prints a message. (Try it).
StarStarExercise : Caching (memoization)

We write a cache for a function (this is also called a memoized function). The idea is to store previous results of a function to avoid computing them again the next times the function is invoked with the same arguments.

  • As an example, if we apply the non-cached version to a list List.map calc [ 3 ; 2 ; 2 ; 3 ; 1 ; 2 ; 3 ; 2 ; 2 ; 1 ] it prints ten times the message Computing x*x ...
  • However, with the cached version List.map cached_version_of_calc [ 3 ; 2 ; 2 ; 3 ; 1 ; 2 ; 3 ; 2 ; 2 ; 1 ] the message should be printed only three times because the results for the arguments 1, 2, 3 are remembered.
  • Of course, we write a generic cache (that is, polymorphic and higher-order). Enjoy!
    Write a function cache with type ∀αβ. (α → β) → α → β such that cache f is the memoized version of f. Here is some help:
  • The function cache expects two arguments: f (of type α → β) and arg (of type α).
  • For the cache structure, we use a hashtable. There will be one hashtable per function, thus the hashtable must be built after the argument f is received, but before the argument arg is received.
    To create a hashtable: let memory = Hashtbl.create 20 in ... (20 is the initial size, which does not really matter).
  • You need three functions of the Hashtbl module: mem, find and add.
  • Here is the test:
    let test_cache () = (* The "computing..." message is printed ten times. *) Printf.printf "\n I define l1:\n%!" ; let l1 = List.map calc [ 3 ; 2 ; 2 ; 3 ; 1 ; 2 ; 3 ; 2 ; 2 ; 1 ] in (* The "computing..." message should be printed only three times. *) Printf.printf "\n I define l2:\n%!" ; let l2 = List.map (cache calc) [ 3 ; 2 ; 2 ; 3 ; 1 ; 2 ; 3 ; 2 ; 2 ; 1 ] in (* Check that l1 equals l2 *) Printf.printf "l1 = l2 ? %b\n%!" (l1 = l2) ; ()
StarExercise : Laziness

Memoization can be applied to a function in order to remember previous results. Laziness is different: it applies to a value in order to delay the computation of the value until it is necessary. If a lazy value is not used, it is never computed.

  • First, note that lazy is a keyword in OCaml, so you cannot use it as an identifier. (We will use flazy, tlazy instead).
  • We need to define a type α tlazy to represent lazy values.
    Define it as a record containing the following: (it is a parameterized type, as seen in Lesson 4).
    • A (mutable) value of type α option.
      When this field is None, it means that the value has not been computed yet.
      When it is Some x, it means that the value is already computed and is x.
    • A function compute of type unit → α that will be used, once, to compute the value if needed.
  • Construction: write a function flazy with type (unit → α) → α tlazy which builds a lazy value. Of course, the value must not be computed at construction time.
  • Deconstruction: write a function get_value with type α tlazy → α which computes the value if not already done.
  • To ease the test, define let arglazy f arg = flazy (fun () -> f arg)
  • Define a function intshow with type int tlazy → unit which prints the int value of a tlazy (using get_value).
  • We use the function calc, already defined above.
    let laz1 = arglazy calc 1 and laz2 = arglazy calc 2 nothing is printed by calc.
    let _ = intshow laz1 ;; let _ = intshow laz1 ;; let _ = intshow laz1 ;; only the first invocation prints a message.
    let _ = intshow laz2 ;; let _ = intshow laz2 ;; let _ = intshow laz2 ;; ditto.

  • Build (using a recursive function) a list of pairs of type (int × int tlazy) list. The list should contain a hundred elements of the form (x, laz_x), where laz_x is the lazy computation of calc x and x ranges from 1 to 100.
  • Write a function choose of type int → (int × int tlazy) list → int such that choose n biglist returns the value associated to n in biglist (it may raise an exception if n is not in biglist).
    • The message printed by calc should appear only once, only for the requested value.
    • When invoked on the same element twice, the message should not appear again.
  • Finally, note that lazy values are already implemented in OCaml: see the Lazy module in the standard library.

Value restriction

  • Remember the cache example, above. The function cache f is expected to have the same type than f. Take for example cache calc.
  • Consider now this example. Guess the type of cached_single
    let single x = [x] let cached_single = cache single
  • The type of single is truly polymorphic: ∀α. α → α list in which α is a bound type variable: it is quantified by ∀.
    On the contrary, the type of cached_single mentions monomorphic type variables '_a like this: _α → _α list, where _α is a free variable.
  • Apply once cached_single: let res1 = cached_single "foo" ;;
    Now, check again the type of cached_single: cached_single ;; it is now instanciated to a totally monomorphic type string → string list.
  • You see that the OCaml type system considers that single can be applied to different types (it is polymorphic), whereas cached_single can only be applied to values having exactly the same type.
    You have to find out why it is correct to do so (more precisely, why it would not be correct to allow cached_single to be polymorphic).
  • In order to decide if a value is polymorphic or monomorphic, the type system uses a basic rule:
Value restriction (typing rule)
  • Values obtained by an immediate application are not polymorphic: their type variables are kept free (_α).
  • Values defined by anything else than an immediate application (e.g. a function defined by fun) can be polymorphic: their free type variables are quantified (α).
Actually, the OCaml type system applies a more subtle rule, called relaxed value restriction which allows to generalize all covariant type variables, which we will not explain here.
  • We consider a short example that shows again why value restriction is necessary.
    StarExercise : Value restriction
    Define a function forbid with type α → α → α such that forbid x returns a function which behaves like the identity, except for the argument x for which it raises an exception.
    let f = forbid 99 (creates a new function f)
    f 10 10
    f 20 20
    f 99 raises some exception

    Find a way to illustrate the value restriction using forbid.
  • Note that value restriction is needed only because of side-effects (because of mutable types). In a pure functional language, the value restriction is unnecessary.
  • We now consider a few cases where the value restriction is applied, but is not necessary:
    let mk_pair a b = (a,b) let mk0 = mk_pair 0 (* Check the type of f. Would it be safe to be polymorphic? *) let equals a b = (a = b) let is_empty = equals [] (* Same question *)
  • The compiler cannot guess if the value restriction is necessary or not. To be safe, it restricts all applications.
    Several research works have considered extensions of the type system in order to apply the value restriction only if necessary. These systems are quite complex, for a small benefit. Thus, because of its simplicity, the (abusive) value restriction is used in OCaml.
  • Fortunately, it is usually easy to get round the value restriction. In the above cases, it suffices to write:
    let fix_mk0 x = mk_pair 0 x let fix_is_empty x = equals [] x
    Here, fix_mk0 and fix_is_empty are (syntactic) function definitions, so the value restriction does not apply. On the contrary, the above definitions of mk0 and is_empty were applications, hence the value restriction.

Outcomes

Once the mission is completed,

  • you know that ocaml executable admit two forms: bytecode files and native executables.
  • you are not afraid of exceptions: you know how to catch them, breed them, raise them.
  • you are able to use side effects when necessary, knowing that the value restriction is here to avoid trouble with polymorphism.