到现在我们已经讲了不少东西了,但我们还没有真正涉及到函数式编程. 目前所讲的所有特性 - 丰富的数据类型(rich data types), 模式匹配(pattern matching), 类型推导(type inference), 嵌套函数(nested functions) - 可以想象它们都可以在一种”超级C“语言中存在。这些特性当然很酷,它们使得代码简洁易读,减少bug,但是它们实际和函数式编程没什么关系。实际上我的观点是函数式编程语言的妙处不是在于函数式编程,而是因为在我们长年习惯于类C语言编程的时候,编程技术已经提高很多了。因此当我们一次又一次地写struct { int type; union { ... } }的时候,ML和Haskell程序员却有着很多安全的变量和数据类型的模式匹配。当我们小心翼翼地 free所有的malloc时候,很多语言在上世纪八十年代就有了超越手工管理的内存垃圾收集器。
好了,现在是时候告诉你们什么是函数式编程了。
基本的不是很能说明问题的定义是在函数式语言中, 函数(functions)是一等公民。
听上去不是很有用,让我们来看个例子。
# let double x =
x * 2
in
List.map double [ 1; 2; 3 ];;
- : int list = [2; 4; 6]
在这个例子中,我首先定义了一个嵌套函数double,它读入一个参数x后返回x * 2。然后map在给定的列表([1; 2; 3])的每个元素上调用double来生成结果:一个每个数都扩大一倍的新的列表。
map被称为高阶函数(higher-order function) (HOF)。高阶函数是指一个把其他函数作为参数之一的函数。
到现在为止还算简单。如果你对C/C++熟悉的,这就象传递一个函数指针作为参数。Java中有匿名类(anonymous class)就象一个低速的闭包(closure)。如果你知道Perl那么你可以已经知道和使用了Perl中的闭包和Perl的map函数,这和我们现在所说的完全相同。事实上Perl很大程度上也是一个函数式语言。
闭包是那些带着它们被定义时的环境的函数。特别的,一个闭包可以引用它定义时存在的变量。让我们把上面那个函数变得更通用一些,以便我们可以对任何整数列表乘以一个任意值n:
let multiply n list =
let f x =
n * x
in
List.map f list
;;
因此:
# multiply 2 [1; 2; 3];; - : int list = [2; 4; 6] # multiply 5 [1; 2; 3];; - : int list = [5; 10; 15]
关于multiply函数有一点值得注意的是嵌套函数f. 这是一个闭包。我们注意一下f怎样使用变量n的值,我们并没有把n作为显式的参数传递给它。f是从它的环境中找到它的。n是传递给函数multiply的参数,所以在这个函数中都是有效的。
This might sound straightforward, but let's look a bit closer at that call to map: List.map f list.
map is defined in the List module, far away from the current code. In other words, we're passing f into a module defined "a long time ago, in a galaxy far far away". For all we know that code might pass f to other modules, or save a reference to f somewhere and call it later. Whether it does this or not, the closure will ensure that f always has access back to its parental environment, and to n.
Here's a real example from lablgtk. This is actually a method on a class (we haven't talked about classes and objects yet, but just think of it as a function definition for now).
class html_skel obj = object (self)
...
...
method save_to_channel chan =
let receiver_fn content =
output_string chan content;
true
in
save obj receiver_fn
First of all you need to know that the save function called at the end of the method takes as its second argument a function (receiver_fn). It repeatedly calls receiver_fn with snippets of text from the widget that it's trying to save.
Now look at the definition of receiver_fn. This function is a closure alright because it keeps a reference to chan from its environment.
Let's define a plus function which just adds two integers:
let plus a b = a + b ;;
Some questions for people asleep at the back of the class.
plus?plus 2 3?plus 2?Question 1 is easy. plus is a function, it takes two arguments which are integers and it returns an integer. We write its type like this:
plus : int -> int -> int
Question 2 is even easier. plus 2 3 is a number, the integer 5. We write its value and type like this:
5 : int
But what about question 3? It looks like plus 2 is a mistake, a bug. In fact, however, it isn't. If we type this into the OCaml toplevel, it tells us:
# plus 2;; - : int -> int = <fun>
This isn't an error. It's telling us that plus 2 is in fact a function, which takes an int and returns an int. What sort of function is this? We experiment by first of all giving this mysterious function a name (f), and then trying it out on a few integers to see what it does:
# let f = plus 2;; val f : int -> int = <fun> # f 10;; - : int = 12 # f 15;; - : int = 17 # f 99;; - : int = 101
In engineering this is sufficient proof by example for us to state that plus 2 is the function which adds 2 to things.
Going back to the original definition, let's "fill in" the first argument (a) setting it to 2 to get:
let plus 2 b = (* This is not real OCaml code! *) 2 + b ;;
You can kind of see, I hope, why plus 2 is the function which adds 2 to things.
Looking at the types of these expressions we may be able to see some rationale for the strange -> arrow notation used for function types:
plus : int -> int -> int
plus 2 : int -> int
plus 2 3 : int
This process is called currying (or perhaps it's called uncurrying, I never was really sure which was which). It is called this after Haskell Curry who did some important stuff related to the lambda calculus. Since I'm trying to avoid entering into the mathematics behind OCaml because that makes for a very boring and irrelevant tutorial, I won't go any further on the subject. You can find much more information about currying if it interests you by doing a search on Google.
Remember our double and multiply functions from earlier on? multiply was defined as this:
let multiply n list =
let f x =
n * x
in
List.map f list
;;
We can now define double, triple &c functions very easily just like this:
let double = multiply 2;; let triple = multiply 3;;
They really are functions, look:
# double [1; 2; 3];; - : int list = [2; 4; 6] # triple [1; 2; 3];; - : int list = [3; 6; 9]
You can also use partial application directly (without the intermediate f function) like this:
# let multiply n = List.map (( * ) n);; val multiply : int -> int list -> int list = <fun> # let double = multiply 2;; val double : int list -> int list = <fun> # let triple = multiply 3;; val triple : int list -> int list = <fun> # double [1; 2; 3];; - : int list = [2; 4; 6] # triple [1; 2; 3];; - : int list = [3; 6; 9]
In the example above, ((*) n) is the partial application of the (*) (times) function. Note the extra spaces needed so that OCaml doesn't think (* starts a comment.
You can put infix operators into brackets to make functions. Here's an identical definition of the plus function as before:
# let plus = (+);; val plus : int -> int -> int = <fun> # plus 2 3;; - : int = 5
Here's some more currying fun:
# List.map (plus 2) [1; 2; 3];; - : int list = [3; 4; 5] # let list_of_functions = List.map plus [1; 2; 3];; val list_of_functions : (int -> int) list = [<fun>; <fun>; <fun>]
Functional programming, like any good programming technique, is a useful tool in your armoury for solving some classes of problems. It's very good for callbacks, which have multiple uses from GUIs through to event-driven loops. It's great for expressing generic algorithms. List.map is really a generic algorithm for applying functions over any type of list. Similarly you can define generic functions over trees. Certain types of numerical problems can be solved more quickly with functional programming (for example, numerically calculating the derivative of a mathematical function).
A pure function is one without any side-effects. A side-effect really means that the function keeps some sort of hidden state inside it. strlen is a good example of a pure function in C. If you call strlen with the same string, it always returns the same length. The output of strlen (the length) only depends on the inputs (the string) and nothing else. Many functions in C are, unfortunately, impure. For example, malloc - if you call it with the same number, it certainly won't return the same pointer to you. malloc, of course, relies on a huge amount of hidden internal state (objects allocated on the heap, the allocation method in use, grabbing pages from the operating system, etc.).
ML-derived languages like OCaml are "mostly pure". They allow side-effects through things like references and arrays, but by and large most of the code you'll write will be pure functional because they encourage this thinking. Haskell, another functional language, is pure functional. OCaml is therefore more practical because writing impure functions is sometimes useful.
There are various theoretical advantages of having pure functions. One advantage is that if a function is pure, then if it is called several times with the same arguments, the compiler only needs to actually call the function once. A good example in C is:
for (i = 0; i < strlen (s); ++i)
{
// Do something which doesn't affect s.
}
If naively compiled, this loop is O(n2) because strlen (s) is called each time and strlen needs to iterate over the whole of s. If the compiler is smart enough to work out that strlen is pure functional and that s is not updated in the loop, then it can remove the redundant extra calls to strlen and make the loop O(n). Do compilers really do this? In the case of strlen yes, in other cases, probably not.
Concentrating on writing small pure functions allows you to construct reusable code using a bottom-up approach, testing each small function as you go along. The current fashion is for carefully planning your programs using a top-down approach, but in the author's opinion this often results in projects failing.
C-derived and ML-derived languages are strict. Haskell and Miranda are non-strict, or lazy, languages. OCaml is strict by default but allows a lazy style of programming where it is needed.
In a strict language, arguments to functions are always evaluated first, and the results are then passed to the function. For example in a strict language, this call is always going to result in a divide-by-zero error:
give_me_a_three (1/0);;
If you've programmed in any conventional language, this is just how things work, and you'd be surprised that things could work any other way.
In a lazy language, something stranger happens. Arguments to functions are only evaluated if the function actually uses them. Remember that the give_me_a_three function throws away its argument and always returns a 3? Well in a lazy language, the above call would not fail because give_me_a_three never looks at its first argument, so the first argument is never evaluated, so the division by zero doesn't happen.
Lazy languages also let you do really odd things like defining an infinitely long list. Provided you don't actually try to iterate over the whole list this works (say, instead, that you just try to fetch the first 10 elements).
OCaml is a strict language, but has a Lazy module that lets you write lazy expressions. Here's an example. First we create a lazy expression for 1/0:
# let lazy_expr = lazy (1/0);; val lazy_expr : int lazy_t = <lazy>
Notice the type of this lazy expression is int lazy_t.
Because give_me_a_three takes 'a (any type) we can pass this lazy expression into the function:
# give_me_a_three lazy_expr;; - : int = 3
To evaluate a lazy expression, you must use the Lazy.force function:
# Lazy.force lazy_expr;; Exception: Division_by_zero.
One term which you'll hear a lot when discussing functional languages is "boxed". I was very confused when I first heard this term, but in fact the distinction between boxed and unboxed types is quite simple if you've used C, C++ or Java before (in Perl, everything is boxed).
The way to think of a boxed object is to think of an object which has been allocated on the heap using malloc in C (or equivalently new in C++), and/or is referred to through a pointer. Take a look at this example C program:
#include <stdio.h>
void
printit (int *ptr)
{
printf ("the number is %d\n", *ptr);
}
void
main ()
{
int a = 3;
int *p = &a;
printit (p);
}
The variable a is allocated on the stack, and is quite definitely unboxed.
The function printit takes a boxed integer and prints it.
The diagram below shows an array of unboxed (top) vs. boxed (below) integers:
No prizes for guessing that the array of unboxed integers is much faster than the array of boxed integers. In addition, because there are fewer separate allocations, garbage collection is much faster and simpler on the array of unboxed objects.
In C or C++ you should have no problems constructing either of the two types of arrays above. In Java, you have two types, int which is unboxed and Integer which is boxed, and hence considerably less efficient. In OCaml, the basic types are all unboxed.