Monday, May 08, 2006

Qi Optimization and Catalan Fun!

We start this week by delving a bit into number theory and seeing what we can accomplish with Qi. We will start with a simple approach to the problem and then perform some optimizations on our program. We will use pattern matching, backtracking, and arrays to enhance the performance of our sequence in order to better understand Qi. Hopefully, this process will show the powers of Qi to search complicated number spaces and what limitations we run into when using Qi in general as well.

Let us start with the empty string, "". We can define this as 0. If we put a pair of parenthesis around the empty string, we get the new string "()". We can call this string 1. Putting a pair of parenthesis around the 1 gives us "(())" or what we will call 2. Alternatively, we can concatenate 1 to 1 to get another version of the string, "()()". Doing this for 3 gives us "((()))" , "()()()" , "(()())", "(())()" and "()(())". These obviously represent basic addition facts if you think of concatenation as + and the parenthesis as + 1. Or we can think of these strings as the number of ways we can parenthesize a string of length N. We have defined these strings via 2 basic operations, either by interning previous strings with parenthesis or by either concatenation be it left-handed or right-handed. This feels reminiscent of the S and K combinators, but perhaps that’s a topic for another time.

So from that description, we can devise a simple Qi program to output those strings. We will call this function get-cstrings. The base case is 0 and we define it as the empty string.

(define get-cstrings
0 -> [""]

Now for the recursive case, we either perform the "interning" operation by putting a pair of parentheses around the previously created number, or by taking any previous number combination that adds to our number and concatenating them together. We define the intern function as intern-all which takes a list of strings and interns each string. We also define the left and right concatenation function as lr-apply. It takes in the interned strings and adds the concatenation strings to it. This gives us the full function definition as follows.

(define get-cstrings
0 -> [""]
N -> (let NewList (intern-all (get-cstrings (- N 1) ))
(lr-apply N NewList))

Turing to intern all, we need to take a list and add the parentheses around each element. So we use map to apply a function across the list. The function should take a string, and add parentheses around the front and back. Using this opportunity to apply the Lisp underbelly of Qi seems fun at this point, and gives us more to optimize later. So we take the string, turn it into a list, append the parentheses, and then turn it back into a string.

(define intern-all
Xs ->
(/. Elem
(COERCE (append [ #\( | (COERCE Elem LIST) ] [ #\) ] ) STRING))

Turing our attention to lr-apply now, we note that it walks from 1 up to n-1, forming new strings from pairs of older strings by concatenating them to each other. So we will start with lr-apply

(define lr-apply
0 Xs -> Xs
N Xs -> (lr-apply* 1 (- N 1) Xs))

And now we need to define the recursive lr-apply* that creates these strings. It takes in the left number and the right number, along with the strings we have built so far. It adds the new strings to the final list. When the right number becomes greater than the left number, we stop. Using the helper function lr-work, we try the following definition.

(define lr-apply*
L R Final ->
(let A (get-cstrings L)
(let B (get-cstrings R)
(lr-apply* (+ L 1) (- R 1) (lr-work A B Final))))
where (>= R L)
_ _ Final -> Final

The lr-work function takes the left list of strings and the right list of strings and combines them and adds them to the final list. It uses a helper function pair-this to concatenate an element to each element of the second list. If we had access to the Haskell prelude, we could define this more concisely. You can check the google group for a beginning implementation.

(define lr-work
[] _ Final -> Final
[X | Xs] Ys Final -> (lr-work Xs Ys (pair-this X Ys Final))

(define pair-this
_ [] Final -> Final
X [Y|Xs] Final -> (pair-this X Xs (make-ap-pair X Y Final))

Finally, we have a function called make-ap-pair that takes 2 strings and returns their left and right concatenation. It uses a helper function named give-unique to ensure that the element has not already been placed in the list. We’ll use the Lisp function concatenate instead of any Qi functions.

(define make-ap-pair
L-num R-num Xs -> (let AfterLeft (give-unique [(CONCATENATE STRING L-num R-num)] Xs)
(give-unique [(CONCATENATE STRING R-num L-num)] AfterLeft))

(define give-unique
[] Xs -> Xs
[X | Left] Xs -> (give-unique Left Xs) where (element? X Xs)
[X | Left ] Xs -> (give-unique Left [ X | Xs ])

Running this gives:

(2-) (get-cstrings 0)

(3-) (get-cstrings 1)

(4-) (get-cstrings 2)
["()()" "(())"]

(5-) (get-cstrings 3)
["(())()" "()(())" "()()()" "(()())" "((()))"]

This function obviously grows the list very quickly. Look at the following.

(6-) (LENGTH (get-cstrings 3))

(7-) (LENGTH (get-cstrings 4))

(8-) (LENGTH (get-cstrings 5))

(9-) (LENGTH (get-cstrings 6))

(10-) (LENGTH (get-cstrings 7))

(11-) (LENGTH (get-cstrings 8))

(12-) (LENGTH (get-cstrings 9))

And c-strings 9 starts to take some noticeable time on my personal box.

(13-) (time (LENGTH (get-cstrings 9)))

Real time: 4.984375 sec.
Run time: 4.953125 sec.
Space: 3335576 Bytes
GC: 6, GC time: 0.0 sec.4862

(14-) (time (LENGTH (get-cstrings 10)))

Real time: 63.984375 sec.
Run time: 63.203125 sec.
Space: 12635520 Bytes
GC: 19, GC time: 0.0625 sec.16796

Obviously, we want to do better. So let us note that the function calls the previous function many times. If we memorize the array, we can find the previous runs with no difficulty. We will use a global array to hold the strings up to 10 named *clst*. We use backtracking to check to see if we already have memorized the result, if not, we compute it.

(define get-pstrings
0 -> [""]
N <- (get-array (value *clst*) [N] #\Escape)
N -> (append (intern-all (get-pstrings (- N 1)) []) (lr-apply N []) ) where (> N 10)
N -> (let NewList (intern-all (get-pstrings (- N 1)) [])
(let LrList (lr-apply N [])
(put-array (value *clst*) [N] (append NewList LrList) )))

We can change the intern-all to make more sense and run much faster. Map ends up being slower than simply defining it recursively.

(define intern-all
[] Out -> Out
[X | Xs] Out -> (intern-all Xs [(CONCATENATE STRING "(" X ")") | Out] )

These simple changes gives us twice the performance.

(10-) (time (LENGTH (get-pstrings 9)))

Real time: 2.53125 sec.
Run time: 2.515625 sec.
Space: 730696 Bytes
GC: 2, GC time: 0.015625 sec.4862

(11-) (time (LENGTH (get-pstrings 10)))

Real time: 33.609375 sec.
Run time: 33.265625 sec.
Space: 2969960 Bytes
GC: 4, GC time: 0.03125 sec.16796

Many more obvious improvements can be made, but that suffices to show how you can flesh out a correct answer then modify it for efficiency.

Of course, if you took this sequence of integers and search for them you’ll find that this is a well known sequence called A000108, Segner numbers, or more often Catalan numbers.
Our favorite number list
has the following interesting definition.

a(n) = binomial(2n, n)-binomial(2n, n-1)

We can easily understand how the binomial(2n,n) is part of the result, as we have strings of N left parentheses and N right parentheses. We must choose equal pairs of parentheses so we have N as the bottom element of the function. However, we must remove the unbalanced parenthesis so we take out the incorrect elements via subtracting the number of strings that are "unbalanced" or one off of balanced. Hence we find the binomial (2n, n-1).

However, if we think about this problem as we have programmed it above, we can come up with an equivalent definition that may help us understand the binomial function. In the program above, it takes the a(n-1) and interns each element, returning the same number. It then builds lists from combinations of previous results. So we have the equation

a(n) = a(n-1) + some_binomial_function

So let us determine what this binomial function turns out to be. Remembering Pascal’s rule of binomial(n-1,k)+binomial(n-1,k-1)=binomial(n,k) allows us to reduce the original definition into the following, with b for binomial.

a(n) = b(2(n-1),n-1) + b(2(n-1),n-2) + b(2(n-1),n-1) + b(2(n-1),n)
–b(2(n-1),n-3) - b(2(n-1),n-2)- b(2(n-1),n-2) - b(2(n-1),n-1)

And remembering the b(n,r)=b(n,n-r) gives us the following reduction.

a(n) = binomial(2(n-1), n-1) – binomial(2(n-1),n-3)

So now we found part of an a(n-1), thus we can find our some_binomial_function from

a(n) = binomial(2(n-1), n-1) –binomial(2(n-1),n-2) + some_binomial_function

some_binomial_functions = binomial(2(n-1),n-2) – binomial(2(n-1),n-3)

So we can see that the first part of the binomial function cancels out with the last part of the a(n-1) function, leaving only the final part as a subtraction. Thinking about what this "some_binomial_function" means, we can see that we take the substrings that we can add together to get a bigger substring, minus the number of invalid and duplicate substrings. This gives us the final version that computes our Qi functions output length.

a(n) = a(n-1) + binomial(2(n-1),n-2) – binomial(2(n-1),n-3)

Of course, we can make more optimizations and go further into the Catalan numbers, which I may do in the future. For now, I hope this serves as a fun introduction to Qi optimization and the always interesting Catalan sequence. Let me know what you think!


Post a Comment