Random Notes



Understanding TensorFlow's RNN model

Posted on 2016-03-01   |  

The RNN cell implementation in TensorFlow can be found here, and the RNN model can be found here.

One great LSTM RNN tutorial is Colah’s Understanding LSTM Networks.

RNN Cell

The basic definition of an RNN cell in TensorFlow is

```python
def __call__(self, inputs):
  return inputs
```
```python
def __call__(self, inputs, state, scope=None):
  """Run this RNN cell on inputs, starting from the given state.

  Args:
    inputs: 2D Tensor with shape [batch_size x self.input_size].
    state: 2D Tensor with shape [batch_size x self.state_size].
    scope: VariableScope for the created subgraph; defaults to class name.

  Returns:
    A pair containing:
    - Output: A 2D Tensor with shape [batch_size x self.output_size]
    - New state: A 2D Tensor with shape [batch_size x self.state_size].
  """
  raise NotImplementedError("Abstract method")
```

And an instance looks like

```python
def __call__(self, inputs, state, scope=None):
  """Most basic RNN: output = new_state = tanh(W * input + U * state + B)."""
  with vs.variable_scope(scope or type(self).__name__):  # "BasicRNNCell"
    output = tanh(linear([inputs, state], self._num_units, True))
  return output, output
```

As we can see from the code, an RNN cell takes two inputs, inputs and state, performs its calculation, and returns the result. Inside the cell, the calculation is performed by tanh(linear([inputs, state], self._num_units, True)), so we need to check the definition of linear, which is

```python
def linear(args, output_size, bias, bias_start=0.0, scope=None):
  """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

  Args:
    args: a 2D Tensor or a list of 2D, batch x n, Tensors.
    output_size: int, second dimension of W[i].
    bias: boolean, whether to add a bias term or not.
    bias_start: starting value to initialize the bias; 0 by default.
    scope: VariableScope for the created subgraph; defaults to "Linear".

  Returns:
    A 2D Tensor with shape [batch x output_size] equal to
    sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

  Raises:
    ValueError: if some of the arguments has unspecified or wrong shape.
  """
  assert args
  if not isinstance(args, (list, tuple)):
    args = [args]

  # Calculate the total size of arguments on dimension 1.
  total_arg_size = 0
  shapes = [a.get_shape().as_list() for a in args]
  for shape in shapes:
    if len(shape) != 2:
      raise ValueError("Linear is expecting 2D arguments: %s" % str(shapes))
    if not shape[1]:
      raise ValueError("Linear expects shape[1] of arguments: %s" % str(shapes))
    else:
      total_arg_size += shape[1]

  # Now the computation.
  with vs.variable_scope(scope or "Linear"):
    matrix = vs.get_variable("Matrix", [total_arg_size, output_size])
    if len(args) == 1:
      res = math_ops.matmul(args[0], matrix)
    else:
      res = math_ops.matmul(array_ops.concat(1, args), matrix)
    if not bias:
      return res
    bias_term = vs.get_variable(
        "Bias", [output_size],
        initializer=init_ops.constant_initializer(bias_start))
  return res + bias_term
```
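To make the mechanics concrete, here is a pure-Python sketch (not the TensorFlow code; the tiny weight matrix and function names are invented) of what one BasicRNNCell step computes for a single example: concatenate inputs and state, multiply by one joint weight matrix, add the bias, and apply tanh.

```python
import math

def linear_sketch(args, matrix, bias):
    # mimics concat(1, args) followed by matmul, for a single example (batch = 1)
    row = [x for a in args for x in a]
    res = [sum(r * w for r, w in zip(row, col)) for col in zip(*matrix)]
    return [r + b for r, b in zip(res, bias)]

def basic_rnn_step(inputs, state, matrix, bias):
    # output = new_state = tanh(W * input + U * state + B); W and U are
    # stacked row-wise into one joint matrix, exactly as linear does above
    return [math.tanh(v) for v in linear_sketch([inputs, state], matrix, bias)]
```

With a 1-unit cell, input [1.0], zero state, stacked weights [[0.5], [0.25]] and zero bias, the step returns [tanh(0.5)], and that same vector is also the next state.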

Linear Algebra Basics

Posted on 2016-03-01   |  

Matrix Multiplication

Basic notations

Vector

A vector usually refers to a column vector, which is a $k \times 1$ matrix. An example of a $3 \times 1$ vector is as follows

$\mathbf{v} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$

A row vector, on the other hand, is the transpose of a column vector. The transpose of $\mathbf{v}$ is a $1 \times 3$ row vector.

$\mathbf{v}^{T} = \begin{pmatrix} 1 & 2 & 3 \\ \end{pmatrix}$

Matrix

A matrix's shape is denoted by $\textsf{#row} \times \textsf{#column}$. A $4 \times 3$ matrix looks like

$\mathbf{X} = \begin{pmatrix} 1 & 2 & 3\\ 1 & 2 & 3\\ 1 & 2 & 3 \\ 1 & 2 & 3 \\ \end{pmatrix}$

Basic Arithmetic operations

Matrix product

Simply put, an $a \times b$ matrix multiplied by a $b \times c$ matrix yields an $a \times c$ matrix. If the number of columns of the first matrix does not match the number of rows of the second matrix, the two matrices cannot be multiplied.
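As a sketch (plain Python lists of lists; the function name is invented), a shape-checked matrix product looks like:

```python
def matmul(A, B):
    # (a x b) times (b x c) gives (a x c); inner dimensions must match
    assert len(A[0]) == len(B), "column count of A must equal row count of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]
```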

Dot product

Concepts

Normal vector

Unit vector

Association rule basics

Posted on 2016-02-24   |  

$X \Rightarrow Y$ means X associated with Y.

Support

The portion of transactions that contain item set X.

This number represents how specific item set X is.

Confidence

The portion of transactions containing item set X that also contain Y: $conf(X \Rightarrow Y) = \frac{supp(X \cup Y)}{supp(X)}$.

This number estimates the conditional probability that Y exists when X exists.

Lift

$lift(X \Rightarrow Y) = \frac{supp(X\cup Y)}{supp(X) \times supp(Y)}$

The ratio of the observed support to the support expected if X and Y were independent.

Conviction

$conv(X \Rightarrow Y) = \frac{1-supp(Y)}{1 - conf(X \Rightarrow Y)}$

The ratio of the expected frequency that X occurs without Y (assuming independence) to the observed frequency that X occurs without Y.
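The four measures above can be sketched directly from their definitions (a plain-Python sketch over sets of items; the helper names are invented):

```python
def supp(itemset, transactions):
    # portion of transactions that contain every item of itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def conf(x, y, transactions):
    # estimated P(Y | X)
    return supp(x | y, transactions) / supp(x, transactions)

def lift(x, y, transactions):
    # observed support over the support expected under independence
    return supp(x | y, transactions) / (supp(x, transactions) * supp(y, transactions))

def conv(x, y, transactions):
    # expected over observed frequency of X occurring without Y
    return (1 - supp(y, transactions)) / (1 - conf(x, y, transactions))
```

For transactions [{a,b}, {a}, {b}, {a,b}]: supp({a}) = 0.75, conf({a} ⇒ {b}) = 2/3, lift = 8/9, and conviction = 0.75.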

First order logic

Posted on 2016-02-18   |  

First-order logic, also known as first-order predicate calculus, is defined by the following terms:

  • Term, a variable, constant, or the result of a function. Denoted by upper case letters, e.g., A.

  • N-place function, where n is the arity of the function: an n-place function takes n arguments and returns an object that has certain relations with those n inputs. Denoted by f.

  • Predicate, an operator that returns either true or false. Predicates can be seen as a special type of function that takes a set of arguments and returns true or false based on the definition of the predicate. Denoted by P.

  • Sentential formula, an expression that represents a sentence if we substitute proper words (constants) for its variables.

  • Atomic statement, a constant, variable, 0-place function, or 0-place predicate.

  • Universal quantifier, $\forall$, for all

  • Existential quantifier, $\exists$, exists

  • Scope of the respective quantifier, $\forall x B$, $B$ is the scope of the universal quantifier $\forall$.

  • Free variable, a variable that is not in the scope of respective quantifier.

With above definitions, first-order predicate calculus is defined by the following rules:

  1. Any atomic statement is a sentential formula.
  2. If $B$ and $C$ are sentential formulas, then $\neg B$, $B \land C$, $B \lor C$, $B \implies C$ are all sentential formulas based on propositional calculus.
  3. If $B$ is a sentential formula in which $x$ is a free variable, then $\forall x B$ and $\exists x B$ are sentential formulas.

Partition, Kth Element, and Sort

Posted on 2015-10-11   |  

Partition (2-part partition)

It is very common that we need to partition an array into two parts so we can do quick sort. A partition for this purpose can be done with the following code.

This code partitions the subarray vec[i:j], where i and j are inclusive.

Note that at the start of the code we first check whether the pivot index equals the starting index of the subarray, which happens when the subarray contains only two elements. This is straightforward to prove: if floor((i+j)/2) = i, then i + j = 2i (or 2i + 1), therefore j = i or j = i + 1. In this case we cannot execute the normal procedure, because it would run out of bounds.
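A minimal Python sketch of such a 2-part partition (a reconstruction, not the original snippet; the pivot is taken at floor((i+j)/2) and two-element subarrays are special-cased as discussed):

```python
def partition(vec, i, j):
    # partition the inclusive subarray vec[i..j]; pivot is vec[(i+j)//2]
    if j - i == 1:                      # two elements: just sort them
        if vec[i] > vec[j]:
            vec[i], vec[j] = vec[j], vec[i]
        return i
    pivot = vec[(i + j) // 2]
    while i <= j:
        while vec[i] < pivot:
            i += 1
        while vec[j] > pivot:
            j -= 1
        if i <= j:
            vec[i], vec[j] = vec[j], vec[i]
            i += 1
            j -= 1
    return j                            # boundary between the two parts
```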

```
0 1
^ ^
i j

0 1
^
(i,j)

  0 1
^   ^
j   i
```

See the example above: while(vec[j] > pivot) --j executes until vec[j] == 0. At that time vec[i] == 0 as well. Then we swap the two and move i and j by one, so i points at the element 1 while j moves out of the boundary.

To avoid this problem, we add that extra check: if there are only two elements, do not bother with the fancy partition; just sort them and return the index of the pivot.

However, the index returned by this function is not always the index of the pivot. Remember, at the very beginning we said that this function partitions an array into two parts; we did not specify around what value. An example is as follows:

```
*start*

pivot = vec[1] = 0
index:  0  1  2  3
value: -1  0  1 -1
        ^        ^
        i        j

*i-while && j-while*

pivot = vec[1] = 0
index:  0  1  2  3
value: -1  0  1 -1
           ^     ^
           i     j

*swap*

pivot = vec[1] = 0
index:  0  1  2  3
value: -1 -1  1  0
              ^
            (i,j)

*i-while && j-while*

pivot = vec[1] = 0
index:  0  1  2  3
value: -1 -1  1  0
           ^  ^
           j  i

*stop*
```

In this case, the resulting partitions are -1,-1 and 1,0. Although every element in the left part is smaller than every element in the right part, the partition is not based on pivot = 0. To make a partition where left < pivot < right, we need a 3-part partition instead of this basic 2-part one.

3-Part Partition

Note that in the previous section we mentioned the partition used in quick sort is not a 3-part partition: it is not the case that all elements in the left part are smaller than the pivot and all elements in the right part are larger. This is fine for quick sort, since its ultimate goal is to sort the entire array and we do not really care about this extra property. But if we want to explicitly find an item, or find the kth element (largest or smallest), we need a 3-part partition.

A sample code is described as follow:

The key idea of this function is that we keep three indexes: l marks a boundary such that all elements before l are smaller than the pivot, and r marks a boundary such that all elements after r are larger than the pivot.

At the beginning of the function we do not know anything about the comparison results, so l and r are the original boundaries of the input subarray.

Then we start comparing elements. Every time we find a smaller element, we swap it with vec[l], because l is the successor of a run of smaller elements; then we increase k and l by one, because 1) l is no longer the boundary, l+1 is, and 2) the element swapped into position k equals the pivot, so we can move on to the next one.

If we find an element that is larger than the pivot, we swap it with vec[r]. Note that here we only decrease r by one, because r-1 is now the new boundary, but we have not yet examined the old vec[r], which now sits at vec[k], so we must examine it before moving to the next index. In the previous paragraph we moved k and l at the same time because every element between the start and k has been examined, and we know none of them is larger than the pivot.
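The l/k/r procedure described above can be sketched as follows (a reconstruction, not the post's original sample code):

```python
def partition3(vec, lo, hi, pivot):
    # after return: vec[lo:l] < pivot, vec[l:r+1] == pivot, vec[r+1:hi+1] > pivot
    l, k, r = lo, lo, hi
    while k <= r:
        if vec[k] < pivot:
            vec[l], vec[k] = vec[k], vec[l]
            l += 1
            k += 1        # the element swapped in from l was already examined
        elif vec[k] > pivot:
            vec[r], vec[k] = vec[k], vec[r]
            r -= 1        # vec[k] now holds an unexamined element, recheck it
        else:
            k += 1
    return l, r
```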

This 3-part partition has interesting uses besides finding the kth element. For example, if elements take values from a limited range and we want an in-place sort, we can sort by repeated partitioning. Say an array holds values 1,2,3,4: partition with the middle band 2,3 so that all 1s end up at the beginning of the array and all 4s at the end, then repeat the same procedure on the middle part, partitioning the 2s and 3s with pivot 2.5 in the subarray vec[l:r].

Compilers Note

Posted on 2015-10-10   |  

First and Follow set

First set can be computed by the following rules:

  1. If there is a terminal at the start of a production rule, add it to the first set: A -> a B, where a is a terminal.
  2. If there is a nonterminal at the start of a production rule, add its first set to the first set of the left-hand side: A -> B, where B is a nonterminal.
  3. If one or more leading nonterminals can be derived to epsilon, find the first symbol after all the epsilons and apply rules 1 and 2.

Follow set can be computed by the following rules:

  1. Add $ into the follow set of the start symbol.
  2. Find all occurrences of the non-terminal of interest on the right-hand side of productions.
  3. If it is the last token in a production rule, for example A -> xxxxxB, then Follow(B) contains all of Follow(A).
  4. If it is followed by other tokens, for example A -> xxxBC: if C is a terminal, put C into B's follow set; if C is a non-terminal, put First(C) into Follow(B).
  5. If the things following B can derive to epsilon, find the first thing after all the epsilons and apply rules 3 and 4.
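The FIRST rules above can be sketched as a fixpoint computation (the grammar here is a toy example invented for illustration; "eps" stands for epsilon):

```python
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["eps"]],
    "T":  [["id"], ["(", "E", ")"]],
}
nonterminals = set(grammar)
first = {nt: set() for nt in grammar}

def first_of(sym):
    # a terminal's FIRST set is just itself
    return first[sym] if sym in nonterminals else {sym}

changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            for sym in prod:
                add = first_of(sym) - {"eps"} - first[nt]
                if add:                        # rules 1 and 2
                    first[nt] |= add
                    changed = True
                if "eps" not in first_of(sym):
                    break                      # rule 3: only skip nullable symbols
            else:
                # every symbol was nullable, so the whole production is
                if "eps" not in first[nt]:
                    first[nt].add("eps")
                    changed = True
```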

LL(1) Grammar

  • No left recursion: E -> E + T is invalid.
  • No common left prefix: E -> ab | ac is invalid
  • No ambiguity: S -> if E then S | if E then S else S (dangling else)

How to solve left recursion:

  • E -> E + id
  • E -> id
  • E -> (E)

First, find the production rule with left recursion, E -> E + id; then cut off the part after the recursive non-terminal E, which is + id, and create a new production rule E' -> + id E' | e. Convert the remaining rules to E -> id E' and E -> (E) E'.
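That transformation can be sketched mechanically (a toy representation invented here: a production is a list of symbol strings, and "eps" stands for epsilon; this handles only immediate left recursion):

```python
def eliminate_left_recursion(nt, prods):
    # A -> A alpha | beta  becomes  A -> beta A'  and  A' -> alpha A' | eps
    rec = [p[1:] for p in prods if p[0] == nt]     # the alpha parts
    base = [p for p in prods if p[0] != nt]        # the beta parts
    if not rec:
        return {nt: prods}
    new = nt + "'"
    return {
        nt: [p + [new] for p in base],
        new: [p + [new] for p in rec] + [["eps"]],
    }
```

Applied to E -> E + id | id | (E), it produces E -> id E' | (E) E' and E' -> + id E' | eps, matching the hand derivation above.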

How to solve common prefix

For the production rule E -> ab | ac, extract the common prefix, which in this case is a, and construct two new production rules: E -> aE' and E' -> b | c.

Note that left factoring cannot solve ambiguity:

S -> if E then S and S -> if E then S else S can be converted into S -> if E then S S', S' -> else S | e. But in certain cases we cannot determine which if an else belongs to (the dangling-else problem).

LL(1) parse table

After we get the first and follow sets, we do the following

  1. For First(A), write corresponding production rule at [A,x] where x is an item in First(A) except epsilon.
  2. For epsilon, write all corresponding production rule at [A,y], where y is an item in Follow(A).

LR(0) Grammar

If there are a shift and a reduce in the same configuration set, then it is not an LR(0) grammar. In terms of the SLR parsing table, it is an LR(0) grammar if there is no shift and reduce on the same row.

SLR(1) Grammar

It is SLR(1) if, in the SLR parsing table, there is no shift and reduce in the same cell. We reduce when the next input token is in Follow(A), where A is the nonterminal we would reduce to.

An example that is not SLR is

  • S -> L = R
  • S -> R
  • L -> *R
  • L -> id
  • R -> L

because when

  • S -> L. = R
  • R -> L.

Follow(L) contains =, therefore we cannot distinguish shift from reduce by looking at the next symbol =.

LR(1) Grammar

Instead of reducing on items in the Follow set, we use a lookahead here:

  • S' -> .S, $
  • S -> .L = R, $
  • S -> .R, $
  • L -> .*R, =
  • L -> .id, =
  • R -> .L, $

Note that we write down the lookahead of each item. For example, S' -> .S ends with $, so for S -> .L = R the token that can follow is $, giving S -> .L = R, $. From S -> .L = R, $ we also derive L -> .*R, and because L is followed by = in that item, we get L -> .*R, =.

A grammar is not LR(1) if there is a shift-reduce conflict whose lookahead sets overlap.

LALR(1) Grammar

This is based on the LR(1) automaton, but merges items with the same core (and different lookaheads).

If we have two items

  • S -> R, a/b
  • S -> R, $

then we combine them into one item, because they have the same core S -> R:

  • S -> R, $/a/b

Reference

LL(1) Parser
SLR and LR Parsing

Questions About Infinite Inputs

Posted on 2015-10-06   |  

Given an infinite input of integers (or others), find the following things:

  1. The sum of the last k elements

  2. The average (mean) of the last k elements

  3. The minimum of the last k elements

  4. The maximum of the last k elements

  5. The median of the last k elements

Sum

Well, there are two ways of thinking about this. First, suppose the sum of the last k elements fits in a single primitive type. In this case, we store the last k elements and keep a running ksum.

```cpp
auto it = input.begin();
deque<int> lastK;
size_t k = 10000;  // whatever window size
long long ksum = 0;
while (it != input.end()) {
  if (lastK.size() < k) {  // window not full yet
    lastK.push_back(*it);
    ksum += *it;
  } else {                 // evict the oldest element first
    ksum -= lastK.front();
    lastK.pop_front();
    lastK.push_back(*it);
    ksum += *it;
  }
  ++it;
}
```

If the sum cannot be stored in a primitive type, we can define a ksum class that stores the sum by combining values. The basic idea of that class is, instead of storing the sum directly, we store m partial sums for the k elements.

Mean

To calculate the mean, we use the ksum from the previous section and divide it by k.

Minimum

To get the minimum, we can traverse lastK and find the minimum value, which costs O(k) time. If we want faster queries, we can keep a min-heap, but the space cost may double since we must also track where each window element lives.

Here is a min-heap based example (findMin in O(1), insert and remove in O(log k)):

```cpp
class minHeap {
  vector<int> heap;  // 1-indexed storage; heap[1] is the minimum
  int size = 0;

  int swim(int pos) {
    // move heap[pos] up while it is smaller than its parent
    while (pos > 1 && heap[pos / 2] > heap[pos]) {
      swap(heap[pos / 2], heap[pos]);
      pos /= 2;
    }
    return pos;  // final position
  }

  void sink(int pos) {
    // move heap[pos] down while it is larger than its smaller child
    while (2 * pos <= size) {
      int child = 2 * pos;
      if (child < size && heap[child + 1] < heap[child]) ++child;
      if (heap[pos] <= heap[child]) break;
      swap(heap[pos], heap[child]);
      pos = child;
    }
  }

 public:
  int push(int val) {
    if ((int)heap.size() < size + 2) heap.resize(size + 2, 0);
    heap[++size] = val;
    return swim(size);  // return final position
  }

  void pop() {
    if (size <= 0) return;
    heap[1] = heap[size--];
    sink(1);
  }

  int front() const {
    if (size > 0) return heap[1];
    return -1;  // TODO: should define error here
  }

  void remove(int pos) {
    if (size < pos) return;
    heap[pos] = heap[size--];
    sink(pos);  // the moved key may need to go either way
    swim(pos);
  }
};

auto it = input.begin();
deque<int> pos;
minHeap mHeap;
size_t k = 10000;  // window size

while (it != input.end()) {
  if (pos.size() < k) {
    pos.push_back(mHeap.push(*it));
  } else {
    mHeap.remove(pos.front());
    pos.pop_front();
    pos.push_back(mHeap.push(*it));
  }
  ++it;
}
```

Note that the stored positions go stale once the heap swaps elements around; a complete version would maintain an index map that is updated on every swap.

Maximum

Maximum is similar to minimum; use a max-heap instead.

Median

Median is the hardest one. We need to keep two heaps: one for the smallest 1~(k/2) elements, and one for the remaining (k/2 + 1)~k elements. The first heap is a max-heap, because we want the largest number of the first part as a median candidate; the second is a min-heap, because we want the smallest number of the second part as a candidate.

The algorithm works as follows:

  • if there is no valid median yet (at the beginning), store the input as the median
  • if the input is larger than or equal to the median, insert it into the right min-heap; otherwise insert it into the left max-heap
  • if the size difference reaches 2, pop the top of the larger heap as the new median, and push the old median into the other heap
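The bullets above can be sketched for a plain stream (a simplification: no fixed window of k, and heapq serves as both heaps by negating keys on the max-heap side; for an even count this reports the lower median):

```python
import heapq

def running_median(stream):
    # lo: max-heap (negated) for the smaller half; hi: min-heap for the larger
    lo, hi = [], []
    medians = []
    for x in stream:
        if lo and x > -lo[0]:
            heapq.heappush(hi, x)
        else:
            heapq.heappush(lo, -x)
        # rebalance so the size difference is at most 1
        if len(lo) > len(hi) + 1:
            heapq.heappush(hi, -heapq.heappop(lo))
        elif len(hi) > len(lo) + 1:
            heapq.heappush(lo, -heapq.heappop(hi))
        # the tops of the heaps are the median candidates
        medians.append(-lo[0] if len(lo) >= len(hi) else hi[0])
    return medians
```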

Subarray Problems

Posted on 2015-09-30   |  

Maximal Subarray

Given an unsorted array s[0:n], find a subarray s[i:j] that maximizes sum(s[i:j]).

sum(s[i:j]) equals sum(s[0:j]) - sum(s[0:i-1]), in which sum(s[0:j]) is a fixed value once j is known. Hence maximizing sum(s[i:j]) becomes minimizing sum(s[0:i-1]), where i-1 < j.

Therefore we need sum(s[0:j]) and the minimum prefix sum min(s[0:k]), where k < j, to calculate max(s[i:j]).

```
sum(s[0:i]) = s[0] + ... + s[i]
min(s[0:k]) = min(min(s[0:k-1]), sum(s[0:k]))
max(s[k+1:j]) = max(sum(s[0:j]), sum(s[0:j]) - min(s[0:k]), max(s[k+1:j-1]))
```
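These recurrences collapse into one pass that tracks the running prefix sum and the minimum prefix seen so far (a sketch; the empty prefix counts as 0, which covers subarrays starting at index 0):

```python
def max_subarray_sum(s):
    # best = max over j of (prefix up to j) - (minimum earlier prefix)
    best = s[0]
    prefix = 0
    min_prefix = 0
    for x in s:
        prefix += x
        best = max(best, prefix - min_prefix)
        min_prefix = min(min_prefix, prefix)
    return best
```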

Minimal Subarray

Similar idea, but we need to keep the maximal prefix sum sum(s[0:k]) instead.

Permutation

Posted on 2015-09-29   |  

Next Permutation (With duplication)

First let's check the first six permutations of the sequence [1,2,3,4] in lexical order.

```
1 2 3 4
1 2 4 3
1 3 2 4
1 3 4 2
1 4 2 3
1 4 3 2
```

If we start from the tail of each permutation and try to find an ascending sequence [n-1 : j], the discovered sequences are:

```
1 2 3 [4]
1 2 [4 3]
1 3 2 [4]
1 3 [4 2]
1 4 2 [3]
1 [4 3 2]
```

Now for every permutation we have successfully divided it into two parts. If we examine the difference between two adjacent permutations, we can find the following rules:

  1. get s[j - 1]
  2. find the first s[k] in s[n-1:j] that is larger than s[j-1]
  3. swap s[j-1] and s[k], sort s[j:n-1] in ascending order
In short: find the ascending subarray starting from the end; the element left of this subarray should be swapped with the first element, scanning from the right of the subarray, that is larger than it.
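That rule can be sketched as an in-place next_permutation (a reconstruction, not the post's snippet; it returns False when the sequence is already the last permutation):

```python
def next_permutation(s):
    # find the longest descending suffix s[j:] (ascending when read from the tail)
    j = len(s) - 1
    while j > 0 and s[j - 1] >= s[j]:
        j -= 1
    if j == 0:
        return False                # already the last permutation
    # rightmost element in the suffix that is larger than the pivot s[j-1]
    k = len(s) - 1
    while s[k] <= s[j - 1]:
        k -= 1
    s[j - 1], s[k] = s[k], s[j - 1]
    s[j:] = reversed(s[j:])         # suffix was descending; reversing sorts it
    return True
```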

Previous Permutation (With duplication)

Find the descending subarray starting from the end; the element left of this subarray should be swapped with the first element, scanning from the right of the subarray, that is smaller than it.

Permutation (Without duplication)

We can do a depth-first search to construct the permutations, or use the code snippet shown above to generate next permutations until we reach asc == 0. Note that both solutions produce results in the same order.

The DFS method maintains a visited vector; every recursive call picks one unvisited node and visits it, until all nodes are visited (put into the temporary result vector).

Permutation (With duplication)

When there are duplicates in nums, create a map in which the key is a number in nums and the value is its occurrence count. When doing DFS, traverse the map and try every number whose remaining count is at least 1.
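A sketch of that count-map DFS (the function name is invented):

```python
def permutations_unique(nums):
    # DFS over a count map: each level picks any number with remaining count,
    # so duplicates are tried only once per position
    counts = {}
    for n in nums:
        counts[n] = counts.get(n, 0) + 1
    result, path = [], []

    def dfs():
        if len(path) == len(nums):
            result.append(path[:])
            return
        for n in counts:
            if counts[n] > 0:
                counts[n] -= 1
                path.append(n)
                dfs()
                path.pop()
                counts[n] += 1

    dfs()
    return result
```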

Construct Binary Tree Using Traversal

Posted on 2015-09-28   |  

First let’s check an example

```
        7
      /   \
    10     2
    /\     /
   4  3   8
       \  /
       1 11
```

The traversal results are as follow:

```
Index:  0  1  2  3  4  5  6  7
In:     4 10  3  1  7 11  8  2
Pre:    7 10  4  3  1  2  8 11
Post:   4  1  3 10 11  8  2  7
```

Construct a binary tree using inorder and preorder traversal

It is clear that the first element in preorder traversal is the root of this tree, so we divide the inorder vector using the root.

```
Index:   0  1  2  3  4  5  6  7
In:     [4 10  3  1] 7 [11 8 2]
Pre:    [7] 10 4 3 1 2 8 11
```

As we can see, [4,10,3,1] are in the left subtree of the root and [11,8,2] are in the right subtree. Therefore, we can apply the find-root-then-split procedure recursively. However, a challenge here is how to determine the roots of the left and right subtrees.

To solve the above problem, we need to go back to the tree representation and recall the way that preorder traversal is done.

```
      root  left        right
Pre:  [7]  [10 4 3 1]  [2 8 11]
```

If we divide the result of the preorder traversal into three parts, we can see the root is followed by the left subtree and then the right subtree. So if we know the size of the left subtree, we know where the preorder traversal of the right subtree starts, which is rootPosition + leftSubTreeSize + 1. Luckily, by finding the position of 7 in the inorder traversal, we can determine the size of the left subtree (and of the right subtree too). Note that [10,4,3,1] and [2,8,11] are also preorder traversal results, which means the first element of each vector is the root of its subtree.

To implement this algorithm, we need three indexes. The first is rootPos, the index of the root of the current subtree; lpos and rpos denote the range of rootPos's descendants (not children, which are directly connected). We find lpos and rpos by locating the root in the inorder result; that also determines the sizes of the two subtrees, and therefore the positions of their roots in the preorder result.

The code is as follows:
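A Python sketch of this construction (a reconstruction rather than the post's code; it copies subvectors instead of tracking the three indexes, represents nodes as (value, left, right) tuples, and assumes distinct values):

```python
def build_tree(preorder, inorder):
    # the first preorder element is the root; split inorder around it
    if not preorder:
        return None
    root = preorder[0]
    mid = inorder.index(root)                         # size of the left subtree
    left = build_tree(preorder[1:mid + 1], inorder[:mid])
    right = build_tree(preorder[mid + 1:], inorder[mid + 1:])
    return (root, left, right)
```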

Construct a binary tree using inorder and postorder traversal

The difference between preorder and postorder traversal is whether we visit the root first or last.

```
Index:   0  1  2  3  4  5  6  7
In:     [4 10  3  1] 7 [11 8 2]
Post:    4  1  3 10 11  8  2 [7]
```

Therefore, all we need to change is the way we find the roots of subtrees: we go from right to left, subtracting the size of the right subtree.

The code is as follows:
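A Python sketch of this variant (again a reconstruction: subvector copies instead of indexes, nodes as (value, left, right) tuples, distinct values assumed):

```python
def build_tree_post(inorder, postorder):
    # the last postorder element is the root; split inorder around it
    if not postorder:
        return None
    root = postorder[-1]
    mid = inorder.index(root)                         # size of the left subtree
    left = build_tree_post(inorder[:mid], postorder[:mid])
    right = build_tree_post(inorder[mid + 1:], postorder[mid:-1])
    return (root, left, right)
```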

© 2015 - 2016 Baoxu(Dash) Shi