Tensorflow Basics
Why Tensorflow? It is Google's own machine learning library and a very popular one. Tensorflow is useful both for building simple, intuitive models and for deploying massive applications that can be distributed among many different devices, such as dedicated servers or even mobile devices.
As the title of these notes suggests, the main concepts and utilities of this library are going to be explored.
It is all about computational graphs
The way Tensorflow works is based on computing things through graphs. What does this mean exactly? An example graph is perhaps the best way to make this idea clear.
A graph, in a mathematical sense, is a set of elements that may or may not keep some sort of relationship; it is like a network where the elements are represented by nodes and the relationships between them are represented by edges. In Tensorflow, nodes are mathematical operations and edges signal the direction the data flows towards.
In the example from above, information flows from left to right. The outer left nodes hold the input data that will undergo certain mathematical operations; nodes A, B and C hold constant values of 3, 2 and 1 respectively. Node D performs an addition and depends on the values of A and B to compute a quantity of 5 (D = A + B). Similarly, E depends on B and C to compute a value of 1 through subtraction (E = B - C).
Finally, as this graph's ultimate goal, node F computes a value of 5 (F = D * E) through a multiplication that directly depends on nodes D and E. This last node also depends indirectly on the values and operations that come before its immediate neighbours (A, B and C).
But why computational graphs?
Computational graphs behave like an assembly process. Since the dependencies among nodes can be tracked, the overall computation can be decomposed into several subprocesses, and computational resources can be allocated to each one according to its needs. This is what allows a graph-based computation system to be distributed among different machines.
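As a small illustration of how parts of a graph can be pinned to specific resources, nodes can be placed on devices with the tf.device context manager; the device names below ('/cpu:0', '/device:GPU:0') are assumptions about the available hardware, and allow_soft_placement lets Tensorflow fall back to an available device when the requested one is missing.
import tensorflow as tf

with tf.device('/cpu:0'):          # Pin these nodes to the first CPU.
    a = tf.constant(3)
    b = tf.constant(2)
with tf.device('/device:GPU:0'):   # Pin this node to a GPU, if one exists.
    c = a + b

# allow_soft_placement=True falls back gracefully when the device is absent.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(c))             # Prints 5.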
Furthermore, graphs are extremely flexible when it comes to building quantitative models, from something as simple as linear regression to something as sophisticated as some neural network architectures. So, the reason computational graphs are interesting, not to mention useful, is that this combination of scalability and flexibility can support almost any kind of application, no matter how simple or complex; hence, Tensorflow is worth exploring in depth.
A bit of code to exemplify
There is a Tensorflow API in Python which is great for data science projects. The code for computing the graph plotted above is as simple as what follows.
import tensorflow as tf
# == First layer ==
A = tf.constant(3)
B = tf.constant(2)
C = tf.constant(1)
# == Second layer ==
D = tf.add(A, B)      # Same as A + B semantically.
E = tf.subtract(B, C) # ... B - C ...
# == Target node ==
F = tf.multiply(D, E) # ... D * E ...
# == Initialize the session, execute it and close it ==
sess = tf.Session()
output = sess.run(F)  # This variable stores a 5.
sess.close()
Apart from the last three lines of the previous block, the code is self-explanatory. Once the graph has been constructed, a session has to be declared using the tf.Session() command. Sessions are the contexts where computations are performed; in the above example, the output obtained from node F is stored inside the variable named output. After the computation is done, the session should be closed to free computational resources.
Graphs, sessions and the with command
Graphs
When Tensorflow is first imported as a library, a graph is created and used by default for any computation; this is what happens when nothing else is specified. However, other graphs can be explicitly declared.
new_graph = tf.Graph()
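A quick way to verify this behaviour is to check which graph an operation lands on; a minimal sketch:
import tensorflow as tf

a = tf.constant(1)
print(a.graph is tf.get_default_graph())  # True: 'a' was placed on the default graph.

new_graph = tf.Graph()
print(new_graph is tf.get_default_graph())  # False: this one stays separate until made default.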
Interactive sessions
Besides the conventional sessions, Tensorflow has a very useful command for evaluating the content of variables and placeholders (which will be discussed shortly).
i_sess = tf.InteractiveSession()
x = tf.constant([[1, 2],
                 [3, 4]])
x.eval() # <-- Evaluates and returns what is stored in 'x'.
i_sess.close()
Interactive sessions are useful when experimenting or inspecting what kind of data types and shapes are to be fed to a graph.
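For instance, the static shape and data type of a tensor can be inspected before anything is fed to the graph; a small sketch along those lines:
import tensorflow as tf

i_sess = tf.InteractiveSession()
x = tf.constant([[1, 2],
                 [3, 4]])
print(x.get_shape())  # (2, 2) -- the static shape of the tensor.
print(x.dtype)        # <dtype: 'int32'> -- the inferred data type.
print(x.eval())       # The actual values, returned as a Numpy array.
i_sess.close()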
The with command
In order to initialize contexts in which both graphs and sessions can be declared and executed, the with command turns out to be a very handy utility. Suppose a graph different from the default one is to be used in a given session; to achieve this, the code would look like this:
with tf.Graph().as_default():
    x = tf.constant(1)
    y = tf.constant(1)
    z = x + y
    with tf.Session() as sess:
        sess.run(z)
What makes the with command useful is that it creates a scope for a graph in which certain variables are declared. Then, within that scope, another context is created for a session to run and compute whatever the graph's architecture is supposed to do. Another benefit is that with automatically closes the session when the block ends, immediately releasing resources.
By using this command, different graphs within different sessions can be used independently. Note that, when declaring a new tf.Graph(), it has to be called along with the as_default() method so that Tensorflow knows this is the current graph context to use.
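As a minimal sketch of that independence, two graphs can each carry their own nodes and be executed in their own sessions:
import tensorflow as tf

g1 = tf.Graph()
g2 = tf.Graph()

with g1.as_default():
    x = tf.constant(10)
    with tf.Session() as sess:
        print(sess.run(x))  # 10, computed on g1.

with g2.as_default():
    y = tf.constant(20)
    with tf.Session() as sess:
        print(sess.run(y))  # 20, computed on g2.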
Numpy compatibility
Using Numpy along with Tensorflow makes certain mathematical tasks very easy. One example is matrix multiplication: Numpy arrays can be handed directly to Tensorflow operations, so the matrices and their shapes do not have to be declared through Tensorflow itself.
import tensorflow as tf
import numpy as np
with tf.Graph().as_default():
    a = np.random.random((1, 3))
    b = np.random.random((3, 1))
    c = tf.matmul(a, b)
    with tf.Session() as sess:
        sess.run(c)
Given the shapes in the previous multiplication, Tensorflow will return a 1 x 1 matrix as a Numpy array.
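This claim can be checked directly; a small sketch continuing the idea above:
import numpy as np
import tensorflow as tf

with tf.Graph().as_default():
    a = np.random.random((1, 3))
    b = np.random.random((3, 1))
    c = tf.matmul(a, b)
    with tf.Session() as sess:
        result = sess.run(c)
        print(type(result))   # <class 'numpy.ndarray'>
        print(result.shape)   # (1, 1)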
Variables and Placeholders for Optimization
Most machine learning models reduce to an optimization problem in which a cost or error function evaluates how good a given model is. A model's 'learning' process happens as the error function is minimized while some parameters, the data, stay fixed.
The optimization becomes possible as other, variable parameters change; finding a set of these latter parameters that yields a lower error function value is what the 'learning' is all about. Tensorflow provides the Variable and Placeholder objects to hold variable and fixed parameters respectively.
Linear Regression Parameters
As an example, consider the following expression of a simple linear equation.
$$
f(\mathbf{x}) = \mathbf{x}^{\intercal} \mathbf{w} + \mathbf{b}
$$
From this equation, \( f(\mathbf{x}) \) tries to capture a functional explanation of another variable \( \mathbf{y} \), up to some random noise \( \epsilon \sim \mathcal{N}( \mu,\ \sigma^{2}) \).
$$
\mathbf{y} = f(\mathbf{x}) + \mathbf{\epsilon}
$$
In this case, \( \mathbf{x} \) and \( \mathbf{y} \) represent fixed parameters so they should be represented by Placeholders; \( \mathbf{w} \) and \( \mathbf{b} \), on the other hand, are variable parameters so Variable objects should be used instead.
x = tf.placeholder(tf.float32, shape=[None, 3])  # Any number of rows, 3 columns.
y = tf.placeholder(tf.float32, shape=None)       # Any shape can be fed.
w = tf.Variable([[0, 0, 0]], dtype=tf.float32)   # One weight per input column.
b = tf.Variable(0, dtype=tf.float32)
Notice that both placeholders and variables take (at least) two parameters; however, each case is different. Placeholders need the data type specified, and optionally the shape; it is not necessary to pass the data they are going to hold when they are declared. Instead, data is passed within the session context when the computation is executed. Also, when no shape is specified, or when it equals None, data of any size can be fed.
Variables, on the other hand, must have their initial data passed in as a parameter along with its type. As can be seen from the code above, the placeholder x leaves the number of rows undetermined while fixing the number of columns at three, and w is declared accordingly, with one weight per column.
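As a brief sketch of feeding a placeholder at execution time (the toy data below is made up purely for illustration):
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3])
doubled = x * 2

with tf.Session() as sess:
    # The placeholder is filled through feed_dict at run time; any number
    # of rows is accepted because the first dimension was left as None.
    data = np.ones((4, 3))
    print(sess.run(doubled, feed_dict={x: data}))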
Now that all the required variables have been declared, the target output \( \mathbf{\hat{y}} \) can be defined.
y_hat = tf.matmul(w, tf.transpose(x)) + b  # Shape (1, None): one prediction per row of x.
Linear Regression Cost Function
The common metric for evaluating linear regression models is the Mean Squared Error (MSE) function.
$$
C(\mathbf{w}) = \frac{1}{m} \sum_{i=1}^{m} (y_{i} - \hat{y}_{i})^{2}
$$
The error is minimized by finding the gradient of the cost function with respect to the model's parameters, \( \nabla_{\mathbf{w}}C \), and performing gradient descent. This works since,
$$
\mathbf{w}_{1} := \mathbf{w}_{0} - \alpha \nabla_{\mathbf{w}_{0}} C(\mathbf{w}_{0}) \implies C(\mathbf{w}_{0}) \geq C(\mathbf{w}_{1})
$$
Where \( \alpha \) is the learning rate, which modulates how big the update should be. By repeating the assignment expressed above in an iterative process, the error keeps decreasing and, for a convex cost such as the MSE, a good fit for the model is eventually reached.
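To make the update rule concrete before returning to Tensorflow, here is a hedged sketch of the iteration in plain Python for the one-dimensional cost \( C(w) = (w - 2)^{2} \), whose gradient is \( 2(w - 2) \); the starting point and learning rate are arbitrary choices.
w = 10.0      # Arbitrary starting point.
alpha = 0.1   # Learning rate.
for _ in range(50):
    grad = 2 * (w - 2)    # Gradient of C(w) = (w - 2)^2.
    w = w - alpha * grad  # The update rule from above.
print(w)  # Approaches 2, the minimizer of C.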
A great thing about Tensorflow is that derivatives and gradients are computed automatically; the only things that have to be defined are the function to optimize and the method to do it with. Minimizing an MSE function using gradient descent comes down to the following lines.
# Cost function.
cost = tf.reduce_mean(tf.square(y - y_hat))
# Learning rate.
alpha = 0.01
# Tell Tensorflow what to optimize and how (minimize).
optimizer = tf.train.GradientDescentOptimizer(alpha)
minimizer = optimizer.minimize(cost)
Altogether...
import numpy as np
import tensorflow as tf

# Synthetic data generated from the true model y = 0.7 * x - 0.2 + noise.
x_data = np.random.randn(2000, 1)
w_real = [0.7]
b_real = -0.2
noise = np.random.randn(1, 2000) * 0.3
y_data = np.matmul(w_real, x_data.T) + b_real + noise

iters = 5
g = tf.Graph()
with g.as_default():
    # Declare placeholders, variables and the target calculation.
    x = tf.placeholder(tf.float32, shape=[None, 1])
    y_true = tf.placeholder(tf.float32, shape=None)
    w = tf.Variable([[0]], dtype=tf.float32, name='weights')
    b = tf.Variable(0, dtype=tf.float32, name='bias')
    y_pred = tf.matmul(w, tf.transpose(x)) + b
    # Specify the loss function, learning rate and optimization type.
    loss = tf.reduce_mean(tf.square(y_true - y_pred))
    learning_rate = 0.5
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train = optimizer.minimize(loss)
    # Variables have to be initialized.
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        # First, run the Variables initialization.
        sess.run(init)
        for step in range(iters):
            sess.run(train, { x: x_data, y_true: y_data })
            print(step, sess.run([w, b]))
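After these five iterations, the printed estimates should already be drifting toward the true values w_real = 0.7 and b_real = -0.2; increasing iters should tighten the fit further.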