# Practical tutorial on autoencoders for nonlinear feature fusion (Part 1)

**Published:**

In a previous blog post, we discussed how the curse of dimensionality degrades the performance of machine learning models. Many techniques can tackle this issue, and today’s paper describes one of them: autoencoders.

## Autoencoders 101

Autoencoders (AEs) are artificial neural networks with a symmetric structure that is split into two parts, the encoder and the decoder. The encoder maps the input data to an encoding, while the decoder tries to reconstruct the inputs from that encoding as closely as possible, without simply copying the observations.

In detail, the input and output layers have the same dimension, since the objective is to recreate the given data. The middle layer of an AE represents the encoding of the input and can contain either more or fewer units than the other layers, depending on the desired properties. An AE can also have multiple hidden layers, which are placed symmetrically in the encoder and decoder.
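To make the layout concrete, here is a minimal sketch of a forward pass through a symmetric autoencoder, written in plain Python. The layer sizes, the `tanh` activation, the random initialisation and all function names are assumptions chosen for illustration. The weights are untrained, so only the shapes are meaningful, not the reconstruction itself.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, activation):
    """Apply one dense layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def build_autoencoder(layer_sizes, seed=0):
    """Create one dense layer per consecutive pair of layer sizes."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(x, layers):
    """Return (encoding, reconstruction) for a single input vector."""
    n_encoder = len(layers) // 2  # first half of the layers is the encoder
    h = x
    encoding = None
    for i, layer in enumerate(layers, start=1):
        h = dense(h, layer, math.tanh)
        if i == n_encoder:
            encoding = h  # output of the middle layer
    return encoding, h

# Symmetric layout: 8 inputs -> 4 -> 2 (encoding) -> 4 -> 8 outputs.
layer_sizes = [8, 4, 2, 4, 8]
layers = build_autoencoder(layer_sizes)
x = [0.1 * i for i in range(8)]
encoding, reconstruction = forward(x, layers)
print(len(x), len(encoding), len(reconstruction))  # prints: 8 2 8
```

The input and the reconstruction have the same length, while the encoding taken from the middle layer is smaller. Training would adjust the weights so that the reconstruction resembles the input.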

## Activation functions

To pass signals from one layer to the next, we use activation functions. There are many choices here, but only a few of them are used in practice.

- **Linear**: Autoencoders with a single hidden layer of k neurons and linear activations produce representations equivalent to those of PCA with k principal components.
- **Binary**: It is often used as an introduction to ANNs rather than in real-world applications.
- **ReLU**: Rectified linear units are widely used in deep learning models. However, they are not well suited to AEs: because they output 0 for negative inputs, they distort the decoding process and do not lead to faithful representations of the input features.
- **SELU**: The scaled exponential linear unit is a strong alternative to ReLU, as it keeps the advantage of passing positive inputs linearly while still letting negative values flow through.
- **Sigmoid**: The most commonly used activation function for autoencoders.
- **Tanh**: The hyperbolic tangent is similar to the sigmoid, except that it is symmetric about the origin and has a steeper slope. As a result, it produces stronger gradients than the sigmoid and should be preferred.

Finally, different activation functions can be combined when designing AEs with multiple hidden layers, so feel free to experiment!
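The activation functions discussed in this section can be written in a few lines of plain Python. This is a sketch for scalar inputs only: the function names are illustrative, the threshold at 0 for the binary step is an assumption, and the two SELU constants are the standard values from the original SELU definition.

```python
import math

def linear(z):
    return z

def binary_step(z):
    # Threshold at 0 is an assumption; other conventions exist.
    return 1.0 if z >= 0 else 0.0

def relu(z):
    return max(0.0, z)

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(z):
    # Linear for positive inputs, scaled exponential for negative ones.
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * (math.exp(z) - 1.0))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

# ReLU discards negative inputs, SELU keeps a (bounded) negative signal.
print(relu(-2.0), selu(-2.0) < 0)          # prints: 0.0 True

# At the origin, tanh has a gradient four times larger than sigmoid.
print(sigmoid_grad(0.0), tanh_grad(0.0))   # prints: 0.25 1.0
```

The last line illustrates the point about gradients: the slope of tanh at 0 is 1, whereas the slope of the sigmoid at 0 is 0.25.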

## Autoencoders’ network structure

Autoencoders have a symmetrical structure and can be split into two categories, depending on the dimensionality of the encoding layers:

- **Undercomplete**: The encoding layers have a lower dimensionality (fewer hidden units) than the input. This structure is used when the goal is to reduce the dimensionality of the input and create a compact representation of it.
- **Overcomplete**: The encoding layers have the same or more units than the input layer.

Moreover, AEs can be either shallow or deep, depending on the number of hidden layers:

- **Shallow**: The autoencoder has a single hidden layer.
- **Deep**: The autoencoder has more than one hidden layer.
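The two distinctions above can be expressed as a small helper that labels an architecture from its layer sizes. This is only a sketch: `describe_autoencoder` is a hypothetical name, the architecture is assumed to be given as a list of unit counts from input to output, and the middle hidden layer is assumed to be the encoding layer.

```python
def describe_autoencoder(layer_sizes):
    """Classify an architecture given as [input, hidden..., output] unit counts."""
    if len(layer_sizes) < 3:
        raise ValueError("need an input, at least one hidden, and an output layer")
    if layer_sizes[0] != layer_sizes[-1]:
        raise ValueError("input and output layers must have the same size")

    n_inputs = layer_sizes[0]
    hidden = layer_sizes[1:-1]
    # Assumption: the middle hidden layer holds the encoding.
    encoding_units = hidden[len(hidden) // 2]

    completeness = "undercomplete" if encoding_units < n_inputs else "overcomplete"
    depth = "shallow" if len(hidden) == 1 else "deep"
    return completeness, depth

print(describe_autoencoder([8, 4, 2, 4, 8]))  # prints: ('undercomplete', 'deep')
print(describe_autoencoder([8, 16, 8]))       # prints: ('overcomplete', 'shallow')
print(describe_autoencoder([8, 8, 8]))        # prints: ('overcomplete', 'shallow')
```

The last call shows the boundary case: an encoding with exactly as many units as the input counts as overcomplete under the definition given above.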

## Autoencoder taxonomy

When using AEs for feature fusion, we can build a taxonomy of the various models according to their properties and use cases.

- **Lower dimensionality**: High-dimensional data can degrade a machine learning model’s performance, so autoencoders can be used to reduce the dimensions of the feature space and enable faster and more accurate learning.
- **Regularisation**: Learned features are sometimes required to have special mathematical properties. AEs can produce encodings that satisfy them.
- **Noise tolerance**: Noisy data can distort a model’s predictions, and the latent feature representation learned by AEs can mitigate this.
- **Generative model**: The goal of these AEs is to map new samples from the encoded space to the original features.
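As a rough illustration of how these goals can show up in the training objective, the sketch below defines a plain reconstruction loss, an L1 penalty on the encoding (one common way to encourage sparse features) and a denoising loss that corrupts the input with Gaussian noise. These particular choices and all names here are assumptions made for this example; `reconstruct` is a hypothetical stand-in for a trained autoencoder.

```python
import random

def mse(x, x_hat):
    """Mean squared reconstruction error between input and output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def l1_penalty(encoding):
    """Sum of absolute activations; penalising it encourages sparse encodings."""
    return sum(abs(v) for v in encoding)

def regularised_loss(x, x_hat, encoding, weight=0.01):
    """Reconstruction error plus a weighted penalty on the encoding."""
    return mse(x, x_hat) + weight * l1_penalty(encoding)

def add_gaussian_noise(x, sigma, rng):
    """Corrupt each feature with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def denoising_loss(x_clean, reconstruct, sigma, rng):
    """Feed a noisy input to the model, but compare against the clean input."""
    x_noisy = add_gaussian_noise(x_clean, sigma, rng)
    return mse(x_clean, reconstruct(x_noisy))

rng = random.Random(0)
x = [0.2, 0.4, 0.6, 0.8]

# Hypothetical model that copies its input, used only to exercise the functions.
identity = lambda v: list(v)

print(mse(x, identity(x)))                        # prints: 0.0
print(denoising_loss(x, identity, 0.1, rng) > 0)  # prints: True
```

A model that merely copies its input gets zero plain reconstruction error but a positive denoising loss, which is why corrupting the input pushes the network towards noise-tolerant features.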

**Part 2**: Autoencoders for feature fusion

**Part 3**: Comparing AEs to other feature fusion techniques