Deep Learning: A Technique of Machine Learning
In traditional machine learning, a programmer gives the computer a defined set of data and adjusts the algorithm whenever it returns incorrect results. This gives the computer hard rules to follow to produce an output. Deep learning is a subset of machine learning: another technique for teaching computers to learn from data. Instead of relying on hand-tuned rules, deep learning takes a raw, undefined set of data and parses it to learn about the data set it is being trained on. Since deep learning doesn't require a programmer to readjust the algorithm, it needs a large data set in order for the algorithm to perform well.
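The contrast can be sketched in a few lines. Below, a hypothetical classification task is solved both ways: first with a hard rule written by the programmer, then with a one-neuron model that learns a similar decision boundary from labeled examples. The task, cutoff, and training loop are all illustrative assumptions, not any specific system.

```python
import numpy as np

# Hypothetical task: label a 2-D point 1 if its coordinate sum exceeds 1.
# Traditional approach: the programmer encodes the rule directly.
def rule_based(x, cutoff=1.0):
    return 1 if sum(x) > cutoff else 0

# Learning approach: a single neuron discovers a similar boundary
# from labeled examples instead of a hand-written rule.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = (X.sum(axis=1) > 1.0).astype(float)   # labels produced by the hidden rule

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):                      # simple perceptron-style updates
    pred = (X @ w + b > 0).astype(float)
    err = y - pred                        # where the model is still wrong
    w += lr * err @ X / len(X)            # nudge weights toward correct labels
    b += lr * err.mean()

def learned(x):
    return 1 if x @ w + b > 0 else 0
```

With enough examples the learned boundary approximates the rule, which is the trade the passage describes: no hand-coded logic, but a real appetite for data.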
Delving Further into Deep Learning
The large data set is run through an untrained neural network. In the process, the weights of each layer are adjusted so the network recognizes important features of the data. Adjusting the weights within the layers is central to training, as it tunes the neural network to improve its performance on the task it is learning. Since there are many permutations of weights across the layers, this calculation requires high computational power.
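A minimal sketch of that weight-adjustment loop, assuming a single dense layer and gradient descent (illustrative setup, not any particular framework): the weights start uninformative, and each pass adjusts them to shrink the error on the training data.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))           # training data
true_w = np.array([1.5, -2.0, 0.5])     # relationship hidden in the data
y = X @ true_w                          # targets the layer should learn

w = np.zeros(3)                         # untrained: weights carry no knowledge
for step in range(500):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(X)   # gradient of mean squared error
    w -= 0.1 * grad                        # adjust weights to reduce the error

# After training, w is close to [1.5, -2.0, 0.5]
```

Real networks repeat this over millions of weights and many layers, which is exactly why training demands so much compute.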
Difference between Training and Inference of Deep Learning Frameworks
Unlike training, inference doesn't re-evaluate or adjust the layers of the neural network based on the results. Inference applies the knowledge captured in a trained neural network model and uses it to infer a result. So, when a new, unknown data set is fed through a trained neural network, the network outputs a prediction based on its predictive accuracy. Inference comes after training, as it requires a trained neural network model.
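Inference can be sketched as a single forward pass through frozen weights. The two-layer network and its weight values below are assumptions for illustration; the point is that the function only reads the weights and produces a prediction, with no update step anywhere.

```python
import numpy as np

# Frozen weights, as if loaded from a trained model.
W1 = np.array([[0.5, -0.3],
               [0.2,  0.8]])
b1 = np.array([0.1, -0.1])
W2 = np.array([0.7, -0.4])
b2 = 0.05

def infer(x):
    """Forward pass only: no gradients, no weight adjustment."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2                 # output: a single predicted value

prediction = infer(np.array([1.0, 2.0]))   # prediction for unseen input
```

Contrast this with the training loop: there the weights change every pass, here they never do.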
Important aspects of Inference
While a deep learning training system can be used to do inference, the demands of inference make such a system a poor fit. Deep learning training systems are optimized to process large amounts of data and repeatedly re-evaluate the neural network. That requires high-performance compute, which consumes more energy and therefore costs more. Inference, by contrast, may involve smaller data sets, but hyperscaled to many devices.
Optimizing with TensorRT
TensorRT is Nvidia's deep learning inference platform. Built on CUDA, it works together with Nvidia GPUs to enable highly efficient deep learning inference. TensorRT uses FP32 arithmetic for inference by default to obtain the highest possible inference accuracy. However, you can use FP16 and INT8 precisions for inference with minimal impact on the accuracy of results in many cases. Mixed computations in FP32 and FP16 precision can also be used in TensorRT to further improve performance.
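Why lower precision usually costs so little accuracy can be illustrated without TensorRT at all. The NumPy stand-in below (not the TensorRT API) computes the same dot product, the core operation of inference, in FP32 and FP16 and compares the results.

```python
import numpy as np

rng = np.random.default_rng(7)
w = rng.normal(size=1000).astype(np.float32)   # "weights"
x = rng.normal(size=1000).astype(np.float32)   # "activations"

full = np.dot(w, x)                            # FP32 reference result
half = float(np.dot(w.astype(np.float16),      # same computation in FP16
                    x.astype(np.float16)))

drift = abs(full - half)   # small absolute difference despite half the bits
```

FP16 halves memory traffic and lets the GPU's reduced-precision units do the arithmetic, so when the drift is tolerable, as it is for many trained models, the speedup comes nearly free. INT8 pushes the same trade further and typically requires a calibration step.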
Trained models from every major deep learning framework can be imported into TensorRT and optimized with platform-specific kernels to maximize performance on Tesla GPUs in the data center and on the Jetson embedded platform.
Operating System
Our fully configured deep learning workstations ship with these widely used deep learning frameworks.