Machine Learning (ML) refers to the field and practice of using algorithms that “learn” by extracting patterns from a large body of data, in contrast to traditional rule-based algorithms whose logic is written by hand. Building a machine learning model is, by nature, an iterative approach to problem solving. ML takes an adaptive approach: it searches a large space of possible solutions and chooses the one that best satisfies its objective function.
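To make the contrast concrete, here is a minimal Python sketch. The data, the `rule_based` function, and the single-threshold model are hypothetical illustrations. The rule-based version hard-codes a cutoff. The "learned" version searches candidate thresholds and keeps the one that best satisfies the objective function, which here is the fewest misclassified examples.

```python
# Hypothetical labeled examples: (hours studied, passed the exam?)
data = [(1.0, False), (2.0, False), (2.5, False), (3.0, True),
        (3.5, False), (4.0, True), (5.0, True), (6.0, True)]


def rule_based(hours: float) -> bool:
    """Traditional approach: a human hard-codes the decision rule."""
    return hours >= 5.0


def errors(threshold: float) -> int:
    """Objective function: how many examples a given threshold misclassifies."""
    return sum((hours >= threshold) != passed for hours, passed in data)


def learn_threshold() -> float:
    """'Learning': try every observed value as a threshold and keep the best one."""
    candidates = sorted(hours for hours, _ in data)
    return min(candidates, key=errors)


best = learn_threshold()
rule_errors = sum(rule_based(hours) != passed for hours, passed in data)
print(f"learned threshold {best} -> {errors(best)} error(s)")
print(f"hand-coded threshold 5.0 -> {rule_errors} error(s)")
```

On this toy data the search settles on a threshold of 3.0 with one error, while the hand-picked cutoff of 5.0 makes two. Real models search far larger spaces, usually with optimization methods rather than exhaustive search.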
Though different forms of ML have existed for decades, recent advances in technology provide the underlying capabilities that have made ML as promising as it is today. Increased computing capacity (especially elastic computing infrastructure in the cloud), large-scale labeled data sets, and widely available open-source ML software frameworks and code have propelled the development of ML models. With these advances, both the accuracy of ML predictions and the number of problems ML can address have increased dramatically over the past decade.
There are three high-level categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Each has its own mathematical backbone, and each has its own unique areas of application. Occasionally in more complex workflows, they may be combined.
Supervised learning, also known as supervised machine learning, is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.
- A training dataset teaches the model to yield the desired output. It includes inputs paired with outputs that are correctly categorized or “labeled”, which allows the model to learn over time.
- Input data is fed into the model, which produces predictions.
- The algorithm measures its error through a loss function that compares its predictions with the labels.
- Weights are adjusted until the error has been sufficiently minimized and the model is appropriately fitted, i.e. it generalizes and adequately represents the underlying pattern.
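The training loop described above can be sketched with plain gradient descent on a one-feature linear model. This is a minimal sketch under stated assumptions. The data are hypothetical points generated from y = 2x + 1, and the model is `w * x + b`. The loss is mean squared error, and the learning rate and step count are arbitrary choices, not recommended settings.

```python
# Hypothetical labeled training data: inputs xs with known outputs ys (y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse(w: float, b: float) -> float:
    """Loss function: mean squared error between predictions and labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(lr: float = 0.05, steps: int = 2000) -> tuple[float, float]:
    """Fit y = w * x + b by repeatedly adjusting the weights to reduce the loss."""
    w, b = 0.0, 0.0  # arbitrary starting weights
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Take a small step against the gradient to lower the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train()
print(f"w={w:.3f}, b={b:.3f}, loss={mse(w, b):.2e}")
```

On this data the weights converge to roughly w = 2 and b = 1, recovering the rule that generated the labels. Practical frameworks compute gradients automatically and scale to far larger models. The cycle of predicting, measuring the loss, and adjusting weights still follows the steps listed above.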
Supervised learning models can be used to build and advance a number of important applications, such as:
- Image and object recognition are applied computer vision techniques used to detect instances of objects of a certain class, such as a car or a pedestrian. For example, in health care an AI system can learn to distinguish pre-cancerous cells from other cells, helping medical professionals reach a diagnosis earlier than they could on their own.
- Predictive analytics is used to provide deep insights into various data points and to anticipate outcomes based on given input variables. Credit scoring is one example: it predicts the likelihood that a customer will pay on time based on factors such as customer data and credit history.
- Customer sentiment analysis is used to extract and classify important pieces of information, including context, emotion, and intent, from large volumes of data. It can help organizations understand customer interactions and improve customer experience.
- Spam detection is used to train models to recognize patterns or anomalies in new data so that spam and legitimate emails can be separated effectively. Detecting spam helps create a better user experience and reduces cyber fraud and abuse.
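Spam detection is a classic supervised classification task. The sketch below uses multinomial naive Bayes. It is one common technique for this problem, chosen here for brevity, not because any particular product uses it. The tiny corpus and the `fit`/`predict` helpers are hypothetical illustrations, and "ham" is the conventional label for legitimate email.

```python
import math
from collections import Counter

# Hypothetical labeled training emails: (text, label).
train_set = [
    ("win free money now", "spam"),
    ("claim your free prize", "spam"),
    ("cheap money offer win", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
    ("project meeting notes attached", "ham"),
]


def fit(examples):
    """Count word occurrences per class and the number of emails per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(text, word_counts, doc_counts, vocab):
    """Return the label with the highest log-probability for the given text."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: the fraction of training emails with this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words don't zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = fit(train_set)
print(predict("free money", *model))
print(predict("meeting tomorrow", *model))
```

The model learns only from the labeled examples. Words that appeared mostly in spam push new messages toward the spam label, and words that appeared mostly in legitimate mail push them toward ham.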
Unsupervised learning is often used in data exploration before a learning goal is established. Unsupervised machine learning uses unlabeled data and discovers patterns in it that help solve clustering or association problems. It is useful when subject matter experts are unsure of the common properties of a data set. Unsupervised learning models are used for three main tasks:
- Clustering: a data mining technique that groups unlabeled data based on their similarities or differences.
- Association: used to discover interesting relationships between variables in a dataset.
- Dimensionality reduction: used to reduce the number of dimensions (features) while preserving as much of the data's meaningful structure as possible.
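Clustering can be illustrated with k-means, one widely used clustering algorithm among several. This minimal sketch assumes numeric 2-D points and a number of clusters `k` chosen in advance. The `kmeans` function and the sample points are hypothetical. No labels are involved: the groups emerge from distances between points alone.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters by alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster
            else centroids[i]  # leave an empty cluster's centroid where it was
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

The algorithm is never told which points belong together, yet it separates the two groups. In general k-means can settle on different clusterings depending on its starting centroids, so practical use often involves several restarts.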
Machine learning techniques have become a common way to improve user experience. Unsupervised learning offers an exploratory way to analyze large volumes of data, identifying clusters or associations more quickly than manual observation can.
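Association can be sketched by counting how often items occur together. The example below is a simplified, pairs-only take on association rule mining, whereas full algorithms such as Apriori also handle larger itemsets. The shopping baskets, thresholds, and the `association_rules` helper are hypothetical. A rule A → B is kept when two conditions hold. The pair must appear in enough baskets (support). B must also appear in a large enough share of the baskets containing A (confidence).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (unlabeled transactions).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def association_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pair rules that pass both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


for lhs, rhs, support, confidence in association_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.1f}, confidence={confidence:.2f}")
```

On these baskets the sketch finds that butter and eggs buyers tend to also buy bread and milk respectively, and that bread and milk go together in both directions.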
Some of the most common real-world applications of unsupervised learning are:
- News feeds: used to categorize or “cluster” articles on the same story from various online news outlets.
- Computer vision: used for visual perception tasks such as object recognition.
- Medical imaging: used in radiology and pathology, for example for image segmentation, to help diagnose patients quickly and accurately.
- Anomaly detection: used to sift through large amounts of data and discover atypical data points within a dataset.
- Customer personas: used to identify common traits among customers and build better buyer persona profiles.
- Recommendation engines: use past behavior data to discover trends that can be used to develop tailored recommendations.
Reinforcement learning is a behavioral machine learning model that is similar to supervised learning, but the algorithm is not trained on labeled sample data. Instead, it learns as it goes, by trial and error. It takes actions, receives feedback in the form of rewards, and reinforces sequences of actions that lead to successful outcomes. Over time it develops the best recommendation for a given problem.
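The trial-and-error loop can be sketched with tabular Q-learning, one classic reinforcement learning algorithm. It runs on a made-up five-cell corridor where the agent earns a reward only when it reaches the rightmost cell. The environment, the `step` and `q_learning` functions, and the hyperparameters are all hypothetical illustrations. The agent is never told which move is correct. It only sees rewards, and actions that lead toward the reward are gradually reinforced.

```python
import random

N_STATES = 5        # cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # move left or move right


def step(state, action):
    """Hypothetical environment: move within the corridor, reward 1 at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values q[(state, action)] from rewards alone."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(state):
        # Pick the best-known action, breaking ties at random.
        best = max(q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: usually exploit, occasionally explore at random.
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            # Reinforce: move Q toward the reward plus discounted future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

With these settings the learned policy should be to move right from every cell. Real applications usually replace the lookup table with a function approximator such as a neural network. The reward-driven update is the same idea.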
Applications of reinforcement learning include:
- Autonomous vehicles: used in self-driving cars to help improve safety and performance.
- Industrial automation: used to control HVAC systems in buildings, data centers, and other industrial facilities, which can increase energy savings.
- Trading and finance: used to make trading decisions, often alongside time series models that predict future sales or stock prices.
- Language and text: used for text summarization, question answering, and language translation using natural language processing.
- Healthcare: used to find optimal treatment policies and procedures from previous experiences of patient care, without requiring a prior mathematical model of the underlying biological system.
Key Messages
- Supervised learning uses labeled datasets to train algorithms to classify data or predict outcomes.
- Unsupervised learning uses unlabeled data. From that data, it discovers patterns that help solve clustering or association problems.
- Reinforcement learning learns by trial and error: sequences of successful outcomes are reinforced to develop the best recommendation for a given problem.
- AI solutions use one, or in some cases several, of these ML techniques.