10 Machine Learning Methods that Every Data Scientist Should Know

Jump-start your data science skills

Jorge Castañón
Published in Towards Data Science · 15 min read · May 1, 2019

Photo by chuttersnap on Unsplash

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core concepts, let’s look at ten different methods, with a simple description, visualization, and example for each one.

A machine learning algorithm, also called a model, is a mathematical expression that represents data in the context of a problem, often a business problem. The aim is to go from data to insight. For example, if an online retailer wants to anticipate sales for the next quarter, they might use a machine learning algorithm that predicts those sales based on past sales and other relevant data. Similarly, a windmill manufacturer might visually monitor important equipment and feed the video data through algorithms trained to identify dangerous cracks.
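To make "a mathematical expression that represents data" concrete, here is a minimal sketch of the retailer example, assuming the simplest possible model: a straight line fitted to past quarterly sales by ordinary least squares. The sales figures and the helper name `fit_line` are hypothetical and chosen only for illustration. A real forecast would use more data and more features.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past sales (in thousands) for quarters 1 to 4; made-up numbers.
quarters = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]

slope, intercept = fit_line(quarters, sales)

# The fitted expression is the "model"; evaluating it at quarter 5 is the prediction.
next_quarter_forecast = slope * 5 + intercept
print(f"sales = {slope:.1f} * quarter + {intercept:.1f}")
print(f"forecast for quarter 5: {next_quarter_forecast:.1f}")
```

The two numbers `slope` and `intercept` are everything the model has learned. Going "from data to insight" here means replacing four data points with one expression that can be evaluated for a quarter not yet seen.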

The ten methods described offer an overview — and a foundation you can build on as you hone your machine learning knowledge and skill:

  1. Regression
  2. Classification
  3. Clustering
  4. Dimensionality Reduction
  5. Ensemble Methods
  6. Neural Nets and Deep Learning
  7. Transfer Learning
  8. Reinforcement Learning
  9. Natural Language Processing
  10. Word Embeddings

One last thing before we jump in. Let’s distinguish between two general categories of machine learning: supervised and unsupervised. We apply supervised ML techniques when we have a piece of data that we want to predict or explain. We do so by learning from historical pairs of inputs and outputs, then predicting the output for a new input. For example, you could use supervised ML techniques to help a service business that wants to predict the number of new users who will sign up for the service next month.

By contrast, unsupervised ML looks at ways to relate and group data points without a target variable to predict. In other words, it evaluates data in terms of traits and uses those traits to form clusters of items that are similar to one another. For example, you could use unsupervised learning techniques to help a retailer that wants to segment products with similar characteristics, without having to specify in advance which characteristics to use.
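The difference between the two categories shows up in the shape of the data each one consumes. The sketch below is illustrative only, and all numbers and function names (`predict_signups`, `two_means`) are hypothetical. The supervised part uses a 1-nearest-neighbor lookup over past (input, output) pairs. The unsupervised part uses a tiny two-cluster k-means over a single trait, product price, with no target at all. Using one trait is a simplification to keep the example short.

```python
# Supervised: every past input comes with a known output (the target).
# Hypothetical history: (marketing spend in $k, new sign-ups that month).
history = [(10, 200), (20, 390), (30, 610), (40, 800)]


def predict_signups(spend, past_pairs):
    """1-nearest-neighbor: return the output of the most similar past input."""
    _, nearest_output = min(past_pairs, key=lambda pair: abs(pair[0] - spend))
    return nearest_output


# Unsupervised: only traits are available, with no target to predict.
def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2; returns one cluster index per value."""
    centers = [min(values), max(values)]  # simple deterministic start
    labels = []
    for _ in range(iterations):
        # Assign each value to its closest center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


# Hypothetical product prices; the algorithm is never told which group is which.
prices = [4.0, 5.0, 6.0, 48.0, 50.0, 52.0]
segments = two_means(prices)

print(predict_signups(32, history))
print(segments)
```

Note that `predict_signups` cannot work without the second element of each pair, while `two_means` never sees a label. The groups it returns come only from how similar the values are to one another.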