
The Many Faces of Classification: Inside Qlik Predict’s Algorithm Toolbox

  • Writer: Igor Alcantara
  • 5 days ago
  • 16 min read

Every dataset has a bit of Dr. Jekyll and Mr. Hyde in it. On the surface, it might look like one thing, but underneath it can take on multiple identities depending on how you examine it. In fact, sometimes it’s not just two identities; it’s ten.


That’s where classification comes in. Unlike regression (the subject of my previous article), which predicts how much or how many, classification sorts data into categories, revealing its hidden personas. Sometimes the split is simple and dramatic, like Jekyll versus Hyde: spam or not spam, fraud or safe, churn or stay. Other times, the story gets more crowded, and the dataset must be cast into multiple roles: up to ten in Qlik Predict’s case. Think of it less as a split personality and more as a full ensemble cast, each record playing its part in the drama.



If regression is about predicting “how much” or “how many,” then classification is about answering “which one.” It’s less about numbers on a scale and more about sorting things into categories. Imagine walking into a giant record store in the 90s. You’re not counting vinyls; you’re trying to figure out whether that album goes under rock, jazz, indie, or the mysterious section labeled alternative.


That, in essence, is classification.


In Qlik Predict, classification sits proudly next to regression and (soon) time series as one of the three great pillars of predictive modeling. But classification itself comes in two main flavors: binary and multiclass. Knowing the difference is crucial before you set your model loose on the data.


This article is the 8th in the series "The Theory Behind Qlik Predict". For the other articles in this series, please refer to the following list:



A Brief Look at Classification


Classification, at its core, is one of humanity’s oldest intellectual activities. Long before computers or machine learning, people were classifying the world around them. Ancient herbalists sorted plants into “healing” and “poisonous.” Greek philosophers like Aristotle created elaborate systems to classify animals by their traits: wings, legs, habitat. Even librarians in Alexandria had to decide where to shelve those scrolls.


Fast forward to the 20th century, and classification took on a statistical flavor. Scientists started asking: if we measure certain characteristics, can we mathematically predict a category? This gave rise to early methods like Fisher’s Linear Discriminant (1936), which classified things like iris flowers into species based on petal and sepal lengths. It was one of the first mathematical models of its kind and still shows up in data science textbooks today.


When computers entered the scene, classification algorithms became more powerful and scalable. The 1980s and 1990s saw decision trees and neural networks taking shape. By the 2000s, ensemble methods like Random Forests and boosting transformed the field, winning Kaggle competitions and powering modern business applications.


In short, classification is the science of sorting things into categories, but also an art. It evolved from Aristotle’s observations, through Fisher’s formulas, all the way to Qlik Predict’s eight-algorithm toolbox. The essence hasn’t changed: we’re still trying to answer the question “What kind of thing is this?”, but the tools at our disposal have never been sharper.


Binary Classification: Two Doors, One Choice


Binary classification is the simplest and often the most common type of classification. Like politics today, it's polarized. The world is divided into just two buckets.


  • Fraud or not fraud

  • Customer churn or stay

  • Spam or not spam

  • Cat picture or dog picture

  • Patient readmission or not


It’s like flipping a coin, except your model doesn’t just guess; it learns from patterns in your data to decide which side is more likely.


In business, binary classification often answers questions that have a direct impact on decisions: Should we approve this loan application? Is this transaction suspicious? Will this customer buy again?


Think of binary classification as the Netflix “thumbs up/thumbs down” rating system. The model is constantly learning from all those ratings and then using them to predict what you’ll enjoy next.


Binary is the simplest and, many times, the most accurate type of machine learning problem. There are several cases where data scientists transform a regression into a classification, or a multiclass problem into a binary one. The more you simplify the problem, the better you can predict it; it is Occam's Razor applied to data science. Instead of predicting Total Cholesterol with a regression, you can predict the Cholesterol Level (Low, Normal, High). You get much the same insight with less error. Instead of a 5-star rating (multiclass), you turn it into a Like or Dislike (binary). That is what companies like Netflix did when they realized most people rated either 1 or 2 stars when they hated something or 5 when they loved it; almost no one rated 3 or 4 stars.
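
If you want to try that kind of simplification on your own data before it ever reaches Qlik Predict, here is a minimal pandas sketch that buckets a continuous cholesterol value into Low/Normal/High categories. The column name and thresholds are made up for illustration, not a clinical recommendation:

```python
import pandas as pd

# Hypothetical patient data; column name and values are illustrative only.
df = pd.DataFrame({"total_cholesterol": [148, 192, 210, 255, 173, 301]})

# Turn the continuous target into categories so the problem becomes
# a classification instead of a regression.
df["cholesterol_level"] = pd.cut(
    df["total_cholesterol"],
    bins=[0, 200, 240, float("inf")],   # illustrative cut points
    labels=["Low", "Normal", "High"],
)

print(df)
```

From here, "cholesterol_level" (three categories) can serve as the target of a multiclass model, or you could collapse it further into two buckets for a binary one.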


Multiclass Classification: When Two Choices Aren’t Enough


Life isn’t always a coin toss. Sometimes you need more than two categories. That’s when multiclass classification comes into play.


Imagine you’re running a movie recommendation system. A binary model could separate Action vs. Not Action, but that’s not very useful. Multiclass lets you separate things into Comedy, Drama, Sci-Fi, Horror, Romance, and so on.


Or think of classifying flowers: is it a rose, a tulip, or a daisy? What type of wine goes best with this food: white, red, or rosé? Binary can’t handle that; multiclass can.


Qlik Predict supports up to 10 distinct categories for multiclass problems. That means you can build models that classify customers into up to 10 segments, or products into up to 10 types. This limit exists for a reason: the more categories you have, the harder the problem becomes. Beyond a certain point, the model spends more time splitting hairs than making useful predictions.


It’s like trying to sort your socks not just by color but also by fabric, season, length, and “vibes.” At some point, the effort outweighs the benefit. If your target variable has up to 10 distinct values, even if those values are numeric, Qlik Predict will treat the problem as a classification; ten or fewer distinct values are not enough for a regression. If the variable has more than 10 distinct values and those values are not numeric, it cannot be used as a prediction target. You need to do some homework and group the values together so the result falls within those limits. For that I recommend Qlik Data Flow, but that is a separate topic.
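
As a quick illustration of that homework, here is a small pandas sketch (with a made-up genre column) that checks how many distinct values a target has and folds the long tail into an "Other" bucket so it lands at ten categories or fewer. You could do the same kind of grouping in Qlik Data Flow or in whatever preparation tool you prefer:

```python
import pandas as pd

# Hypothetical target with too many distinct, non-numeric values.
s = pd.Series(["Rock", "Jazz", "Indie", "Folk", "Blues", "Ska", "Salsa",
               "Rock", "Jazz", "Rock", "Metal", "Punk", "Gospel", "Opera"])

print(s.nunique())  # more than 10 distinct values: not usable as-is

# Keep the 9 most frequent categories and fold everything else into "Other",
# so the target ends up with at most 10 distinct values.
top = s.value_counts().nlargest(9).index
grouped = s.where(s.isin(top), other="Other")

print(grouped.nunique())
```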


The Algorithms of Classification in Qlik Predict


Now we get to the fun part: the eight algorithms Qlik Predict uses for classification. Each one is like a character in a superhero movie cast: unique strengths, quirks, and ideal use cases. The magic of Qlik Predict is that it doesn’t force you to pick one. It runs multiple models, compares them, and shows you which one performs best for your problem. Some of these algorithms I already explained in my previous article about regression, but it does no harm to explore them again from a different perspective: a classification one. Let’s meet the cast.


1 - Logistic Regression: Turning Probabilities into Decisions


At first glance, the term “logistic regression” sounds like something you’d only hear in a PhD exam. But don’t let the name scare you. Logistic regression is actually one of the most approachable and widely used classification algorithms. It has been around for decades and still plays a starring role in everything from medical studies to marketing campaigns. I know the name says regression, but don’t get too attached to names: this one is a household name when it comes to classification.


The Core Idea

Logistic regression doesn’t try to predict a continuous value (like house prices). Instead, it predicts the probability of belonging to a particular class. The result is always between 0 and 1, which makes it perfect for binary classification. So it does predict a number (that’s why we call it regression), but not just any number: a probability.


For example:


  • Probability = 0.85 → “There’s an 85% chance this customer will churn.”

  • Probability = 0.10 → “Only a 10% chance this email is spam.”


From there, you just set a threshold (commonly 0.5). If the probability is above 0.5, the model says “Yes.” If it’s below, it says “No.” That’s how a probability turns into a crisp decision.


Why the “Logistic”?

The magic happens thanks to something called the logistic function (also known as the sigmoid curve). Imagine an S-shaped curve that squashes any number, no matter how big or small, into a range between 0 and 1.


  • Very negative inputs get mapped close to 0

  • Very positive inputs get mapped close to 1

  • Inputs near zero hover around 0.5


This curve is what allows logistic regression to translate messy, unbounded data into neat probabilities.


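If you prefer code to curves, here is a tiny Python sketch of the logistic (sigmoid) function and the 0.5 threshold described above; the input values are arbitrary examples:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-5))   # ~0.007 -> very negative input maps close to 0
print(sigmoid(0))    # 0.5    -> input near zero hovers around 0.5
print(sigmoid(5))    # ~0.993 -> very positive input maps close to 1

# Turning a probability into a crisp decision with a 0.5 threshold
probability = sigmoid(1.7)                      # roughly 0.85
decision = "Yes" if probability >= 0.5 else "No"
print(probability, decision)
```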

A Practical Example

Let’s say you’re predicting whether a loan will default. Your features might be:


  • Income

  • Credit score

  • Number of missed payments


Logistic regression assigns each feature a weight. For example, “missed payments” might get a strong positive weight (the more missed payments, the higher the probability of default). It then multiplies each feature by its weight, adds the results together, feeds the sum into the logistic function, and produces a probability.


The beauty here is interpretability. You can look at the coefficients and actually explain what’s going on. Unlike some “black box” models, logistic regression is transparent: you see how much each feature influences the outcome.
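
Qlik Predict trains and tunes this for you, but to make the idea concrete, here is a minimal scikit-learn sketch on a made-up loan dataset. The feature names and numbers are purely illustrative, not Qlik Predict's internals:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny, made-up loan dataset: [income (k$), credit score, missed payments]
X = np.array([
    [85, 720, 0], [42, 610, 3], [120, 780, 0], [38, 590, 5],
    [64, 660, 1], [29, 550, 6], [95, 740, 0], [51, 630, 2],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 0])  # 1 = defaulted, 0 = repaid

model = LogisticRegression(max_iter=1000).fit(X, y)

# The coefficients are the interpretable part: sign and size show how each
# feature pushes the probability of default up or down.
for name, coef in zip(["income", "credit_score", "missed_payments"], model.coef_[0]):
    print(f"{name}: {coef:+.3f}")

# Probability of default for a new applicant
print(model.predict_proba([[45, 600, 4]])[0, 1])
```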


Why It’s Still Relevant

In the age of XGBoost and neural networks, you might think logistic regression is outdated. But it remains a workhorse for three big reasons:


  1. Simplicity – Easy to implement and understand.

  2. Speed – Trains quickly, even on large datasets.

  3. Interpretability – You can explain the results to executives without needing a whiteboard full of math.


It’s often the first model you try in any classification problem. If it works well enough, you might not even need the fancier stuff.


Where It Shines
  • Credit scoring

  • Customer churn prediction

  • Medical diagnosis (e.g., disease present or not)

  • Marketing response modeling


Logistic regression is the dependable accountant of the algorithm toolbox: always there, always reliable, and surprisingly good at telling you not just what the outcome is, but why.


2 - Random Forest Classification: The Committee of Trees


Random Forest builds not one, but hundreds of decision trees. Each tree sees a slightly different version of the data and casts its vote. The final prediction comes from the majority.

It’s like consulting 100 doctors and going with the diagnosis most of them agree on. One doctor might be wrong, but when you pool the wisdom of many, the result is usually reliable.

Random Forest is robust, works well with messy data, and rarely overfits. The tradeoff is that it’s less interpretable than logistic regression; good luck explaining why 500 trees all voted one way. There is more to Random Forest and other algorithms based on decision trees, but I explored that topic in my previous article, so you can go back there for more details.
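
For the curious, here is what the committee of trees looks like in a scikit-learn sketch on synthetic data. This illustrates the general technique, not how Qlik Predict configures it internally:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic churn-style dataset, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each trained on a bootstrap sample; the forest takes a majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print("accuracy:", forest.score(X_test, y_test))
print("votes from the first 3 trees for one record:",
      [int(tree.predict(X_test[:1])[0]) for tree in forest.estimators_[:3]])
```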


3 - XGBoost Classification: The Perfectionist Coach


XGBoost stands for “Extreme Gradient Boosting,” and it’s one of the rockstars of machine learning competitions. This is another one that I explained in more detail in my article about regression, but let me revisit the topic in a couple of paragraphs.


Here’s how it works: instead of building all trees at once (like Random Forest), XGBoost builds them one by one. Each new tree focuses on fixing the mistakes of the previous ones. It’s like a coach reviewing last game’s tape, pointing out exactly what went wrong, and drilling players until they get it right.


This process of learning from mistakes makes XGBoost incredibly accurate, though also a bit resource-hungry. When you want top performance and don’t mind the computational cost, XGBoost is a strong choice.
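
Qlik Predict runs XGBoost for you behind the scenes, but if you want to experiment with the idea directly, a sketch using the open-source xgboost package (assuming it is installed) looks roughly like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumes the xgboost package is available

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trees are added one at a time; each new tree focuses on the records
# the previous trees got wrong. learning_rate controls how big each step is.
model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
```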


4 - Lasso Regression: The Minimalist with a Sharp Pair of Scissors


Lasso Regression is the minimalist who walks into a cluttered room, looks around, and says: “Half of this has to go.” It’s not interested in keeping everything, only what really matters. It is the Michelangelo of algorithms: it removes from the marble everything that is not David, without leaving a single unnecessary piece of rock.


The Core Idea

Lasso Regression is a form of regularization that applies an L1 penalty to the model’s coefficients. In plain terms: it punishes the model for giving too much importance to too many features. If you remember my article about the Curse of Dimensionality, you understand that too many columns might be a problem.


Here’s what makes it special: Lasso doesn’t just shrink coefficients. It can shrink them all the way down to zero. That means entire features get effectively removed from the model. So, while Ridge regression (which we do not cover here) politely whispers, “Maybe spread your attention more evenly,” Lasso is blunt: “You’re not useful. Goodbye.” Not very subtle, but efficient.


This makes Lasso an algorithm that doubles as both a classifier and a feature selector.


Lasso Regression shrinks less important feature coefficients to zero (in dark gray), leaving only the most predictive ones (in blue)

Why “Feature Selection” Matters

In the real world, datasets can be noisy. You might have dozens or even hundreds of variables, and not all of them are relevant. Some might even hurt your model’s performance if you include them.


Lasso helps by automatically pruning away the irrelevant ones. Instead of manually guessing which features to drop, you let the math decide.


For example:

  • Predicting house prices with 200 variables (lot size, number of bedrooms, distance to Starbucks, color of the front door…). Lasso might figure out that “color of the front door” has no predictive power and cut it out entirely.

  • In healthcare, predicting disease outcomes from thousands of genetic markers, Lasso helps focus on the handful that truly matter.


A Practical Example

Imagine you’re building a model to classify whether a customer will buy a product. You feed in 100 possible predictors: age, location, last purchase date, favorite TV show, zip code, preferred style of mustache, whether they own a cat, and so on.


  • Logistic regression would assign weights to all 100 predictors, even if many are irrelevant.

  • Ridge regression would keep them all too, just with smaller weights.

  • Lasso regression, however, would give a weight of zero to, say, “favorite TV show” and “owns a cat,” leaving only the truly predictive features.


In the end, you have a cleaner, leaner model that’s easier to interpret and often performs better.
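
Here is that pruning effect in a scikit-learn sketch, using an L1-penalized logistic model on synthetic data with 100 candidate predictors. The exact setup inside Qlik Predict may differ; this just shows the mechanism:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 100 candidate predictors, but only 5 actually carry signal
X, y = make_classification(n_samples=500, n_features=100, n_informative=5,
                           n_redundant=0, random_state=7)

# L1 (Lasso-style) penalty on a logistic model; smaller C means a stronger penalty
lasso_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_clf.fit(X, y)

coefs = lasso_clf.coef_[0]
kept = np.flatnonzero(coefs)        # indices of features with non-zero weight
print(f"features kept: {len(kept)} of {len(coefs)}")  # most weights end up exactly 0
```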


Where Lasso Shines
  • High-dimensional data: When the number of features is larger than the number of observations (like genetics or text classification).

  • Interpretability: By pruning away noise, Lasso creates simpler models you can actually explain to stakeholders.

  • Exploration: When you don’t know which features matter most, Lasso helps discover them.


The Tradeoffs

Lasso is not perfect. If you have highly correlated features (say, “income” and “education level”), Lasso may arbitrarily choose one and discard the other. This can lead to unstable results if the dataset changes slightly. That’s exactly why Elastic Net was invented: to smooth out Lasso’s rough edges.


Why It Matters in Qlik Predict

In Qlik Predict, Lasso is the algorithm you call on when you want clarity. It’s particularly valuable for businesses swimming in data but unsure which signals matter. By automatically cutting away the noise, it helps companies focus on the handful of features driving outcomes.

Think of it as Qlik Predict’s Marie Kondo of machine learning: it politely thanks irrelevant features for their service, then sends them away to make room for what truly sparks predictive joy.



5 - Elastic Net: The Diplomat of Regularization


Elastic Net sits at the crossroads of two other techniques: Lasso regression, which we just covered, and Ridge regression. To appreciate what makes Elastic Net special, let’s take a step back.


Elastic Net reduces the influence of less important features without eliminating them, striking a balance between Lasso and Ridge.

Why Regularization Exists

When building models, there’s always the danger of overfitting. That’s when your algorithm memorizes every wrinkle of the training data, including noise, rather than learning general rules. Imagine a student who memorizes the answer key instead of learning the material: they’ll ace the practice test but bomb the real exam.


Regularization is a way to discipline the model. It adds a “penalty” for complexity, pushing the model to stay simple and generalize better. Two of the most famous penalties are:


  • Ridge regression (L2 penalty): discourages large coefficients by shrinking them closer to zero, but never all the way. It spreads the influence across many features.

  • Lasso regression (L1 penalty): goes further by pushing some coefficients all the way down to zero. This effectively removes features, acting as automatic feature selection.


Both are useful, but each has blind spots. Elastic Net says: why not both? It combines the L1 penalty from Lasso with the L2 penalty from Ridge. You get the best of both worlds:


  • Like Lasso, it can eliminate irrelevant features by shrinking their weights to zero.

  • Like Ridge, it handles groups of correlated features more gracefully, keeping balance instead of randomly picking one and discarding the others.


The blend between Lasso and Ridge is controlled by a parameter called alpha (or mixing ratio). Slide it toward one end, and Elastic Net behaves more like Lasso; slide it the other way, and it behaves more like Ridge. Lasso is Liam Gallagher; Ridge is his brother Noel. Elastic Net is the army of psychotherapists who finally got Oasis touring together again after all these years.


Think of Elastic Net as a diplomat brokering peace between two stubborn parties. It’s flexible, adaptive, and tends to outperform its parents when the dataset has many correlated features or when there are more predictors than observations.


A Practical Example

Suppose you’re predicting customer churn with 200 potential predictors: demographics, purchase history, web clicks, email responses, and so on. Many of these features overlap; for instance, “time on site” and “pages viewed” are strongly correlated.


  • If you used Lasso, it might arbitrarily keep one and throw out the other, even though both contain useful information.

  • If you used Ridge, it would keep both but might give small weights that make the model harder to interpret.

  • With Elastic Net, you can preserve groups of correlated features while still zeroing out irrelevant ones, giving you a balanced, interpretable model.
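
To see the diplomat at work, here is a scikit-learn sketch of elastic-net-penalized logistic regression on synthetic data. Note that scikit-learn calls the mixing ratio l1_ratio rather than alpha, and the numbers below are illustrative, not Qlik Predict's settings:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with clusters of correlated (redundant) predictors
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           n_redundant=20, random_state=3)

# l1_ratio is the mixing parameter: 1.0 behaves like Lasso, 0.0 like Ridge
enet_clf = LogisticRegression(penalty="elasticnet", solver="saga",
                              l1_ratio=0.5, C=0.5, max_iter=5000)
enet_clf.fit(X, y)

n_zero = (enet_clf.coef_[0] == 0).sum()
print(f"{n_zero} of {X.shape[1]} coefficients shrunk exactly to zero")
```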


Why Elastic Net Matters in Qlik Predict

Elastic Net’s power is its adaptability. Real-world datasets are rarely neat and independent. Variables overlap, interact, and sometimes drown each other out. Elastic Net thrives in this messy reality.


It’s not as lightning fast as Logistic Regression, nor as brute-force powerful as XGBoost, but it’s a reliable middle ground when you want:


  • Automatic feature selection without losing important correlated variables.

  • Better performance on high-dimensional data (lots of columns, fewer rows).

  • More stability than Lasso and more sparsity than Ridge.


Where It Shines
  • Customer segmentation with many overlapping features.

  • Genomics and bioinformatics, where thousands of correlated genetic markers are analyzed.

  • Text classification, where word frequencies often overlap in meaning.

  • Marketing analytics, where customer behaviors (clicks, purchases, opens) tend to cluster.


Elastic Net is the peacemaker of Qlik Predict’s toolbox. Not too extreme in either direction, it balances complexity and simplicity, giving you a model that’s interpretable, stable, and effective in the wild, like a Champagne Supernova.


6 - LightGBM Classification: The Speed Demon


LightGBM (short for Light Gradient Boosting Machine) is designed for speed and efficiency. It can handle massive datasets and train models much faster than XGBoost, without sacrificing much accuracy. Like some of the other algorithms here, this one was also described in the regression article. Regardless, let's briefly recap.


Instead of growing trees level by level, it grows them leaf by leaf, focusing only where the improvements matter most. That’s like a gardener pruning only the branches that will help the tree grow stronger, instead of trimming every single twig. When time is critical, LightGBM is the go-to.
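
If you want to kick the tires yourself, a sketch with the open-source lightgbm package (assuming it is installed) looks like this; num_leaves is the knob that caps the leaf-wise growth described above:

```python
from lightgbm import LGBMClassifier  # assumes the lightgbm package is available
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Trees grow leaf by leaf, adding splits only where they help the most
model = LGBMClassifier(n_estimators=300, learning_rate=0.05, num_leaves=31)
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
```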


7 - CatBoost Classification: The Cat Whisperer


Guess what? Yes, this one I also covered in the regression article. The name comes from “categorical boosting,” not cats, but the metaphor works: it’s smooth, elegant, and very good at handling categorical data without endless preprocessing.


Most algorithms stumble when dealing with raw categories like “country” or “product type.” CatBoost, however, handles them natively. It’s also less prone to overfitting, which makes it reliable even when the dataset isn’t huge. Think of CatBoost as the sophisticated musician who makes everything look effortless.
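
Here is a small sketch with the open-source catboost package (assuming it is installed) and a made-up customer table, showing how categorical columns are passed in raw, with no manual encoding:

```python
import pandas as pd
from catboost import CatBoostClassifier  # assumes the catboost package is available

# Tiny made-up dataset with raw categorical columns
df = pd.DataFrame({
    "country":       ["US", "BR", "US", "DE", "BR", "DE", "US", "BR"],
    "product_type":  ["A",  "B",  "B",  "A",  "A",  "C",  "C",  "B"],
    "monthly_spend": [120, 80, 95, 200, 60, 150, 110, 70],
    "churned":       [0, 1, 0, 0, 1, 0, 0, 1],
})

X, y = df.drop(columns="churned"), df["churned"]

# cat_features tells CatBoost which columns to treat as categories natively
model = CatBoostClassifier(iterations=200, verbose=0)
model.fit(X, y, cat_features=["country", "product_type"])

print(model.predict(X[:3]))
```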


8 - Gaussian Naive Bayes: The Optimist Who Assumes the Best


If Logistic Regression is the accountant and Lasso is the minimalist, Naive Bayes is the cheerful optimist in Qlik Predict’s algorithm toolbox. It makes very strong assumptions about the world (as a Bayesian would); those assumptions are often wrong, yet it somehow still manages to perform surprisingly well in practice.


The Core Idea

Naive Bayes is built on Bayes’ Theorem, a rule of probability that dates back to an 18th-century minister and mathematician named Thomas Bayes. Read my article about it here. The theorem tells us how to update our beliefs when new evidence comes in.


In the context of classification, it works like this:


  • Start with the overall probability of each class (called the prior).

  • Look at the evidence (the features of the data point).

  • Combine the evidence with the priors to calculate the posterior probability of belonging to each class.

  • Pick the class with the highest posterior probability.


Why It’s “Naive”

The naive part comes from its assumption: all features are independent of each other given the class.


  • If you’re classifying emails, it assumes the presence of the word “Viagra” is completely independent of the presence of the word “cheap.”

  • In reality, these words clearly go together. But Naive Bayes doesn’t care, it charges ahead anyway.


And here’s the funny part: it often works well despite this unrealistic assumption.


Gaussian Naive Bayes models each class with a normal distribution per feature and chooses the class with the higher likelihood at a given feature value.

The Gaussian Flavor

There are several versions of Naive Bayes, depending on how you assume the features are distributed:


  • Multinomial Naive Bayes: works well with word counts in text classification.

  • Bernoulli Naive Bayes: deals with binary features (yes/no).

  • Gaussian Naive Bayes: assumes features are continuous and follow a normal (Gaussian) distribution.


Gaussian Naive Bayes is the version Qlik Predict uses. For example, if you’re classifying whether a customer is “likely to churn” or “not likely to churn,” it might assume features like “monthly spend” or “time since last login” follow a bell-shaped curve for each class. This assumption makes the math neat and simple; just calculate means and variances for each feature within each class.


A Practical Example

Suppose you want to classify whether a patient has a particular disease. The features are continuous measurements like blood pressure, cholesterol level, and glucose.


Gaussian Naive Bayes would:


  1. For each feature and each class (disease or no disease), calculate the mean and variance.

  2. Use the Gaussian probability density function to compute the likelihood of observing those feature values.

  3. Apply Bayes’ theorem to combine the likelihoods into probabilities for each class.

  4. Pick the class with the higher probability.


All of this happens with just a few lines of math and very little computing power.
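
Those four steps are what scikit-learn's GaussianNB implements. Here is a minimal sketch on made-up patient measurements; the learned model really is just a mean and a variance per feature per class:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up patient measurements: [blood pressure, cholesterol, glucose]
X = np.array([
    [118, 180, 85], [142, 240, 130], [125, 200, 90], [155, 260, 150],
    [110, 170, 80], [150, 250, 140], [130, 210, 95], [160, 270, 160],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = disease, 0 = no disease

gnb = GaussianNB().fit(X, y)

# The whole "model" is just a mean and a variance per feature per class...
print("class means:\n", gnb.theta_)
print("class variances:\n", gnb.var_)

# ...which Bayes' theorem turns into class probabilities for a new patient
print(gnb.predict_proba([[135, 225, 110]]))
```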


Strengths
  • Simplicity: Easy to implement, train, and understand.

  • Speed: Trains almost instantly, even on large datasets.

  • Text classification: Incredibly effective for spam detection, sentiment analysis, and other tasks where features are word frequencies.

  • Small data: Works surprisingly well with limited training data.


Weaknesses
  • The independence assumption rarely holds true in real data. If features are strongly correlated, performance may drop.

  • Gaussian Naive Bayes struggles when features don’t actually follow a normal distribution. For example, skewed or highly categorical data can trip it up.


Why It Matters in Qlik Predict

In Qlik Predict, Gaussian Naive Bayes is like the quick starter: it gives you fast, baseline models that are often good enough, especially when your features are continuous and reasonably bell-shaped. It’s not the fanciest or the flashiest, but it can punch well above its weight.


Think of it as the cheerful optimist in a group project. Sure, it assumes everyone will get along and do their part independently, which isn’t true. But somehow, the project still gets finished; and sometimes, it’s even one of the best in the class.


Scenario → Likely Champion

  • Need interpretability and simple yes/no probabilities → Logistic Regression

  • Data is messy, nonlinear, and you want stability → Random Forest

  • Seeking top accuracy with moderate computation time → XGBoost

  • Lots of irrelevant features, want automatic pruning → Lasso Regression

  • Many correlated features, need balance → Elastic Net

  • Features are continuous, data is small or text-like → Gaussian Naive Bayes

  • Massive dataset, training speed is critical → LightGBM

  • Dataset is full of categorical variables → CatBoost

Conclusion


Classification is one of the most powerful ways to turn raw data into decisions. Whether you’re answering a yes/no question with binary classification or juggling up to ten categories with multiclass, Qlik Predict equips you with a full cast of algorithms to handle the job. Each one brings its own philosophy: Logistic Regression gives you probabilities you can explain, Random Forest thrives on ensemble wisdom, XGBoost and LightGBM drive accuracy and speed, Lasso and Elastic Net keep models clean, Naive Bayes delivers fast and surprisingly effective results, and CatBoost takes categorical data in stride.


The real advantage is that you don’t have to choose blindly. Qlik Predict runs these contenders, measures their performance, and shows you the champion for your scenario. That means less time agonizing over algorithms and more time applying insights to real problems.


At the end of the day, classification isn’t just about sorting data, it’s about giving shape to uncertainty. It’s about unmasking the different sides your data can take; sometimes a simple duality, like Dr. Jekyll and Hyde, sometimes a whole cast of characters. And with Qlik Predict’s algorithm toolbox at your disposal, you’ve got both the range and the reliability to turn those many faces of uncertainty into clear, actionable answers.

 
 
 
