Comprehensive Analysis of Retail Sales Data Part 4 - Interactive Dashboards and DAX Optimization in Power BI
In this page, I’ll walk you through our data model in Power BI and explain the DAX code and formulas used to create calculated tables, columns, and measures. We start with the Model Overview, where I’ll explain how different tables are connected. Next, we dive into Calculated Tables: RFM and Events.
Then, we look at Calculated Columns, which let us create new data within the dataset. These columns are useful for visualizations and adding extra layers to the analysis, especially when dealing with static information. Finally, we cover Measures—dynamic calculations that respond to user inputs. Unlike calculated columns, measures are recalculated in real-time based on what the user selects, making them essential for interactive, complex analysis.
Table of Contents:
Before diving into the step-by-step details, lets understand the overall model view. The data model consists of several interrelated tables, each serving a distinct purpose in the analysis. These tables are connected through one-to-many relationships, meaning that each record in the parent table can be associated with multiple records in the child table. For instance, the InvoiceMetrics table contains a single record for each unique InvoiceNo, whereas the Transactions table may include multiple records associated with the same InvoiceNo.
Transactions
InvoiceMetrics
RFM
Date
Holidays
Events
InvoiceNo
CustomerID
InvoiceDate
Date
YearMonth
In this section, I will describe the creation of two key calculated tables, RFM and Events, which are crucial for enhancing the analysis. The RFM table is designed to support RFM Segmentation Analysis, categorizing customers based on their purchasing behavior. The Events table helps track the impact of monthly events on sales, allowing me to incorporate external factors into the analysis.
Table 1: RFM
-
Recency: Measures the number of days since a customer’s most recent purchase, providing insight into how recently a customer has engaged with the business.
-
Frequency: Counts the number of distinct invoices per customer, indicating how often a customer makes a purchase.
-
Monetary: Calculates the total revenue generated from each customer, representing the customer’s monetary value to the business.
Table 2: Events
-
YearMonth: A unique combination of year and month derived from the Date table (e.g., 2011-05 for May 2011). It serves as the key to group the events by year and month, allowing us to associate the holidays/events with specific months.
-
Events: Concatenates the names of events or holidays for a given YearMonth. If no events are found, the value "None" is assigned. This column provides a summary of the events for each month, making it possible to understand the impact of specific holidays or events on sales within that period.
In this section, I will explain the creation of calculated columns, which allow us to generate new data based on existing columns, creating additional fields that can be used in visualizations, filtering, or further calculations. These columns are evaluated row by row, and unlike measures, their values are stored in the data model, making them suitable for situations where you need to work with static data or incorporate logic directly within the dataset.
Keep in mind that the categorizations and segmentations used in these columns should be decided in collaboration with stakeholders to ensure they align with business objectives and strategic goals.
RFM Table
-
R Score: A score to categorize customers based on the recency of their last purchase.
Logic:
-
0 to 30 days: Score = 5
-
31 to 90 days: Score = 4
-
91 to 180 days: Score = 3
-
181 to 300 days: Score = 2
-
300+ days: Score = 1
-
-
F Score: A score to categorize customers based on the frequency of their purchases.
Logic:
-
1 purchase: Score = 1
-
2 to 10 purchases: Score = 2
-
11 to 50 purchases: Score = 3
-
51 to 150 purchases: Score = 4
-
150+ purchases: Score = 5
-
-
M Score: A score to categorize customers based on the total monetary value (revenue) they have generated.
Logic:
-
Top 10% of customers: Score = 5
-
Next 25% of customers: Score = 4
-
Next 30% of customers: Score = 3
-
Next 25% of customers: Score = 2
-
Bottom 10% of customers: Score = 1
-
-
RFM Score: A composite score combining the R Score, F Score, and M Score to segment customers into different groups.
-
Customer Segment: Classifies customers into segments based on their RFM Score, such as “Loyal”, “At-risk”, or “Champions”. This segmentation is useful for designing targeted marketing strategies.
-
Monetary_Range: A range for the monetary value of customer purchases, rounded to the nearest 100. This is used for distribution visualization.
InvoiceMetrics Table
-
Cohort Month: Identifies the first purchase month for each customer. This column groups customers based on their initial purchase month, which is essential for cohort analysis, retention analysis and understanding customer lifecycle trends.
-
CustomerType: Categorizes customers based on their purchasing behavior over monthly time periods as either "New", "Returning", or "Recovered" (i.e., a customer who returns after a period of inactivity). This classification is crucial for retention analysis and can help in designing targeted marketing strategies.
Date Table
-
YearMonth: Extracts the year and month from the Date column in "YYYY-MM" format. This column is crucial for grouping data by month.
-
SortOrder-WeekDay: Used exclusively for sorting the WeekDay column in the Date table. By default, the WeekDay column is sorted alphabetically, but this column ensures it is sorted in the correct order (Monday to Sunday). The WEEKDAY function is used to assign a numeric value to each day of the week, where Monday is 1 and Sunday is 7.
-
SortOrder-Season: Used exclusively for sorting the Season column in the Date table. By default, the Season column is sorted alphabetically, but this column ensures it is sorted in the correct seasonal order. (Spring to Winter). The SWITCH function is used to assign a numeric value to each season.
In this section, I will explain the creation of measures, which are dynamic calculations used to perform analysis on the data in real time. Unlike calculated columns, measures are evaluated on the fly based on the context of the report or visualization. Measures are essential for aggregating and summarizing data, allowing us to perform complex calculations that respond to filters, slicers, and user selections.
Transactions Table
-
% of Total Customers: Calculates the percentage of total customers based on the current filter context. This measure helps assess the proportion of customers for a specific period or condition relative to the total number of customers in the dataset.
-
Customers: Provides the total number of unique customers based on the current filter context.
-
Revenue: Calculates the total revenue by multiplying the quantity of items sold by their unit price, considering the current filter context.
-
Percentage Revenue: Calculates the percentage of revenue generated by the current data set relative to the total revenue across all records. This helps identify the contribution of specific transactions, periods, or conditions to overall revenue.
-
Sold Items: Counts the distinct number of items (StockCode) sold in transactions, based on the current filter context.
-
Sold Units: Calculates the total number of units sold across all transactions, based on the current filter context.
-
Transactions: Returns the distinct count of transactions (InvoiceNo) based on the current filter context.
InvoiceMetrics Table
-
Average Ticket Size: Calculates the average value of each purchase, based on the current filter context. This measure provides insight into the typical spend per transaction, which is useful for understanding customer behavior and sales trends.
-
Retained Customers: Counts the number of retained customers, defined as those categorized as "Returning" or "Recovered", based on the current filter context. This measure is essential for assessing customer loyalty and the effectiveness of retention strategies.
-
Total Customers: Counts the total number of distinct customers based on the current filter context.
-
Retention Rate: Represents the proportion of customers retained over time, calculated by dividing the number of retained customers by the total number of customers from prior periods. It’s a key metric for analyzing customer loyalty and retention trends.
-
Churn Rate: Represents the proportion of customers who did not remain active. It’s calculated by subtracting the retention rate from 1. If the retention rate is blank, the churn rate will also return a blank value. This measure provides insight into customer loss, which is crucial for evaluating the effectiveness of retention efforts.