Title: | Customer Intelligence Tool for Rapid Understandable Segmentation |
---|---|
Description: | A tool to easily run and visualise supervised and unsupervised state of the art customer segmentation. It is built like a pipeline covering the 3 main steps in a segmentation project: pre-processing, modelling, and plotting. Users can either run the pipeline as a whole, or choose to run any one of the three individual steps. It is equipped with a supervised option (tree optimisation) and an unsupervised option (k-clustering) as default models. |
Authors: | Dom Clarke [aut, cre], Cinzia Braglia [aut], Oskar Nummedal [aut], Leo McCarthy [aut], Rebekah Yates [aut], Stuart Davie [aut], Joash Alonso [aut], PEAK AI LIMITED [cph] |
Maintainer: | Dom Clarke <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.2 |
Built: | 2024-11-01 11:15:52 UTC |
Source: | https://github.com/cran/citrus |
Creates pair plot from data table
citrus_pair_plot(model, vars = NULL)
model |
list, a citrus segmentation model |
vars |
data.frame, the data to segment |
GGally object displaying the segment feature pair plots.
k-clusters method for segmentation. It handles numeric-only data with the k-means algorithm, and mixed data (numerical and categorical) with the k-prototypes algorithm.
k_clusters(data, hyperparameters, verbose = TRUE)
data |
data.frame, the data to segment |
hyperparameters |
list of hyperparameters to pass. They include
centers: number of clusters or a set of initial (distinct) cluster centers, or 'auto'. When 'auto' is chosen, the number of clusters is optimised; |
verbose |
logical, whether to print information about the clustering procedure. |
An object of class "k-clusters": a list containing the model definition, the hyperparameters, a table of outliers, the elbow plot (ggplot object) used to determine the optimal number of clusters, and a lookup table of segment predictions for customers.
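The elbow plot mentioned above is conventionally built from the total within-cluster sum of squares across candidate cluster counts. The following is a minimal sketch of that idea using `stats::kmeans` from base R. It is not the citrus implementation; the built-in `iris` data, the candidate range `1:6`, and the choice of 3 clusters are assumptions made for illustration.

```r
set.seed(42)  # k-means starts from random centres

# Numeric-only toy data: the four measurement columns of the built-in iris set.
x <- scale(iris[, 1:4])

# Total within-cluster sum of squares for each candidate number of clusters.
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Plotting wss against k gives the elbow plot: the "elbow" is the point
# where adding clusters stops reducing wss by much.
print(data.frame(k = ks, wss = round(wss, 1)))

# Fit a model at a chosen k (assumed here) and build an id-to-segment lookup.
fit <- kmeans(x, centers = 3, nstart = 10)
lookup <- data.frame(id = seq_len(nrow(x)), segment = fit$cluster)
head(lookup)
```

For mixed numeric and categorical data, plain k-means does not apply, which is why the function description refers to k-prototypes for that case.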
Saves the model and its settings so that it can be recreated
model_management(model, hyperparameters)
model |
data.frame, the model to save |
hyperparameters |
list, list of hyperparameters of the model |
No return value. Called to save model and settings locally.
Generates the output table for model and data
output_table(data, model)
data |
A dataframe generated from the pre-processing step |
model |
A model object used to classify ids with, generated from the model selection layer |
A tibble providing high-level segment attributes such as mean and max (numeric) or mode (categorical) for the segmentation features used.
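To make the shape of such a summary concrete, here is a small base R sketch that joins a segment lookup onto id-level features and computes the mean and max of a numeric feature and the mode of a categorical one. It does not call `output_table()`; the tables `features` and `lookup` and their column names are hypothetical.

```r
# Hypothetical id-level features and a lookup of predicted segments.
features <- data.frame(
  id     = 1:6,
  spend  = c(10, 12, 50, 55, 11, 60),
  region = c("north", "north", "south", "south", "north", "east"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(id = 1:6, segment = c(1, 1, 2, 2, 1, 2))

# Attach each id's segment to its features.
labelled <- merge(features, lookup, by = "id")

# Numeric feature: mean and max per segment.
spend_mean <- tapply(labelled$spend, labelled$segment, mean)
spend_max  <- tapply(labelled$spend, labelled$segment, max)

# Categorical feature: most frequent level per segment.
region_mode <- tapply(labelled$region, labelled$segment,
                      function(v) names(which.max(table(v))))

segment_summary <- data.frame(
  segment     = as.integer(names(spend_mean)),
  n           = as.vector(table(labelled$segment)),
  spend_mean  = as.vector(spend_mean),
  spend_max   = as.vector(spend_max),
  region_mode = as.vector(region_mode),
  stringsAsFactors = FALSE
)
print(segment_summary)
```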
Transforms a transactional table into an id aggregated table with custom options for aggregation methods for numeric and categorical columns.
preprocess( df, samplesize = NA, numeric_operation_list = c("mean"), categories = NULL, target = NA, target_agg = "mean", verbose = TRUE )
df |
data.frame, the data to preprocess |
samplesize |
numeric, the fraction of ids used to create a sub-sample of the input df |
numeric_operation_list |
list, a list of the aggregation functions to apply to numeric columns |
categories |
list, a list of the categorical columns to aggregate |
target |
character, the column to use as a response variable for supervised learning |
target_agg |
character, the aggregation function to use to aggregate the target column |
verbose |
logical, whether to print information about the preprocessing |
An id attributes data frame, e.g. customer attributes if the id represents customer IDs. A single row per unique id.
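The aggregation described here can be illustrated in base R. The sketch below is not the citrus implementation: the toy table `tx`, its column names (`id`, `amount`, `channel`), and the helper `stat_mode` are made up for illustration. It collapses a transactional table to one row per id, taking the mean of a numeric column and the most frequent value of a categorical column.

```r
# Toy transactional table: several rows per id (hypothetical columns).
tx <- data.frame(
  id      = c("a", "a", "a", "b", "b"),
  amount  = c(10, 20, 30, 5, 15),
  channel = c("web", "web", "store", "store", "store"),
  stringsAsFactors = FALSE
)

# Most frequent value of a vector (alphabetically first value wins on ties).
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Numeric column: one mean per id.
num_agg <- aggregate(amount ~ id, data = tx, FUN = mean)
names(num_agg)[2] <- "amount_mean"

# Categorical column: one mode per id.
cat_agg <- aggregate(channel ~ id, data = tx, FUN = stat_mode)
names(cat_agg)[2] <- "channel_mode"

# One row per unique id.
id_table <- merge(num_agg, cat_agg, by = "id")
print(id_table)
```

Swapping `mean` for another function such as `max` or `sum` corresponds to choosing a different entry in `numeric_operation_list`.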
A sample customer dataset for the purpose of demonstrating the segmentation algorithm.
data(preprocessed_data)
Data frame on a customer level. Contains 402 rows and 8 columns.
data(preprocessed_data)
Note: this function was copied from the archived rpart.utils package, and its code was written by the authors of that package.
Creates lists of variable values (factor levels) associated with each rule in an rpart object.
rpart.lists(object)
object |
an rpart object |
a list of lists
library(rpart)
fit <- rpart(Reliability ~ ., data = car.test.frame)
rpart.lists(fit)
Plots an rpart model and prettifies it. A wrapper around the rpart.plot::prp function.
rpart.plot_pretty( model, main = "", sub, caption, palettes, type = 2, fontfamily = "sans", ... )
model |
an rpart model object |
main |
main title |
sub |
fixing captions in line |
caption |
character, caption to use in the plot |
palettes |
list, list of colours to use in the plot |
type |
type of plot. Default is 2. Possible values are:
0: draw a split label at each split and a node label at each leaf (the default of rpart.plot::prp itself);
1: label all nodes, not just leaves;
2: like 1, but draw the split labels below the node labels;
3: draw separate split labels for the left and right directions;
4: like 3, but label all nodes, not just leaves;
5: show the split variable name in the interior nodes. |
fontfamily |
Names of the font family to use for the text in the plots. |
... |
Additional arguments. |
An rpart.plot object. This plot object can be plotted using the rpart.plot::prp function.
Note: this function was copied from the archived rpart.utils package, and its code was written by the authors of that package.
Returns a list of strings summarizing the branch path to each node.
rpart.rules(object)
object |
an rpart object |
library(rpart)
fit <- rpart(Reliability ~ ., data = car.test.frame)
rpart.rules(fit)
Note: this function was copied from the archived rpart.utils package, and its code was written by the authors of that package.
Returns an unpivoted table of branch paths (subrules) associated with each node.
rpart.rules.table(object)
object |
an rpart object |
library(rpart)
fit <- rpart(Reliability ~ ., data = car.test.frame)
rpart.rules.table(fit)
Note: this function was copied from the archived rpart.utils package, and its code was written by the authors of that package.
Returns an unpivoted table of variable values (factor levels) associated with each branch.
rpart.subrules.table(object)
object |
an rpart object |
library(rpart)
fit <- rpart(Reliability ~ ., data = car.test.frame)
rpart.subrules.table(fit)
Segments the data by running all steps in the segmentation pipeline, including output table
segment( data, modeltype = c("tree", "k-clusters"), FUN = NULL, FUN_preprocess = NULL, steps = c("preprocess", "model"), prettify = FALSE, print_plot = FALSE, hyperparameters = NULL, force = FALSE, verbose = FALSE )
data |
data.frame, the data to segment |
modeltype |
character, the type of model to use for segmentation. Choices are 'tree' and 'k-clusters' |
FUN |
function, a user-specified segmentation function, for use when the standard methods are not wanted |
FUN_preprocess |
function, a user-specified preprocessing function, for use when the standard methods are not wanted |
steps |
list, names of the pipeline steps to run on the data. Options are 'preprocess' and 'model' |
prettify |
logical, TRUE for cleaned-up outputs, FALSE for raw output |
print_plot |
logical, TRUE to print the plot |
hyperparameters |
list of hyperparameters to use in the model. |
force |
logical, TRUE to ignore errors in validation step and force model execution. |
verbose |
logical, whether to print information about the segmentation pipeline. |
A list of three objects. A tibble providing high-level segment attributes, a lookup table (data frame) with the id and predicted segment number, and an rpart object defining the model.
A sample customer dataset for the purpose of demonstrating the segmentation algorithm.
data(transactional_data)
Data frame on a transactional level. Contains 10,000 rows and 6 columns.
data(transactional_data)
Organises the model outputs, predictions and settings in a general structure
tree_abstract(model)
model |
The model to organise |
A structure with the class name "tree_model" which contains a list of all the relevant model data, including the rpart model object, hyper-parameters, segment table and the labelled customer lookup table.
Runs decision tree optimisation on the data to segment ids.
tree_segment(data, hyperparameters, verbose = TRUE)
data |
data.frame, the data to segment |
hyperparameters |
list, list of hyperparameters to pass. They include:
segmentation_variables: a vector or list with variable names that will be used as segmentation variables;
dependent_variable: a string with the name of the dependent variable that is used in the segmentation;
min_segmentation_fraction: numeric, the minimum segment size as a proportion of the total data set;
number_of_segments: integer, the number of leaves you want the decision tree to have. |
verbose |
logical, whether to print information about the segmentation procedure. |
List of 4 objects. The rpart object defining the model, a data frame providing high-level segment attributes, a lookup table (data frame) with the id and predicted segment number, and a list of the hyperparameters used.
Stuart Davie, [email protected]
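One plausible way to realise the hyperparameters described above with rpart is sketched below: the minimum segment fraction maps to `minbucket`, and the requested number of segments is reached by growing a full tree and pruning it back through the complexity table. This is an assumption about how such settings could be applied, not a description of what `tree_segment()` does internally. The variable names `min_fraction` and `n_segments`, the choice of `Mileage` as the dependent variable, and the chosen predictors are all illustrative.

```r
library(rpart)

# Hypothetical settings mirroring the hyperparameters described above.
min_fraction <- 0.05   # minimum segment size as a share of all rows
n_segments   <- 4      # maximum number of leaves wanted

fit <- rpart(
  Mileage ~ Price + Weight + HP + Type,
  data = car.test.frame,
  method = "anova",
  control = rpart.control(
    minbucket = ceiling(min_fraction * nrow(car.test.frame)),
    cp = 0,     # grow the tree fully, prune afterwards
    xval = 0    # skip cross-validation for this sketch
  )
)

# Each row of the cp table is a nested subtree; leaves = nsplit + 1.
cp_table <- fit$cptable
keep <- max(which(cp_table[, "nsplit"] + 1 <= n_segments))
pruned <- prune(fit, cp = cp_table[keep, "CP"])

# Lookup table: row id and the leaf (segment) each row falls into.
lookup <- data.frame(
  id = rownames(car.test.frame),
  segment = pruned$where
)
print(table(lookup$segment))
```

Because the complexity table only contains the nested subtrees rpart found, the pruned tree may have fewer leaves than `n_segments` when no subtree with exactly that many leaves exists.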
Returns a prettier version of the decision tree.
tree_segment_prettify(tree, char_length = 20, print_plot = FALSE)
tree |
The decision tree model to prettify |
char_length |
integer, the character limit before truncating categories and putting them into an "other" group |
print_plot |
logical, indicates whether to print the generated plot or not |
A formatted and "prettified" rpart.plot object. This plot object can be plotted using the rpart.plot::prp function.
Validates that the input data adheres to the expected format for modelling.
validate(df, supervised = TRUE, force, hyperparameters)
df |
data.frame, the data to validate |
supervised |
logical, TRUE for supervised learning, FALSE for k-clusters |
force |
logical, TRUE to ignore error on categorical columns |
hyperparameters |
list of hyperparameters used in the model |
'TRUE' if all checks are passed. Otherwise an error is raised.