Using Commercial Data in Data Mining

In this post I’ll describe some of the data that is used in commercial data mining. This data may be aggregated to some level or kept at the individual level, depending on the aim of the study and the methods available.

Often commercial transactions are of particular interest, especially in studies on consumption. In these cases we would ask the following:

  • Where?
    Internet, place of business or geographical locations where the transactions occurred

  • When?
    Frequency and recency of the transactions

  • How?
    Method of payment

  • How much?
    Quantity and value of transactions

  • What?
    Products or services purchased

One way to examine this is by recency, frequency and monetary value, or RFM. This is often represented as cross-tabulated data: recency by period of last purchase, frequency by the (binned) count of purchases in that period, and monetary value by the (binned) value of purchases in that period.
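To make the idea concrete, here is a minimal sketch of an RFM cross-tabulation in plain Python. Everything in it is an assumption for illustration: the transactions, the reference date `as_of`, the bin cut-offs and the function names are all made up, and for simplicity frequency and monetary value are computed over each customer's whole history rather than within the recency period.

```python
from collections import Counter
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2024, 1, 5), 20.0),
    ("A", date(2024, 3, 20), 35.0),
    ("A", date(2024, 3, 28), 15.0),
    ("B", date(2023, 11, 2), 250.0),
    ("C", date(2024, 2, 14), 60.0),
    ("C", date(2024, 2, 20), 45.0),
]
as_of = date(2024, 4, 1)  # assumed reference date for measuring recency


def recency_bin(days):
    # Illustrative cut-offs only, not a standard
    if days <= 30:
        return "0-30 days"
    if days <= 90:
        return "31-90 days"
    return "91+ days"


def frequency_bin(count):
    # Illustrative bins on the number of purchases
    return "1" if count == 1 else "2-3" if count <= 3 else "4+"


def monetary_bin(total):
    # Illustrative bins on total spend
    return "under 100" if total < 100 else "100+"


def rfm_summary(rows, as_of):
    """Return {customer: (days_since_last_purchase, purchase_count, total_spend)}."""
    summary = {}
    for customer, when, amount in rows:
        days = (as_of - when).days
        last, count, total = summary.get(customer, (days, 0, 0.0))
        summary[customer] = (min(last, days), count + 1, total + amount)
    return summary


def rfm_crosstab(summary):
    """Count customers in each (recency, frequency, monetary) cell."""
    return Counter(
        (recency_bin(r), frequency_bin(f), monetary_bin(m))
        for r, f, m in summary.values()
    )


summary = rfm_summary(transactions, as_of)
crosstab = rfm_crosstab(summary)
for cell, n in sorted(crosstab.items()):
    print(cell, n)
```

Each key of `crosstab` is one cell of the table and each value is the number of customers falling into it. In practice the bins would be chosen from the data (quintiles are a common choice) rather than hard-coded as they are here.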


Other data that we may use includes things like reason for cancellation or return, product shelf-life or expiry date, discounts and gross margin on the transaction. This is only a partial list that can, and frequently does, get quite long and detailed. To give you some perspective, consider all the information that large-scale retail companies might know about you. Now correlate that with public data, such as the demographics for your neighborhood (average household income, cost of homes, mortgage values and interest rates). Besides being kind of creepy when you think about it like that, there’s a lot of value in that information.
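As a rough sketch of what "correlating" customer data with public data can look like, the snippet below attaches neighborhood-level figures to customer records by a shared area code. The field names, the area codes and every number are invented for illustration, and `enrich` is a hypothetical helper, not part of any library.

```python
# Hypothetical customer records held by a retailer
customers = [
    {"customer_id": "A", "area": "P1", "total_spend": 70.0},
    {"customer_id": "B", "area": "P2", "total_spend": 250.0},
    {"customer_id": "C", "area": "P9", "total_spend": 105.0},
]

# Hypothetical public statistics, keyed by the same area code
neighborhoods = {
    "P1": {"avg_household_income": 85000, "avg_home_cost": 650000},
    "P2": {"avg_household_income": 52000, "avg_home_cost": 310000},
}


def enrich(customers, neighborhoods):
    """Return copies of the customer records with area-level fields added.

    Customers whose area has no public data are kept unchanged.
    """
    enriched = []
    for customer in customers:
        area_stats = neighborhoods.get(customer["area"], {})
        enriched.append({**customer, **area_stats})
    return enriched


enriched = enrich(customers, neighborhoods)
for row in enriched:
    print(row)
```

Note that the added fields describe the neighborhood, not the individual, so they are best read as context for a customer rather than facts about them.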

There are a few more types of data I’ll mention briefly.

Relational – responses to marketing campaigns and questionnaires

Attitudinal – brand loyalty, predisposition to purchase and attractiveness of the competition

Psychographic – lifestyle, personality, risk aversion, interests and opinions

Sociodemographic – level of education, family situation, wealth and geographical location

Technical – type of customer (private, business, company, etc.), title, name and telephone number

Geodemographic – details about place of residence in terms of economics, population, wealth, average number and ages of children, family structures, social and occupational level, etc.

I’m beginning to wonder if these posts might be more like a drive-by shooting than an informative series.


Can you describe what happened?


I’m not really sure… it started off with something about univariate analysis, but then the regression started and then the Type I and Type II errors. The last thing I remember was something about uniformly minimum-variance unbiased estimator… after that it’s all a blur… oi, what’s this leaking out my ear?
