Tuesday, September 21, 2010

Predictive Analytics in the Cloud

By Irit Eizips

The ability to do predictive analytics and data mining without being a statistician is here! Predixion Insight is a very cool public and private (on-premise) cloud service accessed via Predixion’s client software, which installs into Excel as an add-in and lets you build sophisticated models without needing to know all the technical details that typically go into performing predictive analytics. Predixion provides powerful collaboration and reporting features, and best of all, it is natively integrated with Microsoft’s new in-memory BI (IMBI) tool, PowerPivot. That means you can perform predictive analytics on up to 100 million rows entirely in the client, write the results back to PowerPivot for slicing and dicing, and then publish via SharePoint.

The service is delivered over the cloud, so all you really need is the Predixion Insight for Excel client and an internet connection. If your company uses SAS or SPSS, Predixion Insight will soon let you leverage that investment by running those models in the cloud, reporting on them, and then sharing the results with anyone via SharePoint. Predictive analytics for the masses is finally here!

Here’s a link to a bunch of easy-to-follow video tutorials on their website, along with a free 7-day trial version you can download and play with.



Here are some interesting articles about their tool:





http://powerpivotpro.com/ (See interview with JamieMac of Predixion)


Monday, September 20, 2010

Getting Started with SQL Server Data Mining for Retail/Finance

By Rick Durham

There are two informative, technical webcasts at the link below that cover how SSAS Data Mining is used to solve general retail/marketing problems:

  • “Overview of SQL Server Data Mining”
  • “Applying SQL Server 2005 Data Mining to Enterprise Business Problem”

Here is the URL: http://www.sqlserverdatamining.com/ssdm/Home/Webcasts/tabid/62/Default.aspx

Types of finance/marketing problems that data mining can be used for include:

  • Database marketing applications include offer response, up-sell, cross-sell, and attrition models
  • Financial risk management models attempt to predict monetary events such as credit default, loan prepayment, and insurance claims
  • Fraud detection methods attempt to detect or impede illegal activity involving financial transactions
  • Process monitoring applications detect deviations from the norm in manufacturing, financial, and security processes
  • Pattern detection models are used in applications that range from handwriting analysis to medical diagnostics
Please note this is by no means an exhaustive list.

Another good site for general tutorials is the following:


Hope this helps get you started!

Thursday, September 9, 2010

Standardize Your MDX Parameter Queries

By Dan Meyers

One thing I have found useful when writing a lot of Reporting Services reports that take parameters and use MDX is to create standard calculated members in the cube for the Label and Value properties of those parameters. Just like any other calculated measure built into the cube script, you get the advantage of reusing them instead of redefining them over and over in the WITH clause of each query. The code below is dynamic and is not specific to any particular dimension, so it will work with whatever you put on ROWS in the dataset query for your report parameters.

Below is the MDX for the calculated members and a sample query.

Insert this code into your cube script (at the bottom)

CREATE MEMBER CURRENTCUBE.[Measures].[ParameterValue] AS
    // the member's unique name, for use as the parameter's Value property
    Axis(1).Item(0).Item(0).Dimension.CurrentMember.UniqueName,
VISIBLE = 1;

CREATE MEMBER CURRENTCUBE.[Measures].[ParameterCaption] AS
    // the caption, indented by level depth, for the parameter's Label property
    String(Axis(1).Item(0).Item(0).Dimension.CurrentMember.Level.Ordinal * 1, ' ')
        + Axis(1).Item(0).Item(0).Dimension.CurrentMember.Member_Caption,
VISIBLE = 1;

Sample Query

SELECT
      {[Measures].[ParameterCaption], [Measures].[ParameterValue]} ON 0,
      {[Date].[Calendar].ALLMEMBERS} ON 1
FROM [Adventure Works]



Thursday, September 2, 2010

Interpreting Causation in Time Series Forecast Models

By Rick Durham

  • Building data mining models is one thing
  • Determining the primary factors that contribute to the final predicted data point is quite another

One of the most useful algorithms in the SSAS suite of data mining tools is the Time Series algorithm. Its forecast method is really a combination of two other algorithms (ARIMA and ARTxp): it makes future predictions for a series by assigning weights to that series' past data values. One of the most important features of this method is its ability to incorporate data from other series in the model into the final prediction values.
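To make the idea of weighting past values concrete, here is a minimal autoregressive sketch in Python. This is illustrative only: SSAS's ARIMA/ARTxp blend is far more sophisticated, and the function names and toy data below are my own, not part of the product.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model by least squares:
    y[t] = w1*y[t-1] + ... + wp*y[t-p] + b."""
    series = np.asarray(series, dtype=float)
    # each row holds the p most recent past values (newest first)
    X = np.array([series[t - p:t][::-1] for t in range(p, len(series))])
    A = np.column_stack([X, np.ones(len(X))])  # extra column for the intercept
    coef, *_ = np.linalg.lstsq(A, series[p:], rcond=None)
    return coef  # weights w1..wp, then the intercept b

def forecast(series, coef, steps):
    """Roll the fitted weights forward to predict future values."""
    p = len(coef) - 1
    hist = list(map(float, series))
    for _ in range(steps):
        past = np.array(hist[-p:][::-1])  # newest first, matching fit_ar
        hist.append(float(past @ coef[:p] + coef[-1]))
    return hist[len(series):]

# toy example: a steady upward trend, extrapolated by the learned weights
series = [2.0 * t for t in range(10)]
coef = fit_ar(series, p=2)
preds = forecast(series, coef, steps=3)
```

An AR fit like this only looks at the series' own past; the point of the example below is that SSAS can additionally weight *other* series (oil, in this case) when predicting the target (gas).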

For example, in the model below, we can see a direct historic relationship between the price of gas and the price of oil. When the time series algorithm was used to build the model it looked at all of the past data points for both gas and oil to make the final prediction for the price of gas.

We can infer from this model, based on historic data points, that the price of oil affects the price of gas (no big surprise here). Please note the oil price is normalized against the Feb 2008 oil price, in percent.


But what affected oil prices? What caused the huge spike in oil prices in July 2008 in our time series forecast? My experience has been that when we create these types of models, we are always going to be asked what factors caused or drove certain data points in the series to extreme values.

In this case, we might casually answer, “It was a lack of oil supply combined with high demand,” but this is not true, as the following set of historic data charts proves. The chart below shows that in July 2008 production of oil was at an all-time high, as suppliers were willing to increase production when the price point was high. Again, no surprise; it’s simple Econ 101.


We would think that demand for oil during this period would also be high, driving up the price point, but that is not the case. In fact, demand for oil during this period was very low, as this historic data chart reveals. It started dropping in 2007 and hit a low in the summer of 2008, when the prices of oil and gas were both at all-time highs.


So how can we explain what was driving up the price of oil, and thus the price of gas, in July 2008? It turns out that one of the primary factors was the dollar’s value against other currencies. In effect, because the value of the dollar was low, oil suppliers wanted more dollars for the same units of oil, thus driving up the price.


Many types of data typically used in time series analysis (think commodities, stock prices, long-term weather forecasts…) are driven by complex factors that are often changing and may be non-stationary. The factors that drive a forecast today are not the same as what might be driving it tomorrow. This is what makes it tricky to understand which series need to be included in the mining models and which indirect factors drive them.

If we take our oil example, many factors have driven the price historically, including supply, demand, war, strikes, geopolitical tension (or the lack of it), and weather. In some cases, the input factors are so complex and varied that the only way to predict future values is to use the historic time values, as there is no way to determine all of the factors that move the data. This is certainly true of the stock market as well.
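One pragmatic way to screen for these indirect drivers before adding a series to the mining model is a simple lagged-correlation scan: shift each candidate driver back in time and see where it correlates most strongly with the target. Here is a hedged sketch with synthetic data; the series and the two-period lag are invented for illustration, not taken from the charts above.

```python
import numpy as np

def lagged_corr(target, driver, max_lag):
    """Pearson correlation between the target and the driver
    shifted back by 0..max_lag periods."""
    corrs = {}
    for lag in range(max_lag + 1):
        a = target[lag:] if lag else target
        b = driver[:len(driver) - lag] if lag else driver
        corrs[lag] = float(np.corrcoef(a, b)[0, 1])
    return corrs

# synthetic data: 'oil' reacts to 'dollar_index' two periods later
rng = np.random.default_rng(0)
dollar_index = rng.normal(size=200)
oil = np.concatenate([np.zeros(2), -0.9 * dollar_index[:-2]])
oil = oil + 0.1 * rng.normal(size=200)  # add observation noise

corrs = lagged_corr(oil, dollar_index, max_lag=4)
best_lag = max(corrs, key=lambda k: abs(corrs[k]))  # strongest lag
```

A series that screens well here would be a candidate to include alongside the target in the mining structure, so the Time Series algorithm can weight it; correlation alone, of course, still isn't causation.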


In this blog post, I have taken a simple example of time series analysis based on historic oil and gas prices to show how, once we have developed our mining model, we can dive deeper to understand what factors are driving the forecasts and ultimately provide better insight to the business. This is the essence of what BI should be about, and it is often overlooked by technicians who are preoccupied with the complexity of the tools they are using rather than with how the results they generate can impact the organization.