Help, I received a pivoted data file that I want to combine with my data in PowerPivot

Often when working in Excel you get all kinds of flattened files that come from all kinds of different source and you want to use as them as a source for your analysis. Today we are looking at how DataExplorer can help you with this. Click here to download DataExplorer, for more details on Data Explorer check out my previous blogs.

Image we have the following Products with revenue for the following fiscal year and month. This data comes from your nicely structured datawarehouse:

image

Your managers, who asked for this report, wants to see these values combined with the target values. You get these values delivered in the following pivoted text file:

image

The problem with a format like this is that it is not easy to pivot on. In order for us to to do analytics on top of this we need to reshape the data into a shape where the months and years become a value, not a header. This is called Unpivotting.

So how do I do this ? And how do you get these numbers into one combined  Pivottable ? The answer here is Data Explorer. I select import from CSV in DataExplorer and point to the CSV file.

First I double click on Query and rename the query to revenueTarget to give the query a proper name, next I select that I want the First row as headers:

image

select all the columns that contain values and right mouse click, select Unpivot columns:

image

This will give us the result we want where year and month have now become values instead of column headers:

image

Now I rename columns to something sensible. In my PowerPivot model I need to combine this data with a date column so I would like to add that as well. I right mouse click on the date column, press Insert Column and select Custom

Here I can enter a value that will create a new date field:

=Text.End([YearMonth],2) & “/1/” &Text.Start([YearMonth],4)

And I Change the type to Date:

image

Ok the transformations are completed. Press Done and Add the results to the data model:

image

Now to add this to the data model I need d to create relationships, I do this by creating relationships to the DateTable and Product. I am essentially treating this new table a secondary fact table:

image

Now to combine values from the Invoice table and RevenueTarget into one KPI I go back to the datagrid view, select the RevenueTarget column and click autosum on the ribbon to create a measure for Sum of RevenueTarget. I also create a measure for that will serve as the base measure for our KPI. I add the following measure:

KPI:=[Sum of RevenueAmount]

Now I right mouse click and select KPI, I select the Sum of RevenueTarget as target value:

image

Now adding this KPI to the Pivottable we had in the beginning and voila:

image

0  

When Importing from Tabular Model use DAX

I recently was building a model in Excel that was using data from a Tabular Model. I started going down the import path by selecting import from Analysis Services in the PowerPivot window and use the MDX query designer to select the data that I want to import into PowerPivot / Excel Model.

I imported the data but it took over one hour to import the data. I was very unsatisfied with the time it took me to refresh the data. Then I got thinking, MDX is not really designed for these tabular result sets, DAX is. As this is a tabular model I can use DAX to get the results. Unfortunately I don’t have a automated DAX query editor but writing the DAX equivalent of this particular MDX query is quite simple. After rewriting the query to DAX is brought back the import time to …. 3 minutes. From one hour on the same dataset.

These are the steps needed to import the data, to start you get this empty window that asks you to specify the MDX query:

image

Now instead of writing the MDX you can click on Design to have the MDX query designer help you build it:

image

Now press OK or click on the designer Icon in the top right to get to see the MDX. The query builder will generate the following MDX:

SELECT NON EMPTY { [Measures].[Sum of SalesAmount] } ON COLUMNS,

NON EMPTY { ([DateTable].[FullDateAlternateKey].[FullDateAlternateKey].ALLMEMBERS *

[Product].[Color].[Color].ALLMEMBERS *

[Product].[ModelName].[ModelName].ALLMEMBERS ) }

DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS

FROM ( SELECT ( { [DateTable].[CalendarYear].&[2004] } )

ON COLUMNS FROM [Model])

WHERE ( [DateTable].[CalendarYear].&[2004] )

CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

Now rewriting this to DAX, interesting to note is that DAX works inside out, start by reading the comments/functions from the most inner function and work your way to the outside to see what the query returns:

EVALUATE
//Calculate the table to filter the results by the CalendarYear 2004
CALCULATETABLE(
//Add columns to table containing the measures
ADDCOLUMNS(
//Filter the results where we actualy have a measure
FILTER(
//Crossjoin all the columns required in the resultset
CROSSJOIN(
values(DateTable[FullDateAlternateKey]),
//when multiple columns from same table add them together with
//Summarize for performance
Summarize(values(Product),
Product[color],
Product[ModelName])
)
, Not IsBlank(Sales[Sum of SalesAmount])
)
, “[Total]“, Sales[Sum of SalesAmount]
)
, DateTable[calendaryear] = 2004
)

The above query is also a general approach on how you should write a query to retrieve a tabular result set using DAX to get the fastest result. It should work in most circumstances.

0  

Analyze a Twitter feed with Excel 2013, DataExplorer and GeoFlow

I recently got access to a csv file that contains twitter feeds filtered by Azure. I want to do some simple analytics on where the twitter users live in a visual way.

I am going to load the csv file into Excel using DataExplorer (download add-in for Excel 2012 here).

image

And I select Import from Csv, and select the file that I received. That opens the DataExplorer query editor and in this case I want to import the csv file completely without any changes. I do want to Use the first Row as Headers :

image

Now I import this data into Excel,

image

after importing the data into Excel I want to rename my query by double clicking Query1 and rename it to “Twitter Data” and press Load to data model to load it into the AS engine.

Now I want to plot the tweets on a map. I am going to use GeoFlow for that, as this is data that I got from the Internet there are some data quality issues. The current version of GeoFlow doesn’t  allow us to filter data. So I am going to use a Excel trick to create a filtered down table by using a Query Table where I filter out all tweets where no region is defined.

So we go back to Excel, select an empty sheet and click on Data, Existing Connections, Tables and double click the Twitter data table:

image

And press OK in the next dialog to add a table. This adds a table to the Excel sheet:

image

Now right mouse click on the table and select Table, Edit DAX:

image

Now I change what is shown on the sheet by editing the DAX expression for this table:

Evaluate
calculatetable(
Summarize(
‘Twitter data’,
‘Twitter data’[ID]
,’Twitter data’[Location]
)
, ISBLANK(‘Twitter data’[Location]) = false)

image

Press OK and this will only give me that ID’s and Locations where the Location is not blank:

image

Comment: the results of the query will have the limitation of 1,000,000 rows in Excel but you could always aggregate the results by location using DAX. I don’t think many people want to plot more than a 1,000,000 locations. 

Now go to Insert GeoFlow (Install add-in here):

In GeoFlow you now can pick columns from the filtered table:

image

Now select location as other and Map it:

image

Now select ID as height:

image

This plots all the tweets on Azure for the last week on the map, you might notice the Netherlands and Belgium are quite some hotspots on Azure Smile (no I did not rig the results!)

 

image

 

Hope that you some idea’s on how to use Excel, DataExplorer and GeoFlow.

0  

Updated theme

I updated the theme of my blog after 4 years. I like the clean looks of the style formerly known as metro. Let me know what you think or have any comments!

Kasper

3  

NBA team spending and their results with Excel, PowerPivot and Data Explorer

Ok truth be told I am a big time Lakers fan and this season is a big disappointment for me, one of the things that irks me is that the Lakers are being paid the most of all the teams in NBA but lack the results. In this blog post I am going to compare the winning pct per team per dollar paid so we can see the results.

First I want to import the data that I need. I am going to use Data Explorer to import the data into Excel and then I am going to use PowerPivot to analyze the results.

First data source that I am going to use is http://espn.go.com/nba/salaries/_/page/1

image

this I want to combine with http://espn.go.com/nba/standings/_/group/1

image

I start by importing the standings, go to Excel and the Data Explorer tab and select “from web”

image

Paste in the URL and select Table 0

image

Now I want to remove the first row so I can select Remove Top Rows and select the first row and I also select Use first Row as Headers:

image

Now I am only interested in the Team name and win – loss pct so I am removing all other rows by selecting Hide Columns:”

image

Now I want to clean up the Team name to remove unwanted characters:

image

I select Replace Values and remove y – and x- with an empty value.

Ok I am happy with the result and I press Done to import the data into Excel:

image

I want to rename the Query to NBA Standings and Load it to the Data model.

Next up is import a list of all the players and their Salaries. Again the same thing Import from web, this time as is:

image

I press OK to import and rename the Table:

image

Now we only see 40 players here as this webpage we import from actually has multiple pages. The cool thing about Data Explorer is that under the covers its is a special language called M. What we are going to do now is make a function out of the import function that we can call iteratively for each page.

First I need to make sure I can actually edit queries, I go to the Data Explorer tab and select “Enable Advanced Query Editing”:

image

Now I go back to the table I just imported and click on “Filter & Shape” and click on this little “script” icon that now appears:”

image

This pops open the script dialog:

image,

I change the script to the following:

(page) => let
Source = Web.Page(Web.Contents(“http://espn.go.com/nba/salaries/_/page/” & Number.ToText(page))),
Data0 = Source{0}[Data]
in
Data0

This will add a empty sheet to excel with just this function, nothing can be shown on the table but it needs to be there, I also renamed the table to Query.

image

Ok now I want to use that function for all 14 pages. I go to Data Explorer and select import from other sources, “Write Query”.

This opens a blank query window:

image

Here I paste in the following:

let
Source = Table.Combine(List.Transform({1..14}, Query))
in
Source

this will combine the results of a the query function executed from 1 to 14

image

Press OK and the all data (527 rows instead of 40) will be added to the sheet:

image

Pretty awesome right ? Now lets add this to the model as well by clicking load to data model:

image

Quick tip, don’t rename the tables. This doesn’t work in the current build of Data Explore and will screw up the connection to the model.

Now I want to create a relationship but we have problem, how do I relate:

image

with

image

We need to make the values in the column the same. I will plan to extract the city from team in dataset 2 to combine it with dataset 1 and remove LA from dataset1.

So I click on filter and reshape again at the salaries. I select the column and select by delimiter, select space and at the right-most delimiter.

That will give me:

image

Now I rename the Team.1 to City and Team.2 to Team. The last problem is that LA has two teams. You can see that in the first dataset they solved that by deviating from the norm by adding LA Lakers and LA Clippers. I need to do the same here.

First I rename Los Angeles to LA. Now I add a new column that based on my expression add the team name if the city is Lakers. I do right mouse click on the Salary column and select Insert column, Custom… This opens a new window where I can type a expression. What I added was:

if [City]=”LA” then [City] & ” ” & [TEAM] else [City]

This will give me the result that I want (I also renamed the columns):

image

Except that the steps did not anticipate a NBA team with two words “Trail Blazers” . So I added a special step for that to the formula”:

if [City]=”LA” then [City] & ” ” & [TEAM] else if [City]=”Portland Trail” then “Portland” else [City]

I press OK and reimport the data.

 

Now I am able to create a relationship in PowerPivot:

image

and start my Analysis by creating a Pivottable. I also formatted PCT and Salary in PowerPivot to get a better view.

image

There is a large amount of players in there that are not assigned to any team. I want to filter those out of the result set. So I go back to Data Explorer and deselect players with an empty city:

image

This gets rid of 100+ players that I don’t care about.

Now lets look at the data and add a measure that determine the PCT of Salary compared to all other teams.

I add the following measure to PowerPivot:

Salary all teams:=CALCULATE(sum(Query1[SALARY]),ALL(Query))

The first shocking result was when this measure returned the value of 1.8 billion ..:

image

Now I add a percentage of salary for each team against all salaries.

pct Salary:=sum(Query1[SALARY])/[Salary all teams]

image

To make it a little bit more clear instead of PCT I also create a rank:

Rank by PCT of winnings:=RANKX(all(Query[NBA]),[Sum of PCT])

this really gives you an idea that money doesn’t buy you everything:

image

The most interesting one is Indiana all the way down with almost half the salary costs as the Lakers at 7th best team in the NBA:

image

and the teams above and below the Lakers earn less together  then the Lakers themselves..

image

Next up I would like to add a measure that will calculate the salary per player. So add the following two measures:

nr of players:=COUNTROWS(Query1)

Salary per player:=sum(Query1[SALARY]) / [nr of players]

 

Unfortunately that didn’t give us much interesting information, besides what we already know. It is interesting to see that a few teams have almost 20 players on their team:

image

While Chicago and the Lakers are not even capable of doing 5 on 5 in training ..

image

Hope that gives you some idea of the power of Data Explorer and PowerPivot together.

4  

What is eating up my memory the PowerPivot / Excel edition

A common question asked is what columns in my model are using the most memory. Earlier I created a workbook that was able to get the memory for a Tabular server which you can download here but that didn’t work with an embedded Excel Data model or PowerPivot workbook.

But I recently found a new way that would allow me to get this information in PowerPivot / Excel 2013 using a macro. Many thanks to Ken Puls  who helped me clean up the macro and give me some great Excel tips. It always great to see SQL and Excel folks learn things from each other :) . Thanks to his help this macro, once created, will be accessible for all workbooks without having to change the individual workbooks.

This macro uses the Excel connection retrieved from the Excel OM and sends a SSAS DMV query to the embedded model to retrieve memory information. Using the connection to send custom query is not a scenario that is supported, meaning custom queries send through the connection will not be guaranteed to work in any future updates of the product.

When you run this macro in any workbook that contains a Excel Data Model will give you a new worksheet with a table containing all tables, columns, datatype and MemorySize (KB). It also has a Pivottable that aggregates the results and is sorted by size:

sheet

The SSAS DMV used to get this information does only return the segment size of a column, this is not all the memory that is used per column but will give enough information to get an good idea of the actual usage. For more details on the internals of the Tabular engine please watch this teched presentation

To get this Macro available for all you workbooks in Excel the following steps are needed.

  • If you haven’t already exposed the Developer tab in Excel, do that first. (right mouse click on the ribbon)
  • On the developer tab, record a new macro, and choose to store it in your “Personal Workbook”
  • Stop recording
  • Press Alt+F11 to get into the Visual basic editor
  • Find Module1 in the personal macro workbook
  • Replace ALL the code for the code you just recorded in there with the code in the attached file
  • Go back to Excel
  • Press Alt + F8 and run the Macro

The entire Macro is as following:

Option Explicit

Sub GetMemoryUsage()
    Dim wbTarget As Workbook
    Dim ws As Worksheet
    Dim rs As Object
    Dim lRows As Long
    Dim lRow As Long
    Dim sReportName As String
    Dim sQuery As String
    sReportName = "Memory_Usage"

    'Suppress alerts and screen updates
    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    'Bind to active workbook
    Set wbTarget = ActiveWorkbook

    'Check if a worksheet already exists
    Err.Clear
    On Error Resume Next
    Set ws = wbTarget.Worksheets(sReportName)
    If Err.Number = 0 Then
        'Worksheet found
        If MsgBox("A memory usage sheet workbook is already detected, " & _
            "do you want to remove the existing one and continue?", vbYesNo) = vbYes Then
                ws.Delete
        Else
            GoTo ExitPoint
        End If
    End If

    On Error GoTo ErrHandler

    'Make sure the model is loaded
    wbTarget.Model.Initialize

    'Send query to the model
    sQuery = "SELECT dimension_name, attribute_name, DataType,(dictionary_size/1024) AS dictionary_size " & _
        "FROM $system.DISCOVER_STORAGE_TABLE_COLUMNS " & _
        "WHERE dictionary_size > 0"
    Set rs = CreateObject("ADODB.Recordset")
    rs.Open sQuery, wbTarget.Model.DataModelConnection.ModelConnection.ADOConnection
    lRow = rs.RecordCount

    If lRow > 0 Then
        'Add report worksheet
        Set ws = wbTarget.Worksheets.Add
        With ws
            .Name = sReportName
            .Range("A1").FormulaR1C1 = "Table"
            .Range("B1").FormulaR1C1 = "Column"
            .Range("C1").FormulaR1C1 = "DataType"
            .Range("D1").FormulaR1C1 = "MemorySize (KB)"

            lRows = 2
            rs.MoveFirst

            Do While Not rs.EOF
                'Add the data to the rows
                .Range("A" & lRows).FormulaR1C1 = rs("dimension_name")
                .Range("B" & lRows).FormulaR1C1 = rs("attribute_name")
                .Range("C" & lRows).FormulaR1C1 = rs("DataType")
                .Range("D" & lRows).FormulaR1C1 = rs("dictionary_size")
                lRows = lRows + 1
                rs.movenext
            Loop

            'Format the Memory Size field
            .Columns("D:D").NumberFormat = "#,##0.00"

            'Create table
            .ListObjects.Add(xlSrcRange, .Range("$A$1:$D$" & lRow + 1), , xlYes).Name = "MemorySizeTable"
        End With

        'Create PivotTable
        wbTarget.PivotCaches.Create(SourceType:=xlDatabase, _
            SourceData:="MemorySizeTable", _
            Version:=xlPivotTableVersion15).CreatePivotTable _
            TableDestination:="Memory_Usage!R2C7", _
            TableName:="MemoryTable", _
            DefaultVersion:=xlPivotTableVersion15

        'Modify the PivotTable
        With ws
            With .PivotTables("MemoryTable")
                With .PivotFields("Table")
                    .Orientation = xlRowField
                    .Position = 1
                    .AutoSort xlDescending, "Sum of MemorySize (KB)"
                End With
                With .PivotFields("Column")
                    .Orientation = xlRowField
                    .Position = 2
                    .AutoSort xlDescending, "Sum of MemorySize (KB)"
                End With
                .AddDataField .PivotFields("MemorySize (KB)"), "Sum of MemorySize (KB)", xlSum
                .PivotFields("Table").AutoSort xlDescending, "Sum of MemorySize (KB)"
                .PivotFields("Column").AutoSort xlDescending, "Sum of MemorySize (KB)"
             End With

            'Format the Memory Size field in the PivotTable
            .Columns("H:H").NumberFormat = "#,##0.00"

            'Add conditional formatting
            With .Range("H3")
                .FormatConditions.AddDatabar
                .FormatConditions(.FormatConditions.Count).ShowValue = True
                .FormatConditions(.FormatConditions.Count).SetFirstPriority
                With .FormatConditions(1)
                    .MinPoint.Modify newtype:=xlConditionValueAutomaticMin
                    .MaxPoint.Modify newtype:=xlConditionValueAutomaticMax
                    With .BarColor
                        .Color = 13012579
                        .TintAndShade = 0
                    End With
                    .BarFillType = xlDataBarFillGradient
                    .Direction = xlContext
                    .NegativeBarFormat.ColorType = xlDataBarColor
                    .BarBorder.Type = xlDataBarBorderSolid
                    .NegativeBarFormat.BorderColorType = xlDataBarColor
                    With .BarBorder.Color
                        .Color = 13012579
                        .TintAndShade = 0
                    End With
                    .AxisPosition = xlDataBarAxisAutomatic
                    With .AxisColor
                        .Color = 0
                        .TintAndShade = 0
                    End With
                    With .NegativeBarFormat.Color
                        .Color = 255
                        .TintAndShade = 0
                    End With
                    With .NegativeBarFormat.BorderColor
                        .Color = 255
                        .TintAndShade = 0
                    End With
                    .ScopeType = xlSelectionScope
                    .ScopeType = xlFieldsScope
                End With
            End With

            With .Range("H4")
                .FormatConditions.AddDatabar
                .FormatConditions(.FormatConditions.Count).ShowValue = True
                .FormatConditions(.FormatConditions.Count).SetFirstPriority
                With .FormatConditions(1)
                    .MinPoint.Modify newtype:=xlConditionValueAutomaticMin
                    .MaxPoint.Modify newtype:=xlConditionValueAutomaticMax
                    With .BarColor
                        .Color = 15698432
                        .TintAndShade = 0
                    End With
                    .BarFillType = xlDataBarFillGradient
                    .Direction = xlContext
                    .NegativeBarFormat.ColorType = xlDataBarColor
                    .BarBorder.Type = xlDataBarBorderSolid
                    .NegativeBarFormat.BorderColorType = _
                        xlDataBarColor
                    With .BarBorder.Color
                        .Color = 15698432
                        .TintAndShade = 0
                    End With
                    .AxisPosition = xlDataBarAxisAutomatic
                    With .AxisColor
                        .Color = 0
                        .TintAndShade = 0
                    End With
                    With .NegativeBarFormat.Color
                        .Color = 255
                        .TintAndShade = 0
                    End With
                    With .NegativeBarFormat.BorderColor
                        .Color = 255
                        .TintAndShade = 0
                    End With
                    .ScopeType = xlSelectionScope
                    .ScopeType = xlFieldsScope
                End With
            End With

            'Collapse the PivotTable
            .PivotTables("MemoryTable").PivotFields("Table").ShowDetail = False

            'Set selection to top
            .Range("MemorySizeTable[[#Headers],[Table]]").Select
        End With
    Else
        MsgBox "No model available", vbOKOnly
    End If
    rs.Close

ExitPoint:
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
    Set rs = Nothing
    Exit Sub

ErrHandler:
    MsgBox "An error occured - " & Err.Description, vbOKOnly
    Resume ExitPoint
End Sub

Download a sample workbook here. The Macro is applied to this workbook only for you to be able to download it, to get this available for all workbooks go through the steps above.

7  

Using Traditional SSRS parameterized reports against a Analysis Services Tabular model using DAX

I get this question every now and then: Is it possible to create a traditional SSRS report with parameters against a tabular model using DAX. Marco already created a great blog post that sums it all up here, but I decided to create a step by step example.

Read more »

6  

Filtering results to only show top x results in Power View chart

 

A question came up recently on how can I limit the number of elements on a chart in Power View by a custom calculation. Imagine you have a chart in Power View with too many data points to make anything out and you want to filter out the “noise”. Just give me the most profitable datapoints.

I created this chart with just a few datapoints. I love how Power View and Bing maps allow me to put any location in and it just knows where to plot it !.

image

Then I create a chart:

image

Now I create a DAX measure that determines the Rank of each postalcode by sum of data:

=RANKX(ALL(CityData[PostalCode]),[Sum of Data])

This measure will compare the result of [Sum of Data] for the current data point against the result of [Sum of Data] for all rows in ALL(CityData[PostalCode]) and return the relative rank of those two numbers.

In a table this shows us as following:

image

Now I put that measure in the map filter area and start filtering on it. This gives us only the top 2 ranked cities:

image

The rank is automatically measure calculated for each data point in the chart (like what you see in the table in the previous image).

Download the sample workbook here.

 

5