Archive

Posts Tagged ‘DAX’

Use PowerPivot DAX to automatically report on the last month that has data

July 19th, 2010 Kasper de Jonge No comments

A lot of times you just to show values of rows from the current month compared to other specific time periods, like previous month, ytd, previous YTD, etc. The problem is, how do you determine the current month ? This could be the actual current month (like PerformancePoint 2010 does) but in the most cases this is not sufficient because your last month you have data might be one or more months ago because of your ETL.

In this blog post we create a measure that we can use to filter the data using a filter or slicer so the data in the report will show the the last month where have data.

I start by creating a calculated column in my date table that checks if the month is the current month and returns a Y or N.

To determine if the month is the last month we have data i used the following DAX statement:

=IF(MONTH(LASTDATE(ALL(FactInventory[DateKey]))) = DimDate[MonthNumber]
     && year(LASTDATE(ALL(FactInventory[DateKey])))  = DimDate[CalendarYear],"Y","N")

The key here is how we determine the last month to have data: LASTDATE(ALL(FactInventory[DateKey]).

Lastdate will check the last date in the column, but because the date table and the fact table have a relationship DAX would automatically use the this to return only rows from the fact table in the current date context. This way we never can check all the fact table rows. We want to overrule the filter context by using ALL() this will give us all the rows from the fact table. LASTDATE(ALL(FactInventory[DateKey]) will give us the one date that is the last date we have a fact for.

Now we can check if the dates from our date table are in the same month and year as the last value of the fact table and return a “Y” when they are and a “N” when they aren’t this will give us the possibility to select all the values from the last month as we can see in this report:

Now we are able to create the following report:

The great thing about is that when because we use a calculated function this is only calculated at creation or during data refresh. This means it is fast at execution and it means that when new data arrives from a new month the measure automatically determines the new CurrentMonth.

Categories: PowerPivot Tags: ,

How to count the average number of facts per day in PowerPivot DAX

July 16th, 2010 Kasper de Jonge No comments

A pretty simple question in PowerPivot would be: give me the average number of facts for each day data

Sounds easy right? All we have to do is determine the number of facts for each day that has data and do an average of that. The problem is .. how do you do that ?

So we have two tables, FactInventory and DimDate. We need to get the number of rows the fact table for each day we have data. I start by creating a calculated column in the date table to determine the number of rows per day.

=COUNTROWS(RELATEDTABLE(FactInventory))

So this gives us the number of facts for each day. We now can do an average on this number using the average in the  PowerPivot Field list:

Sounds easy.. The trick here is that we use the calculated column to determine the number of facts for each day and use an average on that.

Categories: PowerPivot Tags: ,

Using the excel function SEARCH in PowerPivot DAX

July 7th, 2010 Kasper de Jonge 1 comment

Last week I got a question about he search function in PowerPivot:

I have used the text function SEARCH() in my sample. While doing this I noticed that wildcards seems to be allowed. For example if I do something like this: SEARCH(“google.*/search”,[Referer],1), then google.de/search or google.com.au/search are found. The problem now is that I haved searche the documentation of the DAX functions (TechNet, …) for getting sure that wildcards are allowed. Unfortunately I did not find clear official statements about this. Do you perhaps know if wildcards are really supported or perhaps know some links where I can get more info.

Of course i started searching the internet for a DAX help file at for example the PowerPivot wiki. I unfortunately couldn’t find the answers asked on my questions page. But since most PowerPivot functions are not only similar to the Excel functions but in most cases is the code from these functions copied and pasted into PowerPivot. So we now can broaden our search to include excel sites as well (this won’t count for all the functions).

Here i found the following about wildcards and the SEARCH function:

Wildcards are generic characters used in the following ways:

? (question mark) is used as a placeholder for a single character. For example, to search for “bl?ck” would find “black” and “block.”

* (asterisk) serves as a placeholder for a series of characters within a word. A search for “*body” would return “anybody,” “somebody,” and “nobody.”

To test this i used the search function in my contoso dataset in a calculated column:

=IFERROR(SEARCH(“Product03?1*”,DimProduct[StyleName]),-1)

This function returns a 1 (first position) if the stylename contains Product03 + any character +1 + any other char

if nothing is found it will give an error, we don’t want that, so the catch this using the iferror and return a -1 if nothing is found. As you can see in this example:

So when you cannot find the information you need of a DAX function, broaden your search to include excel function as well, the bets are good that a function (especially text functions) is based on excel.

Categories: PowerPivot Tags: ,

Tune your PowerPivot DAX query, don’t use the entire table in a FILTER and replace SUMX if possible

July 6th, 2010 Kasper de Jonge No comments

I was working on a PoC that needed a measure where I had to use some more advanced DAX functions. In this case I had two fact tables and the customer wanted to have the sales of a specific store, but only if they had a booking in the same period.

To cut straight to the DAX, this formula does the trick:

=SUMX(	FILTER(salesperstore,
	COUNTROWS(FILTER(Bookings,
				MONTH(Bookings[FlightDate]) = MONTH(salesperstore[DateKey]) &&
				YEAR(Bookings[FlightDate]) = YEAR(salesperstore[DateKey])))
	> 0)
	, salesperstore[SalesAmount]
	)

This was a very slow statement and ran for 23 seconds, so i decided to try and tune it.

What happens in this statement: do a SUM of salesperstore[SalesAmount] of a filtered Salesperstore table, only the values of Salesperstore table are returned where the number of rows of a filtered table of Bookings on the same month and year of the current salesperstore context are greater than zero. The SUMX will create a copy of the filtered table and loop through it and evaluate every row and only sum the values that are true.

I started by replacing the SUMX with a calculate. The calulcate doesn’t loop through every row but rather filters the dataset. This made a quicker function:

=Calculate(sum(salesperstore[SalesAmount]),
			FILTER(salesperstore,
			COUNTROWS(FILTER(Bookings,
						MONTH(Bookings[FlightDate]) = MONTH(salesperstore[DateKey]) &&
						YEAR(Bookings[FlightDate]) = YEAR(salesperstore[DateKey])))
			> 0)
		)

But this still was very slow, and i knew why, the FILTER functions use the table supplied to the function and loops through the entire table to return the requested result. I complained on twitter on how slow it was when Vidas send me a message that you should avoid using a filter over an entire table. He got a tip from Marius Dumitru from the PowerPivot team how to solve this, check it out here.

The explanation of the tip is that FILTER is very slow when done over a entire table. So this means in this case that it loops through the Bookings table and salesperstore table while creating copies of the entire table in memory while we only need one column. And that is the key to tuning this query. We don’t want to drag with us the entire table to determine the filter, we only want to have the specific column. This can be done by the VALUES function, as BOL defines it:

Returns a one-column table that contains the distinct values from the specified column. In other words, duplicate values are removed and only unique values are returned. This function cannot be used to return values into a cell or column on a worksheet; rather, you use it as an intermediate function, nested in a formula, to get a list of distinct values that can be counted, or used to filter or sum other values.

And that is exactly what we are going to do, we replace the parameter of filter to be a column of a table instead of an entire table:

=Calculate(sum(salesperstore[SalesAmount]),
		FILTER(VALUES(salesperstore[DateKey]),
			COUNTROWS(FILTER(VALUES(Bookings[FlightDate]),
									MONTH(Bookings[FlightDate]) = MONTH(salesperstore[DateKey]) &&
									YEAR(Bookings[FlightDate]) = YEAR(salesperstore[DateKey])))
			> 0)
		)

This brought the execution time of the query with 100% processor usage (on my dual core laptop) from 23 seconds to 2 seconds. So as you can see it is well worth to take a look at your query’s. Most of the time a lot of performance profit can be gained when you use other formula’s, especially when dealing with SUMX and FILTER. I can understand that end users will not so easily be able to tune it like this, so we need to educate and help them with this.

Categories: PowerPivot Tags: ,

DAX vs MDX vs TSQL

Tomislav Piasevoli recently did a presentation with about DAX vs MDX vs TSQL. The presentation is available online and is well worth a look:

Recently I delivered a session at local conference organized by Microsoft Croatia - WinDays¹º. The title of the presentation was: “DAX vs MDX vs T-SQL”.

As its name suggests, the presentation is oriented towards SQL and BI developers (for change) instead of Excel power-users and compares DAX calculations with T-SQL and MDX expressions. The idea behind, besides improving my T-SQL skills and learning DAX, was to build a mental model of what DAX functions and their combinations actually do by finding their best analogies in other languages. Since there are more than a hundred DAX functions, I’ve concentrated primarily on those that use relations. On one hand, they are probably the most difficult group of functions in DAX. On another, they should be the closest to SQL and BI developers’ mindset.

If you’re wondering why T-SQL and MDX, the reason is simple. PowerPivot uses in-memory column-oriented relational database technology and was developed by SSAS team (together with Excel team). “Relational” means we have every reason to compare DAX with T-SQL. And since SSAS team was involved, MDX comes naturally. Hereby I wish to emphasize that the comparison shown in the presentation was based on imaginary or potential analogies in T-SQL and MDX and not the real implementations built into PowerPivot engine. In the same manner, the summary highlights showing strengths and weaknesses of each language is based on personal impressions. Regardless of that, I believe the presentation might be of great help for many.

Check it out at his blog (also other very interesting reading material there)

http://tomislavpiasevoli.spaces.live.com/blog/cns!5BB64CF526505D83!932.entry

Categories: PowerPivot Tags: ,

Getting difference of two dates in seconds with PowerPivot DAX

December 5th, 2009 Kasper de Jonge No comments

I wanted to get some information on a importing process where i load different files into a database. I don’t have access to the database i only get CSV export files from the admins.

I have two date fields where i want to see the difference between them in seconds (should only take seconds). In SQL Server i would have used DateDiff function and be done with it. No such function is available in DAX so we have to be creative ourselves. I loaded the CSV into PowerPivot and decided to add new measurs in PowerPivot window.

The first solution i came up with was to try and subtract one date from the other, this seemed to work. So now i wanted to get only the  seconds of this subtraction:

=SECOND(<datefield1> – <datefield2>)

This was the result i hoped for.

But being a developer I wasn’t satisfied with this result, when the process would run for more than 60 seconds the minute would be ignored, so you could make it into:

=SECOND(<datefield1> – <datefield2>) + (MINUTE(<datecolum1> – <datecolum2>)*60)

But again what happens when the process runs more than 60 minutes.. after a great discussion on twitter with Vidas on how to solve this problem, he came up with (thanks to Marius Dumitru (MSFT))

86400. * ( [DateField1] – [DateField2] )

First number because: (24*60*60 = 86400 seconds per day)

This is a pretty rock solid formula that would give you the difference in seconds between two dates regardless the interval.

One conclusion Vidas and I got to was when a serious challenge arises PowerPivot is not really suited for a fast solution. You would be better off fixing it in your source or be very creative :) like the solution above. Maybe the default Excel user is better suitable for these challenge because of their experience with Excel.Or  maybe PowerPivot needs a different mindset than the defensive programming we developers are used to do, trying to be prepared for all the possible exceptions, and rely on assumptions (like the time difference is always seconds) and be done with it.

Ofcourse in the future we will have new functions, this will make our work easier.

Categories: PowerPivot Tags: ,

PowerPivot DAX, i love it!

November 6th, 2009 Kasper de Jonge No comments

Vidas Matelis has been twittering live from the DAX session at the SQLPass. I just wanted to share these great new function available in DAX:

  • Data Analysis Expressions = DAX
  • DAX lets user do multidimensional analysis without user knowing that this is multidimensional analysis
  • sample DAX: =[Qty]*[Price] – syntax just like Excel.
  • DAX is not replacement for MDX
  • DAX provides functions that implement relations database concepts: Filter tables, aggregates, follow relationships
  • one of the many sample: =SUMX(RELATEDTABLE([Sales], Sales[Amount]) DAX
  • dax has functions to assist with dynamic aggregations of measures: dynamic sample: =VALUES(Time[year]) & “.” & VALUES(Product[ProductID])
  • DAX sample: =IF(VALUES(Time[Year])=2008,”Baseline”,”normal”)
  • More than 80 Excel functions in DAX
  • CTP3 has new FORMAT function to allow to convert any number to string
  • DAX sample: [salesAmt]/[SalesAmt](All(Product)) – use measure as function.
  • DAX = Sales[SalesAmt]/CALCULATE(Sales[SalesAmt], ALL(Product)) more complex syntax
  • CTP3 DAX has 35 Time Intelligence Functions – require date column in the data. Column type Date
  • DAX this version does not have custom time periods and weeks. Works with Yr, Qtr,Mth,Day
  • DAX: FirstDate, LastDate, FirstNonBlank, StartOfMonth, StartOfQtr, EndOfYear
  • DAX: DateAdd, DatesBetween, DatesInperiod – 26 function that return a table of dates
  • DAX: PreviousDay, Nextmonth, DatesMTD, DatesYTD, TotalMTD, OpeningBalanceQuarter, ClosingBalanceYear, etc <-more funct
  • Year over year growth: =Sales[SalesAmtt]-Sales[SalesAmth)(DateAdd(Time[Date],-1,Year].All(Time))
  • AllTime – need to add now in CTP3, not in RTM(will be under cover). So that first year has data.
  • QTD Sales =TotalQTD(Sales[SalesAmt],Time[Date], All(Time))
  • DAX yearAgo:=Sales[SalesAmt)(ParallelPeriod(Time[Date],-12, Month).All(Time))
  • autocomplete in CTP3 add single quotes, but they are optional.

Wow i can’t wait to start playing with these new  functions ! Thanks Vitas for sharing the info !

I also really really hope these time functions will be available in SSAS in the future, they won’t be availble in R2 Donald Farmer told me.

update: What will happen to the time functions when using a SSAS source? Is a Date dimension ok ? I hope so !

Categories: Gemini, PowerPivot Tags: , ,