Combine two files (or fact tables) in PowerPivot using relationships, relationships explained

I had a conversation last week with a Excel power user who wanted to combine data he got delivered from his sales department in a csv file and data in one xlsx file with the target of the sales.

He had difficulty understanding how he can combine these two in PowerPivot. Although he had seen some demo’s and read some blog posts about fact tables, dimension tables and relationships and he still was unable to get these 2 sources combined.  In this blog post we are going to combine two fact table by creating a relationship and take a look at how relationships work.

The purpose is to create a report where we can see the number of sales by month by country next to the target values.

We start by importing the files into PowerPivot, first we insert the txt file:

Then the target values by year, month and country from the xlsx file by using create linked table

This results into two tables inside PowerPivot

So we now are able to create a report using our sales we imported.

But to get values of the two different tables in one pivottable we have to relate the two to each other, PowerPivot warns us that we should create a relationship, but is unable to detect one for us.

We have to create one ourselves, this is where we have to stop thinking like a BI pro, when I started to think about a solution for this I immediately started to think about creating new dimensions, importing new tables to use as intermediate table between the two fact tables, like a kimbal star diagram:

We have to remember we are building an ad hoc report with data that doesn’t come from an ETL, we have gathered some data somewhere and all we need is a quick answer, not to build a Bi application that we have to use for all kinds of analytics in  the next few year.

We have to take a few steps back and remember what the purpose of this report was, i want to get the sales  to be combined by the target. Combining two tables means creating a relationship. To be able to do this we have to take a look at what relationships between tables are. In powerpivot relationships are called “lookup relationships”:

This is a great name to describe what happens with relationships. Lets say we have one table containing values per country (fact table). The country column from the fact table is filled with countrycode. We want to show the country name instead of the code in our report. To be able to do this we create a secondary table called country. We want PowerPivot to lookup  values from the country table every time we show values from the fact table by countrycode. To do this we create a relationship:

In this case we would select our facttable as “table” in the relationshipscreen and we select the country table to lookup the country value in. This would make it possible to show the countryname instead of the countrycode at our pivottable.

Ok back to our example, we have two tables where we would like to lookup the target value from the “target” table next to each value from the “fact” table. To be able to lookup a value from a table you have to make sure PowerPivot can only find one unique value at the lookup table for each row in the fact table. When we would have two countrynames for FR PowerPivot wouldn’t know what to put in the pivottable as the value belonging to FR. PowerPivot has a check to make sure you do this correctly, you will get a “The relationship cannot be created because each column contains duplicate values. Select at least one column that contains only unique values.” error when you try to create this relationship. We need to find a unique value in the lookup to table to relate to our fact table.

To identify this value, also know as the key, in our example we look at the columns from both tables , we quickly can identify the unique value:

As we can see both tables have calendaryear, Monthnumberofyear and salesterritorycountry. So when we want to lookup the target we can lookup the target belonging to the combination of these 3 values. We cannot make a relationship on the Calendaryear column only since this wouldn’t be unique. We have to create a unique column from the combination of the three values, we can do this by using the & operator, resulting in the following column:

YearMonthCountry=Fact[CalendarYear] & Fact[MonthNumberOfYear] & Fact[SalesTerritoryCountry]

We need to do this at both tables and then we can create a relationship between the two tables:

Now we are able to combine the two values in one report.

Powerpivot will automatically sum all the values from the two tables by looking up all the values in the lookup table belonging to the data from the fact table that is selected in this pivottable.

The last thing to keep in mind is when we would select a productname from our fact table in our pivottable we are unable to lookup a value from the target table since the product isn´t in the “target” table.

  • james

    Hello,

    Could you please tell me know what’s your difference measure calculation?

  • Kasper de Jonge

    Hi @james ,

    The difference measure is just the Fact[Sum of SalesAmount] – Fact[Sum of Target]

  • http://www.pedrocgd.blogspot.com PedroCgd

    HI Jonge,
    You have not one but 2 start-schema, or a galaxy, correct?
    If true, I think that PowerPivot should automatically detect and link these to fact tables that has conformed dimensions in common.
    I only believe in Power Pivot if the source is a DW or OLAP cube.
    very interesting post and blog.
    Regards,
    Pedro

  • Kasper de Jonge

    Hi @PedroCgd , thats right i have 2 fact tables so two star schema’s.

    I agree powerpivot could detect this but i understand why this is pretty hard to do, you have to be really aware what the lowest degree of data is and then what also could be connected to another table. The main problem is that PowerPivot currently doesn’t support multiple keys. When they add this it would be more easy for them to implement the detection you suggest.

    Kasper

  • http://piglings.blogspot.com Vikram Sridharan

    Clear and crisp, wish I had landed up here before figuring out everything on my own :( However just to play devil’s advocate, why go through all the trouble. Wouldn’t it make more sense to prepare a key, do a vlookup and pivot the output. Whats the huge benefit if you need to build the schemas / keys yourself.

    Honestly, am not convinced of letting my users start building stuff on their own. When the level of the data in each on the sources not being at the same level, I can see so much going all wrong.

    Cheers,
    Vikram.

  • http://www.iteris.com.br Joao Tornovsky

    Hello Kasper, great post, thanks!

    At the very end, you’ve mentioned that there would be no target for products, which is absolutely right, but it souldn’t mess up the analysis as it is messing mine. I am having a hard time to figure out a solution for a similar situation. From what I’ve seen, if you nest the product attribute under the country, you would get very undesirable results.

    My expectations, taking from what a reasonably well designed SSAS cube would give us, was that this would lead to something like this:

    – | |
    | |

    Instead, I am getting this:

    – | |
    | |

    The collection would include every existing product, even those which are totaly unrelated to the country (that is, no sales ever in that place).

    Did I manage to make my self clear? Sorry for the lack of resources, I would add a printscreen if I could… :-)

    Anyway, do you know how to overcome this behavior?

    Cheers, Joao

  • http://www.iteris.com.br Joao Tornovsky

    @Joao Tornovsky

    Ops, sorry, I’ve made a bad character choice and my example was ruined by the html editor… This is what I meant.

    “(…) My expectations, taking from what a reasonably well designed SSAS cube would give us, was that this would lead to something like this:

    – (country) | (sales amount for coutry) | (target for country)
    (subset of products) | (sales amount for product in country) | (null target)

    Instead, I am getting this:

    – (country) | (sales amount for coutry) | (target for country)
    (all products) | (sales amount for product in country) | (target for country)

    (…)”

    Cheers, Joao

  • Igor

    Hello Kasper,

    let me example:
    * I have two entities T1, T2
    * I have two different logical one-to-many relationships between entities
    how to manage the situation in PowerPivot relationships?
    Looks like PP has got an assumption that two entities are allowed having zero or maximum one relation between.

    I could find a workaround in creating as many surrogate entities as many relations I need, but I cannot create those surrogate entities in automatic manner based on PP datasource, only manually or with the help of programming.

    Regards,
    – Igor

  • Kasper de Jonge

    Hi @Igor ,

    The only way to do this is manually, this feature is not available in PowerPivot v1.

  • Fred

    We have several Progress Databases running on AIX. The structure of the databases are identical and they represent different divisions of our company. Currently I have an Excel pivot table that points to an Access table that is updated with a union query that combines the data from four databases into one table. Is there a way to create the union query in PowerPivot so I don’t have to use Access? I have seen some examples of union queries for pivot tables and PowerPivot, but they have all been from other Excel files and not from external databases. Thanks for any advice!!!

  • Joanne

    Hello,

    I was wondering if someone can help me understand something with powerpivot. I have two tables that are related with a key field. Table 1 & Table 2.

    Say in Table 1 I have the Key field and fields: Division, Segment, insurance Line, State, # Sold, # not sold.
    Table 2 has Key field and fields: Score, Emod

    I create a powerpivot table
    add Division, Segment, State as filters
    add insurance line as Row label
    add # Sold, not sold, Score, Emod as values sum

    example:

    Divison 1
    Major
    PA
    # Sold # Not Sold Score Emod
    WC 5 2 539.175 570.4179
    Total

  • Joanne

    Hello,

    I was wondering if someone can help me understand something with powerpivot. I have two tables that are related with a key field. Table 1 & Table 2.

    Say in Table 1 I have the Key field and fields: Division, Segment, insurance Line, State, # Sold, # not sold.
    Table 2 has Key field and fields: Score, Emod

    I create a powerpivot table
    add Division, Segment, State as filters
    add insurance line as Row label
    add # Sold, not sold, Score, Emod as values sum

    example:

    Divison 1
    Major
    PA
    # Sold # Not Sold Score Emod
    WC 5 2 539.175 570.4179
    Total

    Sorry I didnt finish.

    Anyway, When I filter on Division, Segment, State all the data changes for the fields that are in Table 1 (# sold, # not Sold) but the fields from Table 2 (Score & EMOD) does not change. I would have thought because they are now related by the key field that when I filter on Division & State it would filter the data in Table 2 even though Table 2 does not have Division & State in the data. Am I thinking abou this wrong?

  • Paolo

    Hi Kasper,

    I have a simlar situation but I cannot apply your solution because the two fact tables are on a many to many relationship.

    Do you know how to sort this problem out?

    Thank you

    Paolo

  • Paolo

    ….also, I have created a cross table but it doesn’t seem to work correctly…

  • Kasper de Jonge

    Hi @Paolo

    Many to Many is a different issue. This is currently not supported directly in PowerPivot. There are many session at conferences you can find online on how to solve this using DAX.

    Kasper

  • http://Website Charles

    Hi Kasper, I have two fact tables (sales and POReceipts) that share all the same dimensions (store, item, supplier, date). In this situation I find that as long as I make my context references to the Dim tables (and not fields in one fact table) there seems to be no reason to aggregate the data into one table. Fields from both tables seem to coexist in the presentation well and calculated fields can accurately pull data from both tables at once. So, is there some benefit in combining this data into one table?

    • Kasper de Jonge

      Hi Charles, combining tables might make your life easier but if you dont have a real need I would say build a star schema.

  • http://Website Wen

    Hi Kasper,
    I have 2 facts table (sales and sales target).
    Under sales, I have 11 columns with field [Category], [Channel],[Region], [Brand],[SubBrand],[Variants],[Packsize],[Year],[Month],[Sales Qty] and[Sales Amt].
    Under Sales target, there are 6 columns with [Region], [Sub-Brand], [Channel],[Year],[Month] and [Sales Target Amt].
    I would like to create the relationship by putting [Sales Amt] and [Sales Target] side by side for the pivot table as well as to calculate the achievement %.
    I have created 2 unique field “=sales[region]&sales[sub-brand]&sales[channel]&sales[year]&sales[month] for sales table whereas “=sales target[region]&sales target[sub-brand]&sales target[channel]&sales target[year]&sales target[month] for target table.
    Then I created the relationship with “table”:sales and ‘related lookup table”:sales target with unique fields as the columns.
    However, it seems like doesn’t work at all.
    Please advice. Thanks.

  • http://Website kfschefer

    I am new to Power Pivot – and was looking for solution to using Edit Table Properties to join to separate linked data sources into 1 pivot table data source(view)

    here is my SQL via the Query Editor

    SELECT [PLATFORM], LEFT([SEVERITY],1) AS SEVERITY_1, [DETECTED ON], [CLOSED ON], [DEFECT STATUS], [PRODUCT RELEASE_LTE] AS PRODUCTNAME, [ASSIGNED TO], [DETECTED BY]
    FROM [LTE$]

    UNION ALL
    SELECT [PLATFORM], RIGHT([SEVERITY],1) AS SEVERITY_1, [DETECTED ON], [CLOSED ON], [DEFECT STATUS], [PRODUCT RELEASE_LTE] AS PRODUCTNAME, [ASSIGNED TO], [DETECTED BY]
    FROM
    ‘C:\Users\Karen\Documents\Ericsson\TPR_DATA.xlsx’.[TPR$]

    I am getting error message:

    “The Sql statement is not valid. There are no columns detected in the statement”

    I have tried using the existing Connection to connect to the LTE data, then using the Query Editor to combine LTE Data to the TPR data. I even created a sample and verified column count and column names match – still no luck.

    I was hoping you could help me get this working or point me in the right direction. any help is greatly appreciated.

    Thanks,

    kfschaefer

    • Kasper de Jonge
      • kfschaefer

        How do I duplicate the first set of requirements (columns hidden, functions etc) and apply the same design to the other data set.

        ie. LTE – has been set and TPR needs to replicate the same as LTE.

        Thanks,

        Karen

      • http://Website kfschaefer

        Thanks,

        One more question regarding Calculated fields within Power Query-

        I need to calculate #of Days between two dates and be able to handle if close data is null.

        Field Names: Detected ON, Closed On.

        IF closed on is null then I need to take the current date (NOw)- Detected ON.

        What is the proper syntax?

        I did try days.duration but unsure on how to handle the null value.

        • Kasper de Jonge

          something like this should work:
          if [Column1.ClosedOn]<> null then [Column1.ClosedOn]-[Column1.DetectedOn] else Date.From(DateTime.LocalNow())-[Column1.DetectedOn])

    • Anthony Newell

      Would that be Karen from Preston Brooke?!

  • Pingback: PowerPivot Workout #5 – Basic Calculation Formulas Using Relationships | Power Pivot Workout()

  • Anthony Newell

    Hi Kasper,
    I’m currently trying to get going rebuilding a retail reporting process for a client that is currently very convoluted and where PowerPivot could rationalise and simplify enormously
    Unfortunately I cannot interface the data systems directly so will have to live with bringing multiple csv data files into my model first using PowerQuery to strip out unnecessary data and then load it to the data model
    This will give me multiple sales tables and multiple costs tables. Multiple being one table for actual sales, one table for forecast sales etc etc. which leads me to my question
    The one common denominator is StoreID which appears in every single data file
    They also use a hierarchy table which consists of unique StoreID and hierarchy info like geographic region, channel etc. this provides a way of slicing / grouping
    So essentially I have multiple fact tables and one dimension table (hierarchy)
    So the ‘one’ part of my relationship exists in the hierarchy table and the many part exists in the various sales and costs tables
    My plan was to create multiple relationships from my hierarchy table to the various sales and costs fact tables for the purpose of getting whatever data I need based off a single StoreID
    However, I’m conscious that this runs completely contrary to what is advised and am unsure if it will work when I try and create the relationships in PowerPivot NB The advice is to have one fact table and multiple dimension tables but I don’t have this arrangement
    Can you help me emerge through my brain fog?!
    Cheers
    Anthony

    • Anthony Newell

      I think I did it, I was getting too hung up with doing things the proper BI way rather than reality!
      All my various fact tables are now joined to my hierarchy dimension table. If I create a pivot table and use a page filter from the hierarchy table like area or division, the filter context travels back across the relationships to the various fact tables which is exactly what I want. I wasn’t sure it would work as the filter context is flowing from the one side to the many side of the relationship
      The impact of this for rationalising an existing process is going to be supersonic
      Throw away all of those unnecessary and long winded spreadsheets with SumIfs sprinkled around like confetti!!