Friday Challenge – Data Transformation

Data Transformation Formulas in Excel

This weekend starts the ModelOff competition.

“ModelOff is the world’s largest professional competition for Microsoft Excel, Financial Analysis, Investment Analysis and Financial Modelling.” -Modeloff Site

If you haven’t been to the site, you should check it out: ModelOff

So this Friday challenge is based on a 2014 question on Data Transformation.  I have experience using Excel for a large data migration from a client database to Salesforce.com, so I know that people all around the world use Excel for these types of activities.  This is a fun challenge that will definitely make you think.

 





The Data

You can download my sample file here:  Data-Transformation-Challenge-Sample-Records.xlsx

Here is what the data looks like:

Sample Data
 __11 AM   27-Mar-2016___4.970 kbps
  __1 AM 29-Mar-2016_4.429  kbps
   8   AM 10-Jun-2016_5.284kbps
__11   AM Wednesday 04-May-2016___5.540kbps
___1  PM   Monday 18-Jul-2016_6.454kbps
   _8  PM  7th-Apr-2016_2.785  kbps
 _6PM 8-Oct-2016_5.463 kbps
  __8 AM   Tuesday 16-Feb-2016___2.244  kbps
___2 AM Wed 28-Sep-2016_7.637   kbps
  _12  AM  07-Aug-2016_9.54 kbps
_1 PM 10-Nov-2016__4.795   kbps
  _4   AM   Wednesday 02-Mar-2016__8.180  kbps
_10  PM   Mon 25-Jul-2016___0.440   kbps
 ___3AM Thu 7th-Apr-2016___4.931  kbps
 __1 AM   Thu 28-Jul-2016___5.740   kbps
__2   PM  Wednesday 06-Jan-2016___8.960  kbps
   1 AM Sunday 22-May-2016__8.282  kbps
   10  AM   30-Mar-2016___6.649 kbps
  ___10  AM  Friday 23rd-Dec-2016__9.323  kbps
__8 AM  Tue 27-Dec-2016__2.764 kbps
__7AM   7-Mar-2016___1.136  kbps
 _8  AM 23-Jul-2016_8.157  kbps
  ___1AM   Thursday 13th-Oct-2016___6.430kbps
7AM Sun 09-Oct-2016_1.192kbps
   ___6AM   18-May-2016__3.582 kbps

It doesn’t seem too tough until you dig into the data.  There are varying counts of spaces and underscores, some of the dates are text and some are numbers (like 23rd-Feb-2016 and 23-Feb-2016).  So it will take some doing.

 



Challenge Questions

See if you can answer these 2 questions:

Question 1: What is the average KBPS for 2016?

Question 2: What is the average KBPS May?

 





Now this can be done with many helper columns, but see if you can get the answer with just 2 formulas:

a) One for the date of every row

b) One for the KPBS for every row

It took me several attempts, but it can be done.

 



Test Grading is Set as Follows:

Grade C = You get the correct answers with helper columns and not just 2 formulas

Grade B = You get the correct answers with NO helper columns and just 2 formulas (but you use more than one Excel IF function)

Grade A = You get the correct answers with NO helper columns and just 2 formulas (One for Date and One for KPBS) and only use 1 or less IF statements.

 





See if you can find the answers and test your Excel Data Transformation abilities!

 
 

Steve=True





9 COMMENTS

    • Hi Maruo, thanks for your formula. I will have to translate it to see if it works, but good job. I will post the answer on Monday.

  1. in italian
    =+MEDIA(+SE(VAL.ERRORE(+RICERCA(“2016”;+SOSTITUISCI(+STRINGA.ESTRAI(A2:A10001;1;+RICERCA(“_”;A2:A10001;8));”_”;””)));””;SOSTITUISCI(SOSTITUISCI(SOSTITUISCI(+SOSTITUISCI(+STRINGA.ESTRAI(A2:A10001;+RICERCA(“_”;A2:A10001;8);100);”_”;””);”kbps”;””);” “;””);”.”;”,”)-0))

    =+MEDIA(+SE(VAL.ERRORE((+RICERCA(“May”;+SOSTITUISCI(+STRINGA.ESTRAI(A2:A10001;1;+RICERCA(“_”;A2:A10001;8));”_”;””)))*(+RICERCA(“2016”;SOSTITUISCI(+STRINGA.ESTRAI(A2:A10001;1;+RICERCA(“_”;A2:A10001;8));”_”;””))));””;SOSTITUISCI(SOSTITUISCI(SOSTITUISCI(+SOSTITUISCI(+STRINGA.ESTRAI(A2:A10001;+RICERCA(“_”;A2:A10001;8);100);”_”;””);”kbps”;””);” “;””);”.”;”,”)-0))

    • Hi Mauro, I translated from Italian, but the formula didn’t work for me, but looks like you may have a correct answer. I have sent you the number answers in an email so that you can see if your formulas are working correctly.

  2. Hi Steve,
    Thanks for this kind of challenge, I really like them.
    Hereafter my answers (array formulas). You’ll notice that I use “;” between arguments and “,” instead of “.” as decimal separator.

    Question 1 =
    AVERAGE(NUMBERVALUE(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2:A10001;”.”;”,”);” “;””);”_”;””);FIND(“2016″;SUBSTITUTE(SUBSTITUTE(A2:A10001;” “;””);”_”;””))+4;LEN(SUBSTITUTE(SUBSTITUTE(A2:A10001;” “;””);”_”;””))-FIND(“2016″;SUBSTITUTE(SUBSTITUTE(A2:A10001;” “;””);”_”;””))-3-4))) = 5,0428673

    Question 2 = AVERAGE(IF(ISNUMBER(SEARCH(“may”;A2:A10001));NUMBERVALUE(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2:A10001;”.”;”,”);” “;””);”_”;””);FIND(“2016″;SUBSTITUTE(SUBSTITUTE(A2:A10001;” “;””);”_”;””))+4;LEN(SUBSTITUTE(SUBSTITUTE(A2:A10001;” “;””);”_”;””))-FIND(“2016″;SUBSTITUTE(SUBSTITUTE(A2:A10001;” “;””);”_”;””))-3-4)))) = 5,181021134594

    Enjoy your weekend
    Hocine

    • Hi Hocine, Great solutions. Your answers are correct and you get an “A”. There is a short formula for Question 1, but yours is great because it doesn’t need to be copied down the range and it also provides the final answer. So great job!

  3. Hi Steve,

    Man, those are some great formulas listed above. Mine not nearly so robust.

    Question 1: =AVERAGE(Table1_2[kbps])
    Question 2: =AVERAGE(IF(Table1_2[Month]=”May”,Table1_2[kbps]))

    That’s all there is to my formulas.

    Although it is not listed in your rules, I sort of cheated. 🙂 I used PowerQuery to cleanse my data prior to creating the formulas to answer your questions. It makes cleaning up messy data very easy.

    -Pete

    • Nice one Pete, I really need to learn more about Power Query. Maybe I can talk you into writing up how you did it for this one 🙂 Kinda cheating, but as with all good cheats, you have the ability to get an “A”. How do we know you got the right numbers? 🙂 Please provide proof 🙂

LEAVE A REPLY

Please enter your comment!
Please enter your name here