Friday Challenge – Data Transformation Solution

In the latest Friday Challenge, you were tasked with creating a Data Transformation Formula for the data listed on this page: Friday Challenge Data Transformation

Data Transformation Butterfly Image
Data Transformation

Now it is a tricky challenge because the data is not uniform and you will have to parse out the data to find the answer.

This can be done with helper columns, but we wanted to see if we can break out the data into Dates and KBPS Values for each data row.  Were you able to do it?

Here is how I was able to get the answers.

First, we need to review the data.  Here is a sample.  What we will need to do for our solution in Excel is get each date out of the data as well as the KBPS value.

Sample Data
 __11 AM   27-Mar-2016___4.970 kbps
  __1 AM 29-Mar-2016_4.429  kbps
   8   AM 10-Jun-2016_5.284kbps
__11   AM Wednesday 04-May-2016___5.540kbps
___1  PM   Monday 18-Jul-2016_6.454kbps
   _8  PM  7th-Apr-2016_2.785  kbps
 _6PM 8-Oct-2016_5.463 kbps
  __8 AM   Tuesday 16-Feb-2016___2.244  kbps
___2 AM Wed 28-Sep-2016_7.637   kbps
  _12  AM  07-Aug-2016_9.54 kbps

 

NOTE: For any formulas that you copy from this site, you may need to replace the ” quotation marks with your system quotation marks as the website may convert my formulas into left and right Quotation marks and Excel doesn’t read these correctly.

 

Date Solution / Formula

Solving for standard dates

For the date, you can see that sometimes there are letters in the date like 7th-Apr-2016.  And the dates are not always in the same place as there are random numbers of spaces and underscores before the dates.

As I look at the data, it appears that the “-” dashes are only in the dates.  And the only random feature in the dates are 2 additional letter designations for the day of the week (i.e. 3rd- or 4th-). Therefore, we can use the dashes in the data to find the dates using the Excel MID function and we need 2 variations of it.  So let’s give it a shot.

If our data starts in cell A2 then our formula would start out like this:

=MID(A2,Find(“-“,A2)-2,11)

We are using the MID Function on the data in Cell A2 and then we are using the FIND Function return the position of the first “-” dash.  Then we move the position left or back 2 places in the text string.  Then from that final position, we return the next 11 characters.  For this data ” __11 AM   27-Mar-2016___4.970 kbps  ” this part of the formula will return this date text “27-Mar-2016”.

 

Solving for standard dates

The formula above works for all instances except when the date has text like “3rd” in it.  Now lets look at the formula to adjust it for that text variation.

=MID(A15,FIND(“-“,A15)-4,2)&MID(A15,FIND(“-“,A15),9)

Essentially, we are using the same functions, but we need to break it up into 2 MID Functions.

The first MID Function finds the first “-” dash and we will have 4 characters to the left of that dash that represents the day of the date.  To get the day, we need to look 4 characters left of the first “-” dash.  So we can get the day of the date with this part of the formula:   =MID(A15,FIND(“-“,A15)-4,2)

For this data ”   _8  PM  7th-Apr-2016_2.785  kbps” this part of the formula will return “7”.

To find the month and year of the date in our data, we need use this part of the formula “MID(A7,FIND(“-“,A7),9)”.

This MID Function finds the first “-” dash and we will always have 9 characters from that dash that represents the month and year of the date.

For this data ”   _8  PM  7th-Apr-2016_2.785  kbps” this part of the formula will return “-Apr-2016”.

We then combine/join the results of those two formulas with an “&”.

The resulting date for this data ”   _8  PM  7th-Apr-2016_2.785  kbps” this part of the formula will return “7-Apr-2016”.

 

Putting Date Formulas Together

I like to use IFERROR to check on which formula to use above for the dates for the 2 formulas listed above.  How it works is that it will check the first date formula and then if that returns an error like “#Value” then it will move on to the next date formula .

Now since this the formulas above only return text, it doesn’t let me know if we have an actual date or not.  In order for it to see if the text is really a date, you will need to wrap the 2 formulas in a VALUE Function like this:

VALUE(MID(A2,FIND(“-“,A2)-2,11))

– and –

VALUE(MID(A2,FIND(“-“,A2)-4,2)&MID(A2,FIND(“-“,A2),9))

Looking at the first VALUE formula above, on this data:

”   _8  PM  7th-Apr-2016_2.785  kbps”

it will return a text value of “th-Apr-2016” and when this is evaluated by the VALUE portion, then this formula will return “#VALUE!” error.

So this is how we will use the IFERROR Function.  If the first VALUE formula returns an Error, then go to the 2nd VALUE formula.  So our final Excel Transformation Date Solution Formula will look like this:

=IFERROR(VALUE(MID(A2,FIND(“-“,A2)-2,11)),VALUE(MID(A2,FIND(“-“,A2)-4,2)&MID(A2,FIND(“-“,A2),9)))

 

KBPS Solution / Formula

For the KBPS, you will see that the value s are not always 4 digits, but sometimes smaller (7.637 vs 9.54).  Also, the spaces are not always the same, so the text will be right next to the numbers.

As I look at the data, it appears that the KBPS data is between the year 2016 and the text KBPS.  So if we can find these 2 data points in the cell, then we create a new text string using the MID Function to just pull out that range.  After that, we can then use the SUBSTITUTE Function to remove the other text that is not part of the KBPS number.  So let’s give it a shot now.

If our data starts in cell A2 then our formula would start out like this:

=MID(A2,FIND(2016,A2)+4,FIND(“kbps”,A2)-FIND(2016,A2)-4)

First we are going to find the position text “2016” within the data in cell A2 with this part of the formula “FIND(2016,A2)+4”.  We will use this value plus 4 (so that we can ignore the text 2016 in our final position) as our starting position for the Excel MID Function.

Next you will locate the position of the text “kbps” within the data in cell A2 with this part of the formula “FIND(“kbps”,A2)-4″.  We will use this value and subtract the position number of the text “2016” so that we can find out how long of text to return to our final text string we want to use in our MID Function.

Using these values in a MID Function as you see above, for this data:

__11 AM   27-Mar-2016___4.970 kbps

the following value will be returned:  “___4.970 ”

This is pretty close.  The only things we need to do is clean up the text that is not part of our final number.  That would be the Underscore “_” and spaces.  To clean this up, we will use the Excel SUBSTITUTE and VALUE Functions to remove these items.

Therefore, you will want to wrap your MID Formula you see above in 2 SUBSTITUTE Functions removing underscore and kpbs text and then wrap that in a TRIM Function.  Finally, we want the resulting text to be transformed into a number so that we can use it to calculate the Average.  We will do this by wrapping it in a VALUE Function.  Here is what my final formula looks like.

=VALUE(SUBSTITUTE(MID(A2,FIND(2016,A2)+4,FIND(“kbps”,A2)-FIND(2016,A2)-4),”_”,””))

This will result in a value of 4.97 for the data in Cell A2.

 

Data Transformation Challenge Answers

You can copy both of these formulas all the way down your data to transform it for each row.  Then you can calculate your averages by just creating a quick Pivot table and group on Months.

Month Average of KPBS
Jan 5.123010883
Feb 5.00683494
Mar 4.954023095
Apr 5.199812739
May 5.181021135
Jun 5.062384798
Jul 4.932590024
Aug 4.980069767
Sep 4.988981906
Oct 5.100960957
Nov 4.952745198
Dec 5.03167052
Grand Total 5.0428673

Question 1: What is the average KBPS for 2016? = 5.0428673

Question 2: What is the average KBPS May? = 5.181021135

 

When you are presented with a problem like this, make sure you look at your data to see what is unique that you can use with other Excel Functions to transform the data the way you need it.

 

Video Demonstration

 

 

Sample File Download

You can download the sample solution file here:  Data-Transformation-Solution.xlsx

 

Thanks to everyone that sent in your solutions.  There are a lot of smart people out there, so make sure and check out their formulas as well in the comments of this and the previous post: Friday Challenge Data Transformation

 

 

Also, don’t forget to sign up for the email newsletter so that you will be sure to get the next post delivered directly in your inbox below.

Steve=True