Perfect datasets don’t exist but cool tools do. Here are two I use.

One component of my day-to-day is retrieving reports and turning them into something useful. Simple, right? Ha! I bet you laughed, too. Data comes in all shapes and sizes…in fact, it usually comes in every form besides the one we need. As data scientists, our job isn’t only to interpret the data and provide someone with the results to make an informed decisions – although that’s great. Visualizations? Those are great, too. But, if you don’t start with the right data, you might as well not analyze or visualize it. …so what?

Well, canned reports suck (unless they’re super customized by your IT team, but who has time for that?!?). When I ask for a data pull, I hate being told “no” or “yes, I’ll get that to you in a few weeks.” Usually when I need a report, it’s because someone important needs to make a decision. So instead of waiting around for someone to do it for me, I put on my data manipulation cap and get ready to finesse some Excel files.

Enough with the back story, right? You need tools! Admittedly, my most common problem is having all the data I need…spread across a bunch of out-of-the-box reports. Easy you say? Sure, if I was only dealing with a few rows of data. We’re talking thousands of rows. So, here are a few tools I’ve come across that have turned my data nightmare into something usable:

  1. Consolidate
  2. Merge Table Wizard for Microsoft Excel (the app with the world’s longest name) – Mac
    1. If you’re a PC user, use ActiveData

Screen Shot 2017-04-17 at 8.35.21 PM

So let’s talk about these tools…

Consolidate is great when you have several entries for one company (or participant) that you need summed, averaged, or a number of other functions. It’s as easy as five steps:

  1. Select your data.
  2. Go to the Data tab.
  3. Choose the Consolidate function.Screen Shot 2017-04-17 at 9.07.41 PM.png
  4. Select your data (be sure to note where the label that you’re consolidating is at).Screen Shot 2017-04-17 at 8.37.35 PM
  5. Choose your function.

Screen Shot 2017-04-17 at 8.38.22 PM

Done! Now you’ve got your data in one place. Screen Shot 2017-04-17 at 9.14.02 PM.pngIn my case, I needed each company’s entry to be consolidated into one entry, and I wanted the sum of all of their orders.
Next, I wanted to tie e-mail addresses to the company. That’s where my fancy application came into play. Here’s a quick walk-through on merging two reports to make one useful report.

  1. Open both reports.
  2. Open Merge Tables Wizard (Mac)
    1. ActiveData (PC)Screen Shot 2017-04-17 at 8.47.35 PM
  3. Select the data you want to merge on (pick the dataset with the majority of the data). This is your primary dataset.
  4. From the other worksheet (or, in my case, a workbook), select the data you want to merge. Screen Shot 2017-04-17 at 8.48.36 PM
  5. Your tool will likely verify that it identified the variable you want to merge on (i.e. company).Screen Shot 2017-04-17 at 8.48.46 PM
  6. The default is to add a new column for any data that does not exist in the primary.
  7. After that, the remaining steps are fluff. Select what you want and run it!

Boom. Screen Shot 2017-04-17 at 8.49.34 PM.pngIn several minutes, you just created a data file with all the data you needed. Of course, this is an oversimplification. But, you can apply it to anything – especially those nasty datasets.

So, what about you? What are your favorite tools for forcing Excel to comply?Screen Shot 2017-04-17 at 9.14.02 PM.png

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s