January 14, 2012

Am I a data scientist?

I love LinkedIn.  LinkedIn gives me the power to explore the "long tail" of professional disciplines.  It's amazing how many different job titles and descriptions you can find on LinkedIn.  Here is an example of the titular diversity that I find when browsing my professional network:

  • Captain, Ayanox LLC
  • Motorcycle Mechanic
  • Chief Innovation Officer
  • Chief Scientist
  • Arborist Representative
  • Director of Product Management
  • Forensic Consultant
  • Display Coordinator - Anthropologie
  • Software Craftsman
  • Home Manager
  • Driver Service Provider - UPS
  • Super Hero

Data Scientist is a job title that has created some buzz recently.  Buzz usually indicates that there is a need that is not currently being met.  The Wall Street Journal's MarketWatch reported a month ago that only 1 out of 3 companies worldwide is making effective use of the data it has.  Consider how much data is being created automatically by the "internet of things", and you have an epidemic of ignorance.  Data science seeks to transform this ignorance into insight.

This post by Chris Taylor (3 hours old at time of authoring) suggests that the Data Scientist is the Career of the Future. Since this is the Systems-Illustrated blog, I must plug the solid EMC infographic that makes up the majority of Chris's post. Kudos!

DJ Patil, the guy who coined the term "data scientist" while he was working at LinkedIn, has written a great (free!) little eBook on Building Data Science Teams that explains who a data scientist is and what a data scientist does for a business.  Great read!  

In a nutshell, a data scientist may:
  • Instrument the tools that collect and cleanse data
  • Investigate the data to find patterns and stories
  • Illustrate the stories so that they can be shared

According to Mr. Patil, a data scientist is characterized by:
  • Technical expertise: the best data scientists typically have deep expertise in some scientific discipline.
  • Curiosity: a desire to go beneath the surface and discover and distill a problem down into a very clear set of hypotheses that can be tested.
  • Storytelling: the ability to use data to tell a story and to be able to communicate it effectively.
  • Cleverness: the ability to look at a problem in different, creative ways.

My strengths are certainly weighted toward curiosity, storytelling, and cleverness.  I would consider my expertise in the systems engineering discipline to be deep, but not necessarily scientific.  That said, my entire professional career has been built upon real-world problem solving using a disciplined variation of the scientific method.  So, maybe I'm closer than I think.

Most people distinguish between a Business Intelligence analyst and a Data Scientist.  What am I?  How does one quantitatively assess this question and make a data-driven decision about whether to pursue the career of the future?
  • Education: 31% of data scientists have a Master's degree.  I have a Master of Science in Systems Engineering from the Johns Hopkins University.  Only 12% of Business Intelligence (BI) professionals have a post-baccalaureate degree.  Edge: Data Scientist. 
  • Major: 10% of data scientists studied Business as a major, compared with 37% of Business Intelligence (BI) professionals.  I have studied Business Administration (AA), Management Science & Statistics (BS), and Systems Engineering (MS).  Slight Edge: Data Scientist.
  • Comfort with Incomplete Data: Big Data-oriented Data Scientists feel comfortable working with incomplete datasets, and enjoy the challenge of cleansing and exploring data.  This totally resonates.  Recently, I have been very happily snorkeling through years of usage data for my client's data warehouse in order to understand the impact of a recent upgrade and how we can drive user adoption.  I am pushing the envelope for my team and have had to work through several defects in the data in order to get to the point where I believe the stories it is telling me.  Strong Edge: Data Scientist.
  • Involvement across the Decision & Data Lifecycle: 30-40% of data scientists have significant involvement in the entire process of acquiring data, parsing data, filtering data, mining data, applying algorithms to data, visualizing data, storytelling with data, dynamically interacting with data, and making business decisions based on data.  Again, my recent experience with user adoption data includes all of these.  I am very excited that we are driving forward in some truly mission-oriented directions with greater clarity and confidence bolstered by my analysis.  Edge: Data Scientist.
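
For the systems-minded, the four verdicts above can be rolled into a rough scorecard. This is only a minimal sketch in Python: the numeric weights for "Slight Edge", "Edge", and "Strong Edge" are my own assumption for illustration (the comparison above gives verbal verdicts, not numbers), and the `tally` helper is hypothetical.

```python
# Hypothetical weights: the verdicts above are verbal, so these
# numbers are an assumption made purely for illustration.
EDGE_WEIGHTS = {"Slight Edge": 1, "Edge": 2, "Strong Edge": 3}

# (criterion, verdict, role favored), copied from the list above.
verdicts = [
    ("Education", "Edge", "Data Scientist"),
    ("Major", "Slight Edge", "Data Scientist"),
    ("Comfort with Incomplete Data", "Strong Edge", "Data Scientist"),
    ("Involvement across the Decision & Data Lifecycle", "Edge", "Data Scientist"),
]


def tally(verdicts, weights=EDGE_WEIGHTS):
    """Sum the weight of each verdict under the role it favors."""
    totals = {}
    for _criterion, verdict, role in verdicts:
        totals[role] = totals.get(role, 0) + weights[verdict]
    return totals


scores = tally(verdicts)
print(scores)
```

A different choice of weights would change the total but not the direction, since every criterion here leans the same way.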

I think I could evolve a bit and successfully perform the job of a data scientist.  It would certainly be an exciting challenge.  That said, I'm not sure the name "data scientist" truly captures the essence of decision science.  Data science emphasizes rigorous discipline and analytic techniques.  These are necessary.

However, I feel that what I do is closer to medicine than science.  Medicine is the science and art of healing (Wikipedia, Medicine).  Medicine directly applies life science to real-world wellness challenges.  

I find myself most excited and fulfilled when I am able to observe an organizational risk or opportunity, diagnose the problem using insights from data that might have been ignored or overlooked, tell the story, and watch the organization change into something that is stronger and healthier.

Data Physician, Change Agent, or (my favorite) Opportunity Navigator seems to come a little closer to describing the role that I find myself playing.  As more organizations become aware of the role that their data plays in their health, I believe that we will see an increasing need for caring, creative, analytical professionals who can transform lifetimes of experience into moments of truth for our clients.  Bring on the data scientists!