As the saying goes, good help is hard to find. When it comes to the
loosely defined qualities of a data scientist, locating and organizing expert
help amid a dearth of candidates may seem close to impossible.
A few experts in this emerging
field shared their perspectives on selecting data scientists and organizing
teams to take the right approach to analyzing huge and varied data sets Wednesday
at the Chief Data Scientist Summit in Chicago, held by BI event
organizers *IE.
Before you start looking to hire
a data scientist, you need to know what you’re looking for. Aron Clymer, data
scientist at salesforce.com, oversees a team of about a dozen data scientists
and business analysts for the SaaS vendor’s product lines that touch on about 1
billion behavioral data transactions per day.
Clymer suggested that, to get an
idea of the size and scope of the team, you start with an assessment of the three
types of “products” data science teams produce: ad hoc, periodic and real time.
Ad hoc queries are often on the lighter-effort end of the spectrum, while
periodic work may take deeper digging and real-time products require unique infrastructure.
From there, you can better gauge a data science team’s reach across enterprise
data and its capacity to add data-backed insight to business questions.
And to formally organize your
data scientists and business data analysts, you’ll likely “settle” between the
two extremes of enterprise data science teams: the one-person “spanners” who
truly cover all aspects of business analytics, and the piecemeal,
multi-membered team approach.
Spanners like those harnessed
at Netflix may be nice in terms of project turnaround and elimination of team
friction, but it’s not realistic for most organizations to court, pay and
keep a single, all-encompassing data scientist of that skill level. On the
other end, a full data science team can provide a more well-rounded approach to
business questions and niche skill sets, though it risks the same challenging
issues associated with adding a tech team layer, as well as team members’
obligations to other departments.
In describing his own team,
Clymer said a hub-and-spoke approach keeps data scientists aligned with the
specific concerns and interests of particular products. As data scientists
typically work from data marts and off models, Clymer said it’s critical to
have separate ETL and data warehouse people – “Data scientists typically
aren’t good at this, too.” – with as much automation included on those fronts
as possible.
Then there is the issue of
finding the right people for the job. Accretive Health Chief Data Scientist
Scott Nicholson discussed the very human elements he’s looking for in analytics
hires. Health care is a people-facing industry that requires transparency in
its functions, so an ideal data scientist must bring solid communications
skills and ample curiosity. Instead of someone who “jumps all over the tech,”
Nicholson, who has also worked in analytics for e-commerce and at LinkedIn,
said the best data scientists are people who ask: “How can I make a quick
impact?”
“The engineering stuff you can
pick up ... but the curiosity? That’s something that’s built in. I can teach
someone Python, but the curiosity is far harder to get.”
Nicholson added that the right
candidate must be ready to follow a model from the first business questions
through development and into deployment. This “end-to-end” quality
forces the data scientist to see the user’s predicaments. In addition, it
should give the data scientist a better basis for follow-up business questions that
root out more of the unknown patterns and problems lurking in the data.
George Mason University’s
Kirk Borne, an astrophysicist and computational science professor,
stressed the importance of clear communication about executive
expectations. Recounting a project in a previous position at NASA that spent
its first few months asking the “same question with different terms,” Borne
recommended that data science teams keep a foot firmly planted in the business
operations of the enterprise. It’s a connection between the two sides of the
house that he’s currently exploring through the relationship between
computational and business school degrees at George Mason.
“Once you understand the business
question, you can prioritize your response and even come up with better
questions,” Borne said during a roundtable discussion at the event.
Justin Kern