Most companies that plateau early in their data initiatives do so because of a lack of understanding of who to hire, when and what for.
Data supports decisions. Start from the top.
Of course, but what does this mean for hiring data people?
It means that the company needs a goal against which the performance of decisions can be measured objectively. The lack of such a goal means there is no objective use for data, as data is used to take optimal decisions for reaching a goal. As such, you simply cannot succeed at reaching a goal if that goal does not exist (or is not clear and explicit).
It also means that you need to start with the decision makers. Hiring or training your management to use data for driving decisions rather than for justifying them is essential, because if the consumer of data is unwilling to consume, there is nothing external forces can do about it (you cannot change the unwilling).
On to the team, why do we need them?
Hiring a non-specific “data person” is as random as seeing a Doctor of Philosophy about your lower back pain. They might be able to help you accept the pain, but that wasn’t why you called.
Once you have clear goals and managers who know how to overcome personal bias, they will tell you what they are trying to achieve. Different problems can require vastly different skillsets, so in order to judge fit, be specific during the interview and test for the ability to solve the particular problem. Are you looking for someone to maintain web scrapers, or are you looking for a pragmatic analyst with good algorithmic knowledge? Or perhaps a database expert? If you don’t know, listen to the candidates – they will tell you if you ask.
The first data hire.
Disclaimer: Nobody’s perfect, and you should adapt your hiring requirements to what the market has to offer and the talent you can access. That being said…
Thinking short term is the wrong tactic here. There is little to be gained from hiring someone who can only solve your right-now problem, because that candidate won’t know how to deliver something that can be taken further. Experienced data professionals can smell a mess through the job description, and will not even apply if you started off with the wrong tech, causing further progress delays.
It is key to start with someone experienced and capable. Technical challenges that can typically be overcome by an experienced data engineer in a month or two (such as building a data warehouse) can bog down companies for years until they change their approach.
“Everything should be as simple as it can be, but not simpler” – a scientist’s defense of art and knowledge. Attributed to Albert Einstein.
Key traits to look for:
- Honesty and self-awareness. Overcoming personal bias is a requirement for taking data driven decisions consistently.
- Learner and teacher. Can this person learn something new for the interview? Can they teach you how to approach a problem you are facing? Do they show a history of learning?
- Pragmatism. A marker of seniority is knowing what caliber of gun to use.
What to avoid:
- Confidence that is not backed up by learning. “When all you have is a hammer, the world is your nail”. They already know the best way, and incidentally it’s the only way they know.
- Fast talkers. Many frauds apply for management positions and are able to convince a manager. Repeat the question if you have to. Probe their analytical knowledge and don’t be easily satisfied – they are not too senior to give a respectful answer to a simple methodological question such as “how do you choose sample size”. (Anecdote: the worst answer I heard was a firm “30”, from a guy whose profile listed over 5 years of data leadership and consulting in corporations. He got angry when I asked him why 30 and not 3000.)
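To see why a firm “30” is a weak answer, here is a minimal sketch using the textbook formula for the sample size needed to estimate a proportion, n = z² · p(1 − p) / e². The function name and its parameters are illustrative assumptions, and it assumes simple random sampling from a large population. A real answer would depend on the metric, its variance and the precision the decision requires.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin_of_error.

    Assumes simple random sampling from a large population.
    p=0.5 is the most conservative (largest variance) guess.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# The required sample grows quickly as the tolerated error shrinks.
for e in (0.10, 0.05, 0.02):
    print(f"margin of error {e:.2f}: n = {sample_size_for_proportion(e)}")
```

The point is not the formula itself, but that a good candidate answers with questions about precision and confidence rather than with a single number.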
Don’t look at the wrong things.
Remember, data driven decisions are taken on objective measures. Someone’s shyness or clothing style is not an objective measure of their job potential. I have witnessed an excellent candidate (a young math PhD applying as a data scientist) get rejected for being shy. Don’t let the bigotry of your interviewer be a selection criterion. Value objective reason above subjective opinion. (Be data driven…)
Another common misconception is the belief that everybody needs to like the person. B players will rarely accept an A player.
Transparency on process performance can also make a lot of people uncomfortable.
The data engineer
Base skills to look for: Databases, programming, grit, learning
“A data engineer is someone who does not give up until the problem is solved” – grit, the common denominator of successful data engineers
Data engineers typically have two duties:
1. Keep stuff running
2. Make more stuff run
Keeping stuff running requires experience and knowledge of some best practices.
Making more stuff run requires a good learning ability, because you are always interfacing with new and ever-changing services.
Grit is also key because, with the exception of the major vendors’ APIs, API documentation often differs from reality, so persisting until stuff works (regardless of what the docs say) is simply necessary.
Who can fill the role?
With grit and learning, anyone can make it. But the hireable candidate will likely need a foundational understanding of databases and of the concepts of programming. Specific languages are not required (but desired), because like anything else they can be learned.
A data engineer who is not a good learner will struggle.
There are some cases that require very specific qualifications (particular technologies, CI), but most will get by fine on Python and SQL.
If they are building the data warehouse, make sure they understand the basics of dimensional modelling. The keyword here is ‘business intelligence’, as the domain includes a paradigm for how to prepare data for consumption.
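As a rough illustration of what dimensional modelling means in practice, the sketch below builds a tiny star schema (one fact table and two dimension tables) in an in-memory SQLite database from Python. All table names, columns and rows are hypothetical and invented for this example. A real warehouse would have far more dimensions and a proper loading process.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions describe the "who" and "when" of a business event.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);

-- The fact table holds one row per event, with measures and dimension keys.
CREATE TABLE fact_orders (
    order_id     INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    revenue      REAL
);

INSERT INTO dim_customer VALUES (1, 'DE'), (2, 'FR');
INSERT INTO dim_date VALUES (20240101, '2024-01'), (20240201, '2024-02');
INSERT INTO fact_orders VALUES
    (1, 1, 20240101, 100.0),
    (2, 1, 20240201, 50.0),
    (3, 2, 20240101, 70.0);
""")


def revenue_by_country(connection):
    """Aggregate the fact table's measure along one dimension."""
    rows = connection.execute("""
        SELECT c.country, SUM(f.revenue)
        FROM fact_orders AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.country
        ORDER BY c.country
    """).fetchall()
    return dict(rows)


print(revenue_by_country(conn))
```

A candidate who understands this shape can explain why the report consumer only ever needs to join facts to dimensions, instead of untangling the source system's tables.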
Different problems require different skill sets.
First, let us define the most important skill: business understanding. In the context of data, this means:
- Knowing the objectives of the business (like profit, growth, specific objective measures, or any specific limitations)
- Ability to map data points to real-life events, understanding how the process is tracked in the data and how altering the process could impact the data.
- Ability to estimate effort and potential result of various initiatives and choosing according to added business value (pragmatism).
A reporting analyst, sometimes called a business analyst, is someone who produces reports ad hoc. This is typically a transitional position for junior analysts, but some find their passion in it and specialise.
Reporting analysts must understand the business and be able to map a data point to a process. This is the absolute minimum starting point. Useful skills are SQL, tool usage, and programming. Objective decision making is also useful, but since they will not be making the decisions, it may backfire and hurt their ambition. This is often what triggers them to move into areas where they can take more ownership.
You will not be interviewing them for their current skill, but for their business understanding, interest and potential.
A data analyst can utilise a variety of tools and techniques to answer more complex questions and maybe run some simulations or predictions. They don’t necessarily have a formal quantitative background, but they are more experienced in generalising and finding simple answers to complex questions.
They typically want to make positive changes to processes in order to optimise things, and will provide the necessary analyses to back things up.
Necessary skills: business understanding, programming, analyst experience.
Nice to haves: Statistics
A data scientist possesses knowledge of algorithms and uses it to bring about business improvements, either through analyses or through data products such as recommenders, similarity searches, etc.
They should have little trouble programming, and often have an academic background.
There are many ‘wannabes’ among the applicants, so make sure they actually possess the hard skills. But that’s just a prerequisite.
The most important thing to look for is pragmatism. I like to use the following task as an example:
1. Download today’s temperature for our city from a weather API (e.g. darksky.net)
2. How would you approach forecasting tomorrow’s temperature? Optionally implement and evaluate.
A pragmatic, simple answer is to download that too from the weather API, as the prediction is done by qualified meteorologists and is easily available.
A pragmatic but less senior answer would be something like assuming tomorrow’s temperature will be the same as today’s, as you will be close most of the time.
A less than pragmatic answer includes complex algorithms used to produce a prediction based on past data.
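The “same as the day before” answer is what forecasters call a persistence baseline, and it is worth having before anything more complex is attempted. Below is a minimal sketch of how a candidate might implement and evaluate it. The temperature values are invented for illustration, the function names are hypothetical, and the mean-of-history forecaster is only included as a second naive comparison.

```python
from statistics import mean

# Hypothetical daily temperatures in degrees Celsius, invented for illustration.
temps = [12.0, 13.0, 13.0, 15.0, 14.0, 16.0, 17.0, 15.0]


def persistence_forecast(history):
    """Predict that the next day equals the most recent observed day."""
    return history[-1]


def mean_forecast(history):
    """Predict the average of everything seen so far."""
    return mean(history)


def mean_absolute_error(series, forecaster):
    """Walk forward through the series, forecasting each day from the days before it."""
    errors = [
        abs(forecaster(series[:i]) - series[i])
        for i in range(1, len(series))
    ]
    return mean(errors)


mae_persistence = mean_absolute_error(temps, persistence_forecast)
mae_mean = mean_absolute_error(temps, mean_forecast)
print(f"persistence MAE: {mae_persistence:.2f}")
print(f"mean-of-history MAE: {mae_mean:.2f}")
```

Any more elaborate model a candidate proposes should be asked to beat this kind of baseline, and the weather API's own forecast, before it earns its complexity.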
You obviously should not exclude someone on a single question, so get creative and come up with something perhaps more relevant to your business. You are looking to filter candidates in, so make sure that your testing approach allows for alternatives in case someone feels put on the spot.
Unless the role is on the bleeding edge of AI, or involves working on an already optimised algorithm, the data scientist will not need to know all the latest algorithms. By the Pareto principle, the basics will go a long way.
So who do you need to hire?
The difficulty in building a data team is that you need to know ahead of time what can be done with data and what can be done within a company.
“What you see is all there is” – the reason you are still not using data (Daniel Kahneman, Thinking, Fast and Slow)
Knowing what can be done with data requires some degree of data experience.
Knowing what can be done in the company requires open communication with management.
Often, the company is not able to consume data at the rate that the data team produces it.
This usually happens because there is a gap of experience on the consumer side in how to transform insight into value.
This is something that can be remedied with training (the data driven decision making workshop kind, not 30 minutes of tool usage).
If, however, the consumer is simply not motivated to take action based on data, then their management needs to set better goals and expectations.
In both of these cases (lack of training or motivation), the change cannot come from the data team, because they cannot teach the unwilling.
As such, it is key to start with an experienced data person who has previously seen the applications of data, and to work closely with them to implement changes in the organisation. Otherwise, they will be inefficient on their own.
With this experienced data person, and with knowledge of the current and future limits of your company’s ability to consume data, you are able to plan the data team roles as well.
Once you have the key problem areas identified, you can plan the roles that you need to fill for each case. Sometimes, it will be machine learning or engineering heavy, sometimes it will be more about teaching, mentorship and communication.
Do you need all data team members to drive organisational change?
In fact, no, because they cannot drive organisational change. Organisational change is driven by management. The data team’s role is to support data driven organisational change.
If you need help with your team creation and candidate assessment process, do not hesitate to get in touch.