Data science analysis of 130 mock interviews and resume reviews

I work with the good folks over at Career Cup and Evisors to provide technical mock interviews and resume reviews to CS grads interested in Google, Microsoft, Apple, etc. Recently, I realized I had done over 250 interviews over a period of 5 years. Time to dig into the data with dangerously basic data science and see how I am actually doing.


NOTE: I couldn't get data for all 250 interviews, only ~130. You can download the dataset and IPython notebook here.

Questions

During this blog post, I will try to answer these questions:

  1. Do my services provide good value for money?
  2. Why do some people think I don't provide good value for money?
  3. Has my value for money changed over time? Does it vary by time of day?

That's plenty for one post. I would like to dig into these two questions in a future post:

  * What can I learn from free-text comments?
  * Is my audience segmentable by race, ethnicity or sex?

Data Cleaning

The first thing you should do with a dataset is make sure it's clean, i.e. there are no outliers, data types are correct, etc.

    df = pd.read_csv('/Users/bilal/src/data-explorations/careercup/feedback.csv', parse_dates=['otime'], dayfirst=True, index_col='otime')

    df.head()

    df.dtypes

    meetingid            int64
    expertName          object
    expid                int64
    industry            object
    skill               object
    adviceQuality        int64
    subjectExpertise     int64
    responsiveness       int64
    valueForMoney        int64
    comment             object
    dtype: object

    df.index.dtype

    dtype('O')

pandas is really good about parsing out data types. Unfortunately, our index column otime got parsed as an object rather than a datetime because we have dates in two different formats (hrrmm, I wonder if Evisors uses MySQL!). We'll use the excellent arrow library to parse the datetimes ourselves.

We also see a few identity columns, like meetingid, that don't contain meaningful information, so we'll go ahead and drop those. This sort of data cleaning is really important before we start digging into the data.

    def parse_datetime(d):
        import arrow
        from arrow.parser import ParserError

        # datetimes in the data occur in two formats:
        # 7/6/2014 0:12
        # 2014-12-07 19:00:00 GMT
        try:
            dt = arrow.get(d)
        except ParserError:
            try:
                dt = arrow.get(d, 'M/D/YYYY H:m')
            except ParserError:
                # some events have an otime of `t.b.a`
                return np.nan
        return dt.datetime

    df.index = df.index.map(parse_datetime)

    del df['meetingid']
    del df['expid']

    df.groupby(df.index.month).aggregate('count')['expertName'].plot(kind='bar')

[Figure: bar chart of session counts by month]

This looks good. I can see some possible seasonality already, but let's dig into this later. Another important step is to call value_counts on all object dtypes. Let's go ahead and do that on expertName, industry and skill fields.
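
If you want to sweep every object column at once, something like this works (a quick sketch using select_dtypes):

    # run value_counts on every remaining object-typed column
    for col in df.select_dtypes(include=['object']).columns:
        print(df[col].value_counts())

Below, I look at the three fields one at a time.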

    df['expertName'].value_counts()

    Bilal Aslam                               104
    Bilal - In-Person (Seattle Only) Aslam      7
    dtype: int64

    df['industry'].value_counts()

    Computer Software                      26
    Information Technology and Services     7
    Internet                                2
    Program Development                     1
    dtype: int64

    df['skill'].value_counts()

    Mock interview                  39
    Interviewing                    31
    Resume critique                 20
    Software Development             7
    Resumes & Cover Letters          4
    Consulting & Case Interviews     3
    mock interview                   2
    Careers in Tech & Operations     2
    Career Conversation              2
    Programming Languages            1
    dtype: int64

The output above is interesting. I offer three types of services:

  1. Mock interviews
  2. Resume reviews
  3. Mentoring

But there are more categories in the data than services offered, so I'll have to do some cleanup. I'll create a categorical variable with 3 possible values. I also need to add another column, medium, which can be phone or onsite: when expid is 2047 the medium is phone, and when it is 2055 the medium is onsite (a sketch of this appears after the skill cleanup below).

    def normalize_skill(skill):
        if skill in ['Mock interview', 'Interviewing', 'Software Development', 'mock interview', 'Programming Languages']:
            return 'Mock interview'
        elif skill in ['Resume critique', 'Resumes & Cover Letters']:
            return 'Resume review'
        elif skill in ['Career Conversation','Consulting & Case Interviews', 'Careers in Tech & Operations']:
            return 'Mentoring'
        else:
            return np.nan

    df['normalized_skill'] = df['skill'].map(normalize_skill)
    df['normalized_skill'].value_counts()


    Mock interview    80
    Resume review     24
    Mentoring          7
    dtype: int64
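
As for the medium column mentioned above: since expid has already been dropped, a minimal sketch can key off the expertName string instead, assuming the "In-Person (Seattle Only)" variant reliably marks onsite sessions and everything else happened over the phone:

    def normalize_medium(name):
        if pd.isnull(name):
            return np.nan
        # "In-Person" in the expert name marks onsite sessions,
        # everything else was done over the phone
        return 'onsite' if 'In-Person' in name else 'phone'

    df['medium'] = df['expertName'].map(normalize_medium)
    df['medium'].value_counts()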

Data Exploration

Now that our dataset is clean, we start with the fun part - data exploration! Let's first look at histograms of several columns:

    df[['adviceQuality','valueForMoney','subjectExpertise','responsiveness']].hist(normed=True)

[Figure: histograms of the four rating columns]

We see the distribution of values across types of ratings. One thing is clear - I consistently get high reviews (yay). Subject expertise is particularly high - this means interviewees believe I know what I am talking about.
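
To put rough numbers behind the histograms, a quick summary of the four rating columns is easy to pull:

    # numeric summary (count, mean, quartiles) of the rating columns
    df[['adviceQuality', 'valueForMoney', 'subjectExpertise', 'responsiveness']].describe()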

Next, let's break down value for money:

    sns.boxplot(df[['adviceQuality','subjectExpertise','valueForMoney', 'responsiveness']])

[Figure: box plots of the four rating columns]

    g = sns.FacetGrid(df, col="normalized_skill")
    g.map(plt.hist, "valueForMoney", normed=True);

[Figure: valueForMoney histograms, faceted by service type]

That's interesting: looking at value for money, people seem to be getting less bang for their buck from resume reviews. There is noticeably more value for money in my mock interviews as compared to mentoring or resume reviews.
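
To put numbers on that impression, a quick group-by (a sketch along the same lines as the heatmap further down) gives the mean valueForMoney per service type:

    # mean value-for-money rating for each normalized service type
    df.groupby('normalized_skill')['valueForMoney'].mean()

Next, a pair plot shows how the individual ratings relate to each other: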

    g = sns.PairGrid(df[['skill', 'adviceQuality','valueForMoney', 'responsiveness', 'subjectExpertise']])
    g.map_diag(plt.hist, normed=True)
    g.map_offdiag(plt.scatter)
    g.add_legend()

[Figure: pair plot of the four rating columns]

Next, let's break down skill ratings by type of service provided. We can immediately see that responsiveness is a problem, especially for resume reviews. This makes sense - unlike a mock interview which happens at a particular time, a resume can sit in my inbox for days before I look at it.

    sns.heatmap(df.groupby('normalized_skill').aggregate('mean'))

[Figure: heatmap of mean ratings by service type]

Let's look at sessions where the valueForMoney rating was particularly low: less than or equal to 3. No clear pattern emerges here:

    sns.heatmap(df.query('valueForMoney <= 3').groupby('normalized_skill').aggregate('mean'))

[Figure: heatmap of mean ratings for low value-for-money sessions]

Does value for money vary by day of week? Hmm, not really ...

    df.groupby(df.index.map(lambda t: t.dayofweek)).aggregate('mean')['valueForMoney'].plot(kind='bar')

[Figure: mean valueForMoney by day of week]

How has value for money changed over time? Again, it's hard to derive lessons from this chart:

    df.dropna().resample('M', how='mean', kind='period')['valueForMoney'].plot(kind='bar')

[Figure: mean valueForMoney by month]