From Social Science to Data Science

Posted by David Braslow on March 4, 2019

Hello! Thanks for checking out my blog. A bit of context about me: I am currently working through the Online Self-Paced Data Science curriculum at Flatiron to prepare for taking on the role of Master Teacher of Data Science at the Flatiron School. So I’m not exactly your typical student! I have a fair bit of experience working with data, so this blog post will talk a bit about my past work as a social scientist, and how social science differs from data science.

While interviewing for the role at Flatiron, one of my interviewers asked me what I meant by the tagline on my LinkedIn profile: “Quantitative Social Scientist in Education.” I described myself this way because of my doctoral training at the Harvard Graduate School of Education. My work there mostly involves using quantitative methods to answer research questions relevant to education policy. For example, one study used factor analysis, a technique similar to PCA, to argue that mathematics instruction has both content-specific and general dimensions. I am still finishing up my dissertation, which looks at issues with the uses of scores from state tests.

My Personal Interests

I got interested in this work because I wanted to find ways to improve the experiences of students. I was lucky to receive an education that gave me a love of learning, a passion for mathematics, and the skills to succeed in college and beyond. However, I knew that many other students were not so fortunate, attending schools where they were disengaged and learned little. Given how much my schooling helped me, I wanted to find ways for schools to better help everyone.

To do this, I knew that I would need to change the large-scale systems that affect how public schools work. Given my love of math, I thought the best way to do this would be to learn how to work with large data sets that describe students, teachers, and schools. I would learn what everyone was doing: what students were learning, what teachers were teaching, and what administrators were trying to do to help. Then I would use those insights to better inform the people with the power to decide how to support them all.

Social Science Overview

At the Harvard Graduate School of Education, I learned how to do this work from the perspective of social science. There are a few key components to this work:

  • Choose a topic to pursue that aligns with your interests and goals
  • Learn about the research done by others about the topic
  • Formulate a specific question that will help advance the field
  • Collect data that can be analyzed to answer your question
  • Choose and implement methods for analyzing your data that will yield an answer to your question
  • Write up your results, including the limitations, implications and next steps
  • Disseminate your findings to the field

In practice, this process is far from linear - for example, you may revise your research question based on the results of your analyses. However, this gives you the broad strokes of what social science looks like: there is a general field of knowledge, and we try to analyze data in ways that will add to that knowledge pool.

While there are plenty of arguments about what “data science” is, I believe most would agree that it differs from social science in each of these components. Some of these differences have to do with the fact that social science is typically conducted in academic contexts, while data science is typically conducted in business contexts - these are the contexts I describe below. However, I also believe that there are fundamentally different approaches to inquiry that underlie these two pursuits.

Choose a topic to pursue that aligns with your interests and goals

An organization hires data scientists to pursue the goals of the organization. While some organizations will give data scientists autonomy to do work that they think will best serve the company’s interest, they often hire data scientists with specific work in mind for them to do. A primary benefit of this is that your work has a greater chance of being used by your audience. In contrast, social science in an academic setting often provides more freedom to pursue topics of personal interest, but the chances of results being used to influence organizational practices are often lower.

Learn about the research done by others about the topic

Research can have very different meanings in academia and in business. Academic research is considered credible primarily if it passes peer review: a process wherein other experts in the field review a piece of research to critique it and ultimately judge whether it is worth publishing. Peer-reviewed research is then widely disseminated in print and at conferences, making it easily accessible. In business, companies frequently engage in research, both about their products and their competitors, but it is rare for companies to disseminate information about their findings to the world. This means that the opportunities to learn about research done by others can be more limited.

Formulate a specific question that will help advance the field

Research questions in social science often focus on learning about a general phenomenon: for example, do students who perform well on state tests in high school have a higher chance of succeeding in college? The answer to this question would be of interest to scholars and educators interested in students’ success after high school. In contrast, data science often attempts to model phenomena in order to better predict or influence them: for example, how well can we predict customers’ purchasing behaviors based on their interactions on a certain web page? The answer to this would be of use to the digital marketing or user experience staff of your organization, who could use your findings to inform modifications to the site.

Collect data that can be analyzed to answer your question

Data collection can be very labor intensive, and scientists of all stripes love to use pre-existing data whenever possible. However, data scientists often have ready access to massive troves of data collected by their organization, which they will use in their models. Companies can even purchase additional data from other companies that specialize in collecting it. In contrast, social scientists often will have to request data from various government or non-profit entities or collect their own.

Choose and implement methods for analyzing your data that will yield an answer to your question

I have been fascinated to see which methods are shared between data scientists and quantitative social scientists: other than OLS regression, I haven’t found a lot of overlap. Social scientists typically prefer methods that provide good estimates of specific parameters in their models that relate to their theories. In contrast, data scientists care more about how well their models perform than about any specific parameter estimates.
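To make this contrast a bit more concrete, here is a minimal sketch in Python using only NumPy. Everything in it is an assumption for illustration: the data are synthetic, and the variable names, sample size, and train/test split are hypothetical choices of mine rather than anything from a real study. The same OLS fit is read in two ways: first for the slope estimate and its uncertainty (closer to the social science habit), then for prediction error on held-out data (closer to the data science habit).

```python
import numpy as np

# Synthetic, made-up data: one predictor with a true slope of 0.5 plus noise.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Design matrix with an intercept column, then a simple train/test split.
X = np.column_stack([np.ones(n), x])
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Fit OLS on the training rows only.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Parameter-focused reading: slope estimate, standard error,
# and an approximate 95% interval (normal approximation).
resid = y_train - X_train @ beta
dof = X_train.shape[0] - X_train.shape[1]
sigma2 = resid @ resid / dof
cov_beta = sigma2 * np.linalg.inv(X_train.T @ X_train)
se = np.sqrt(np.diag(cov_beta))
ci_low = beta[1] - 1.96 * se[1]
ci_high = beta[1] + 1.96 * se[1]

# Performance-focused reading: root mean squared error on held-out rows.
pred = X_test @ beta
rmse = np.sqrt(np.mean((y_test - pred) ** 2))

print(f"slope = {beta[1]:.3f}, approx. 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
print(f"held-out RMSE = {rmse:.3f}")
```

The model is identical in both readings; what changes is which number gets reported and argued about. In practice a data scientist would likely compare several models on the held-out error, while a social scientist would likely spend more time defending the interpretation of the slope.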

Write up your results, including the limitations, implications and next steps

Social science puts a heavy emphasis on writing as a form of communication, with journal articles being the prototypical form. Presentations, such as at academic conferences, are distilled from this written work. In contrast, data science focuses more on visualization. A short presentation may be all that a data scientist gets to share - 99% of the work done to achieve the results will exist only in code or output logs.

Disseminate your findings to the field

Organizations that hire data scientists rarely expect them to share what they learn with others outside the organization - in some cases, the things they learn are even proprietary. Rather, data scientists typically present findings to others within the organization in order to influence business processes. That said, there may be ways to share your work to build your personal reputation outside of your organization, if that is of interest to you.

Conclusion

I have talked about the difference between social science and data science as if I am an expert in them both - I’m not! I am still a student in both domains. What I’ve written describes my current understanding, which I’m sure will evolve as I progress in this course and this new role. That said, I am excited to wear the hat of data scientist more fully, as I see great potential for data science to have an impact on organizations in the education sector, and I am excited to learn the techniques used by data scientists that are rare among social scientists.