No, I’m not a data scientist now. But I do have a wider understanding of data science and a few new skills thanks to coursera.org’s Data Science Specialization. This summer I completed the first four classes: The Data Scientist’s Toolbox, R Programming, Getting and Cleaning Data, and Exploratory Data Analysis.

Motivation for taking these courses came from several places. Earlier this year, someone suggested to me that these courses would make a good model for a library’s data management education program. I’ve also been told by multiple people that I should learn how to program in R.

There are nine classes and one capstone project in the Data Science Specialization, created and maintained by faculty at Johns Hopkins University. You can audit the courses for free as many times as you wish, but if you want certificate-type proof of course completion and access to the capstone, you have to pay $49 per course.

Here are my thoughts on the specialization, which are not too different from other reviews I’ve read by people whose backgrounds and perspectives differ from mine.


What’s it like taking a MOOC (massive open online course)?

There are thousands of students all over the globe enrolled in the MOOC with you. Perhaps because of this, forum participation is high and often lively. Instructors act more as moderators than as teachers. You complete lectures and assignments asynchronously, on your own schedule. Assignments are graded automatically by the software or through peer review. Peer review was my least favorite thing about the experience. Some students clearly clicked through assignments to get participation credit, and others were hell-bent on catching cheaters. The latter I discovered through the many lively forum discussions on the topic.


Should you model a library’s data management workshops on this specialization?

I lean towards no for two reasons, one a criticism of the program and one a compliment. First, although the specialization is advertised as being for beginners, and the first course is, the specialization as a whole is not. It packs high-level concepts and assignments into a very short period of time. You don’t get the opportunity to absorb a new concept because, after all the time you spend googling help for the current assignment, you already have another assignment that uses a different dataset and a new concept. Programming experience, linear algebra, and statistics are must-have background for completing it. For library data management education programs, I feel it would be better to focus on basics and reference resources, with a librarian proficient in R who can help direct students individually in their projects.

Second, while this is not a beginner’s specialization, the content is great. The courses would complement graduate school work and could guide new researchers on the next step of their current data problem. Don’t reinvent the wheel. It’s free and already out there.


What did I get out of the courses?

I got a good high-level overview of how to set out to solve a data problem. I learned what R is capable of doing and what it should be used for, along with basic R syntax. The courses also introduce Git and GitHub, which weren’t new to me, but I was glad to see them in the program.


Who should take these courses?

  • Someone with a current data project who doesn’t know where to start or what to do next
  • Students working in data science
  • Someone in a similar field switching to data science research
  • Academic librarians (but just the first class, The Data Scientist’s Toolbox)


What can you reasonably expect to gain from taking these courses?

Not a job. Reading the forum posts, I feel that some of the students had unreasonable expectations for the specialization. It really is just a good professional development opportunity to demonstrate career growth and add to other resume items such as graduate education in scientific research, statistics, or math, and professional experience.


Final Thoughts

This isn’t a specialization for picking up a new set of skills from scratch. This is dense graduate-level work with a steep learning curve. You can get by, but you won’t really get a deep understanding unless it is complemented with graduate work or real work experience. However, the classes are a good reference for setting up a research project and provide great examples. And the classes might go better if you took your time, instead of trying to learn R programming in a month. After all, they are free to take over and over again.