To explain my vision of data science, let me add “Data” to a famous quote by Richard P. Feynman:
Data “Science means, sometimes, a special method of finding things out. Sometimes it means the body of knowledge arising from the things found out. It may also mean the new things you can do when you have found something out, or the actual doing of new things.”
Indeed, Data Science embeds all 3 meanings, where:
- “a special method of finding things out” refers to statistical analysis and machine learning methods, with their theoretical background in mathematics and probability;
- “the things found out” refers to the insights extracted from data;
- “the new things you can do” refers to the technologies and applications of data science.
Although more emphasis is usually put on methods and applications, I agree with Feynman that, among the 3 aspects of data science, “the things found out” is the most important. “This is the yield. This is the gold. This is the excitement, the pay you get for all the disciplined thinking and hard work. The work is not done for the sake of an application. It is done for the excitement of what is found out.” (Richard P. Feynman)
Below I describe some elements that characterize my approach to data science.
When I do data science, 3 principles guide me:
- adhere to a strict research process, using established methods and reliable software
- respect the right to explanation, shedding light on the dark side of data science (black-box models, the math involved)
- report results and methods honestly, identifying the limitations in the validity of the study and enabling its reproducibility
I do data science with R and its gorgeous ecosystem as my main statistical computing tool, working with tabular, time series, spatial, network and text data.
When execution speed is an issue, I use C++ with the linear algebra libraries Armadillo and Eigen, interfaced with R through Rcpp.
To communicate the insights learned from data, I produce reports, dashboards and blog posts with the rmarkdown literate programming ecosystem, and I develop data web app prototypes with the R Shiny web framework.
When data does not fit into my laptop's RAM, I set up my data lab in the cloud or, in the case of big data, on cloud clusters running Spark. I also use cloud machine learning platforms for training and deploying deep learning models.
My always-under-construction showcase includes:
- MovieLens recommender system project
- fall detection project (both projects were carried out for the HarvardX professional certificate program)
- Titanic Guess Game, a Shiny app built for a short introductory presentation on data science
- slotR, a packaged Shiny app for playing a simple game of chance inspired by R, released under the MIT licence
I published Computing Matrix Algebra on Leanpub.
The booklet is essentially a cheat sheet for computing matrix algebra operations such as matrix multiplication, inversion and factorization.
It is written foR (aspiring) data scientists, where by “foR” (with a capital R) I mean the side of data science addicted to R and its gorgeous ecosystem, including in particular Rcpp, RcppArmadillo and RcppEigen.
The following professional education certificates, together with my earlier studies in engineering (MSc at Politecnico di Milano University) and rigorous continuous learning, make up my data science specialization:
- professional certificate in data science from HarvardX, an online learning initiative of Harvard University through edX
- statement of accomplishment with distinction for completing the Statistical Learning course offered by Stanford Online (no longer available online)
- certificate of achievement for successfully completing, with a passing grade, the STAT110x: Introduction to Probability course offered by HarvardX