Top Skills for a Data Scientist – 2019

Synopsis

We look at the key skills required for a data scientist, updated for 2019. We do so by gathering 2000 job posts and using text mining to retrieve information. We also use algorithms to find out how each skill is related to the others. Finally, we look at the implications of identifying the in-demand skills that will affect the workforce and economy.

Read More “Top Skills for a Data Scientist – 2019”

Top Skills for a Data Scientist – 2018

Synopsis

We look at the key skills required for a data scientist. We do so by gathering 1000 job posts and using text mining to retrieve information. We also use algorithms to find out how each skill is related to the others. Finally, we look at the implications of identifying the in-demand skills that will affect the workforce and economy.
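As a rough illustration of the text-mining step, the sketch below counts how many posts mention each skill and how often two skills appear in the same post. The skill list, the sample posts, and the use of co-occurrence counts as the measure of how skills are related are all assumptions made for this example; the full post describes the actual method.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical skill list, for illustration only.
SKILLS = ["python", "sql", "r", "spark"]

def skills_in(post, skills=SKILLS):
    """Return the set of skills mentioned in one job post (whole-word match)."""
    tokens = set(re.findall(r"[a-z0-9+#]+", post.lower()))
    return {s for s in skills if s in tokens}

def count_skills(posts):
    """Count how many posts mention each skill and each pair of skills."""
    singles, pairs = Counter(), Counter()
    for post in posts:
        found = sorted(skills_in(post))
        singles.update(found)
        pairs.update(combinations(found, 2))
    return singles, pairs

# Made-up job posts, not taken from the collected data.
posts = [
    "Looking for Python and SQL experience",
    "Must know R, SQL and Spark",
    "Python, Spark, SQL required",
]
singles, pairs = count_skills(posts)
print(singles.most_common())
print(pairs.most_common(3))
```

Matching on whole tokens rather than substrings keeps a one-letter skill such as "r" from matching inside other words.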

Read More “Top Skills for a Data Scientist – 2018”

Open Data Day: Code and the City

I was a participant in a hackathon called Code and the City. The event was held in celebration of Open Data Day. Along with industry sponsors like Soti, Amazon, Microsoft and Cisco, the event sponsors included the City of Mississauga and Sheridan College.

The goal was to use open data to answer a problem statement that would benefit the City of Mississauga, which has a population of almost 800,000:

How can Mississauga gain greater awareness and engagement with the community in a digital environment?

Read More “Open Data Day: Code and the City”

Wearable Fitness Tracker Predictive Modeling

Synopsis

This report was created for a Canadian startup that builds wearable fitness trackers used in gyms and an accompanying mobile application. My solution yielded the best actual results among all report submissions from select individuals with highly qualified backgrounds. Some of the code has intentionally been removed.

Read More “Wearable Fitness Tracker Predictive Modeling”

Predictive Text Application – Milestones

Synopsis

We are developing an application that can predict a word based on previous ones. Text mining and natural language processing (NLP) are used. This is similar to the software available on mobile platforms such as SwiftKey. The end product will be a web application that takes an incomplete phrase from the user and predicts the next word. In order to build the application, we require an appropriate data collection. Here we use the English language sets from HC Corpora. This milestone report details our initial exploratory analysis of the data and our future goals in a concise and understandable manner.
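To make the idea of predicting a word from the previous ones concrete, here is a minimal sketch that uses bigram counts: it suggests the word that most often followed the last word of the phrase in the training text. The bigram approach, the function names, and the toy corpus are assumptions for illustration; they are not the model or the HC Corpora data used in the report.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Map each word to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, phrase):
    """Return the most frequent follower of the last word, or None if unseen."""
    words = phrase.lower().split()
    if not words or words[-1] not in following:
        return None
    return following[words[-1]].most_common(1)[0][0]

# Toy corpus for illustration; not the HC Corpora data.
corpus = [
    "thank you very much",
    "thank you for coming",
    "see you soon",
    "thank you very kindly",
]
model = train_bigrams(corpus)
print(predict_next(model, "I want to thank"))
```

A practical predictor would likely use longer contexts and a fallback for unseen words, but the lookup structure stays the same.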

Read More “Predictive Text Application – Milestones”

Human Activity Recognition and Machine Learning

Synopsis

Human Activity Recognition is emerging as a new field where wearable devices are commonly used to quantify the amount of time an activity is performed. In our analysis, we instead look at how well weight lifting exercises were performed in a study. Each individual in the experiment had accelerometer data collected from devices on different parts of the body while performing barbell exercises in five different ways. We developed machine learning algorithms that predict the way the exercises were performed based on the accelerometer data. Our final model was a random forest trained with 10-fold cross-validation repeated 5 times, which gave us a 100% in-sample accuracy and a 99.0% out-of-sample accuracy.
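For readers unfamiliar with the resampling scheme, the sketch below shows how 10-fold cross-validation repeated 5 times produces 50 train/test splits. It only generates the index splits and does not fit a random forest; the function name, the seed, and the sample size of 100 are assumptions for illustration.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a new random partition for every repeat
        for fold in range(n_splits):
            test = indices[fold::n_splits]  # every n_splits-th index forms one fold
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test

splits = list(repeated_kfold(n_samples=100))
print(len(splits))  # 10 folds x 5 repeats = 50
```

A model would be fitted on each training set and scored on the matching held-out fold, and the scores averaged across all splits.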

Read More “Human Activity Recognition and Machine Learning”

Regression Models of MPG in Automobiles

Synopsis

In this report we ask whether automatic or manual transmission is better for MPG in the mtcars dataset, and we quantify the result. The complication is that other variables also affect the MPG. In our best linear regression model, we see that weight and \(\frac{1}{4}\) mile time also influence the MPG, and therefore transmission alone cannot be used to determine which gives the better MPG.
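One way to write such a model is shown below, using the mtcars column names mpg, am (transmission, 0 = automatic, 1 = manual), wt (weight) and qsec (\(\frac{1}{4}\) mile time). The exact form is an assumption based on the variables named in this synopsis; the full report gives the model that was actually fitted.

\[
\text{mpg}_i = \beta_0 + \beta_1\,\text{am}_i + \beta_2\,\text{wt}_i + \beta_3\,\text{qsec}_i + \varepsilon_i
\]

Under this form, \(\beta_1\) is the expected difference in MPG between manual and automatic cars with the same weight and \(\frac{1}{4}\) mile time, which is why it can differ from a simple comparison of the two group means.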

Read More “Regression Models of MPG in Automobiles”

Effect of Vitamin C Dose and Supplement Type on Tooth Growth

Synopsis

This is an analysis of the ToothGrowth dataset on guinea pigs available in the R standard installation. We first do a summary and exploratory analysis to see what the data includes. We then perform some statistical inference with confidence intervals and hypothesis testing to see which dose and supplement of vitamin C is more effective for tooth growth. Assumptions are made to state our conclusions. We can state that orange juice is the better supplement for tooth growth in two of the three dosages. However, for the highest dose, we cannot see any advantage of orange juice over ascorbic acid. In general, tooth growth increases with dose.
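A comparison of two groups like this is often done with a two-sample t-test. The sketch below computes Welch's t statistic and its degrees of freedom, which do not assume equal variances. Whether the report uses Welch's test or another variant is an assumption here, and the input numbers are made up for illustration; they are not values from ToothGrowth.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up numbers for illustration only; not values from ToothGrowth.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t, df = welch_t(group_a, group_b)
print(t, df)
```

The t statistic and degrees of freedom would then be compared against the t distribution to obtain a p-value or a confidence interval for the difference in means.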

Read More “Effect of Vitamin C Dose and Supplement Type on Tooth Growth”