Blogmaster 3000


A journey alongside me as I become a Data Scientist.

Module 5 Project: Classification

For my fifth project I analyzed categorical data that I found on Kaggle, intended for detecting fraudulent credit card transactions. I chose this data because of the small percentage of transactions (0.06%) that would need to be properly classified, as well as the volume of data provided. This scenario is one that I had yet to see and is also a bit out of my wheelhouse, as I have less experience in this industry than in the industries behind my previous projects. Naturally, I needed to do additional research to better understand the landscape so that I would be aware of potential pitfalls as well as industry standards. This project was very iterative, and I can only imagine how much longer it would have taken without this research.


Module 4 Project: Importance of Industry Knowledge

I recently worked on my first project utilizing time series data. While I gave an in-depth overview of my previous project, this time I was thinking of striking a new chord and talking about the overall process and my key takeaways from this project. I feel that what I learned from the process and the results of my approach will bring far more value than writing another article on how to model time series data. There are far more articles on the mathematics running the engine behind an ARIMA model than there are about how to make your life easier by approaching the process intelligently. I can imagine that it is a lot less sexy to say you need an intimate understanding of the industry your data comes from than it is to talk about all the intricacies of a model. While I am not taking anything away from those articles, or from the obvious need to thoroughly understand each model that you implement in a project, I am a huge proponent of balance throughout life. I have seen time and time again how much farther you can get by incorporating balance, mostly because I used to have a tendency to lean hard in one direction when approaching most things in life. I am glad to see through this project that this has changed and is definitely paying off.


Module 3 Project: Statistical Analysis (In-Depth Walkthrough)

I am sure that at least once in a data scientist’s career you will work with Microsoft’s Northwind Database in some capacity. For me, this was a statistical analysis of the data with regard to four hypotheses. Only one hypothesis was given and the other three were left up to me, which I am sure is not going to be the case when employed as a data scientist. I thoroughly enjoyed the freedom of this project, as I saw it as an opportunity to showcase my ability to produce an answer for any project given, as well as to surface insights that may be going unnoticed. The latter is of higher value in my eyes: it not only shows an “above and beyond” attitude, it also opens up opportunities for future projects that will hopefully involve my input.


Module 1 Project: What Stood Out

Looking back on my first data science project, I am definitely able to pinpoint one aspect of the process that stood out to me. While the realization that I actually knew what I was doing and how to approach this project is up there, the EDA aspect is what stands out the most. It was the part of the process where I can say, with absolute certainty, that I nerded out!