I recently worked on my first project using time series data. While I went into an in-depth overview of my previous project, this time I want to strike a new chord and talk about the overall process and my key takeaways. I feel that what I learned from the process and the results of my approach will bring far more value than another article on how to model time series data. There are far more articles on the mathematics running the engine behind an ARIMA model than there are about how to make your life easier by approaching the process intelligently. I can imagine that it is a lot less sexy to say you need an intimate understanding of the industry your data comes from than it is to talk about all the intricacies of a model. I am not taking anything away from those articles or from the obvious need to thoroughly understand each model you implement in a project, but I am a huge proponent of balance throughout life. I have seen time and time again how much farther you can get by incorporating balance, mostly because I used to lean hard in one direction when approaching most things in life. I am glad to see through this project that this has changed and is definitely paying off.
To give some background on the project, I was using Zillow’s economic data to answer the question, “What are the best zip-codes to invest in?” As you can imagine, this is a vague question that can be interpreted in many different ways. I happen to have an affinity for real estate and have taken the time to build a solid understanding of the industry. This turned out to be a huge blessing, but initially I froze like a deer in the headlights.
Because I knew that the question asked of me had at least three possible answers, I had trouble making up my mind as to which route to take. Ultimately, I took a step back and, instead of trying all three, I analyzed the landscape of the industry based on my studies as well as some recent personal experience. First off, I experienced firsthand, when I was searching for an investment property of my own, that the supply of homes is extremely low. This is evident in the quantity and quality of offers being made on homes that come onto the market. In other words, when a house first goes up for sale, it receives a lot of offers at or above asking price. This told me that finding a home you can get for a “steal” and then renovating it to force the home’s value up is not exactly the best strategy. You can certainly attempt it, but you are going to be searching for a long time and then run the risk of being outbid by all the other buyers who are also drooling over the deal you found. While this strategy is not the best option, the same indicators that tell us not to use it also point us toward a better one. As I am sure you are aware, when the supply of something is low relative to demand, its price tends to go up. This is not the worst thing if you approach the situation from the right angle. My conclusion was that the best route is to buy and “hold” a property and then sell it after the price has gone up.
This is a very common strategy in the industry and can be extremely profitable as you get farther from the recession’s trough. There are two major drawbacks to this strategy: you run the risk of your home losing its value due to a market correction (recession), and you do not get to profit from the investment for years at a time. The trough of the recession was in 2012, so the market was already several years into its recovery, and given the first risk I mentioned I was not comfortable projecting home values too far into the future. In response to this, I decided to project home values only 3 years into the future.
Now that I had a more concrete approach to looking for the “best zip-code” to invest in, I still had one problem staring me in the face. The real estate “bubble” that caused the recession was filled with data that is obviously not indicative of the market’s actual home values. I decided to build my model with data that starts at the bottom of the trough (2012). To further narrow my search, I calculated the growth in each zip-code since 2016 and took the top 15 to run my model on. This approach should be intuitive: the zip-codes with the highest rate of positive change over the last two years are the ones I want to target with my model.
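To make the filtering step concrete, here is a minimal sketch of how it could look in plain Python. It is not the code from the project: the dictionary layout, the function names (`trim_to_start`, `growth_since`, `top_zip_codes`), and all of the numbers are hypothetical stand-ins rather than the Zillow schema or real home values.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: zip-code -> {"YYYY-MM": median home value}.
# The figures are invented for illustration and are not Zillow data.
home_values: Dict[str, Dict[str, float]] = {
    "11111": {"2010-01": 130_000, "2012-01": 100_000, "2016-01": 120_000, "2018-01": 150_000},
    "22222": {"2010-01": 220_000, "2012-01": 190_000, "2016-01": 200_000, "2018-01": 210_000},
    "33333": {"2010-01": 90_000, "2012-01": 70_000, "2016-01": 80_000, "2018-01": 120_000},
    "44444": {"2010-01": 310_000, "2012-01": 280_000, "2016-01": 300_000, "2018-01": 290_000},
}


def trim_to_start(series: Dict[str, float], first_month: str) -> Dict[str, float]:
    """Drop observations before first_month (e.g. the bubble years before 2012)."""
    # "YYYY-MM" strings compare in chronological order.
    return {month: value for month, value in series.items() if month >= first_month}


def growth_since(series: Dict[str, float], start_month: str) -> float:
    """Fractional change from start_month to the latest month in the series."""
    latest_month = max(series)
    start_value = series[start_month]
    return (series[latest_month] - start_value) / start_value


def top_zip_codes(
    values: Dict[str, Dict[str, float]], start_month: str, n: int
) -> List[Tuple[str, float]]:
    """Rank zip-codes by growth since start_month and keep the n highest."""
    ranked = sorted(
        ((zip_code, growth_since(series, start_month)) for zip_code, series in values.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


# Keep only post-trough data, then shortlist the fastest-growing zip-codes.
post_trough = {z: trim_to_start(s, "2012-01") for z, s in home_values.items()}
shortlist = top_zip_codes(post_trough, "2016-01", 2)
for zip_code, growth in shortlist:
    print(f"{zip_code}: {growth:.1%} growth since 2016")
```

With real data, `n` would be 15 and each series would contain every monthly observation; the model would then be fit only on the shortlisted zip-codes.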
I know I walked you through a lengthy thought process on my approach to building the model for this particular project. The reason was not to deviate from data science but to show how many decisions I made before writing a single line of code. All of these decisions proved extremely valuable and could not have been made without my knowledge of the industry. I feel confident saying the decisions were valuable because, when I did my due diligence and altered the parameters of the model, its ability to predict future home values only declined the further I moved from the initial model I created. I do not want to know how long it would have taken to stumble across this model by trying all the different parameters of each model for each strategy.
Naturally, in most scenarios, I am not going to have the freedom to decide and define the goal of each project. Even so, I feel that all of these questions still have to be asked on each project, whether you are contemplating them yourself or discussing them with the company you are consulting for. They help you gain a better understanding of what you are aiming to uncover and ultimately help you stay on course and not waste valuable time. I know this post is not as technical as my previous one, but I hope it provides just as much value, if not more, when you work on your next project.