- Past Projects -
Abstract
Online discussion forums are social channels that allow users to share user-generated content and engage in peer-to-peer discourse around a shared, specified topic. While these forums contain large amounts of valuable information on a variety of topics, the content ranking algorithms on these sites are often flawed, resulting in sub-optimal filtering that pushes low-quality submissions to the forefront. Current ranking systems place too much emphasis on ‘easy-to-measure’ metrics that do not reflect the true quality of a content submission.
Click to View Full Report
Introduction
In this project, I seek to analyze the college and NFL statistics of recently drafted players to explore the relationship between college and professional success. The data is pulled from two different sources: the NFL CSV was found on Kaggle, while the college data is from a Dropbox folder of CSVs - found on Reddit’s r/CFBAnalysis - containing single-game statistics for all players from 2005 to 2013. For this reason, I limit my analysis to that period. Furthermore, since football positions use wildly different statistics to measure performance, I narrow the scope to a single position: running back. As a huge Detroit Lions fan, I find this position of particular interest, as many fans want the team to draft a running back in this year’s NFL draft (2018).
Research Questions and Methodology:
- Does having good college stats imply a higher likelihood of success in the NFL? Can this relationship be used to predict future NFL success for college players?
- Does this relationship differ for players who played against 'top' competition vs. second-tier and third-tier competition?
- Do better stats always lead to better draft positions?
- Methods: Correlation, Regression, Classification, Dimensionality Reduction (PCA), Clustering
- What college statistics are better predictors of making a successful transition to the NFL?
- Pivot: What college statistics are the most important predictors of whether a player will be drafted to the NFL?
- Methods: Classification, Logistic Regression
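As a rough illustration of the correlation method listed above, here is a minimal sketch of a Pearson correlation between one college statistic and one NFL statistic. The function name `pearson_r` and the yards-per-carry numbers are made up for illustration; they are not taken from the project's data or code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical yards-per-carry values for five running backs (not real data).
college_ypc = [4.1, 5.3, 6.0, 4.8, 5.6]
nfl_ypc = [3.6, 4.2, 4.6, 3.9, 4.1]

r = pearson_r(college_ypc, nfl_ypc)
print(f"Pearson r between college and NFL yards per carry: {r:.3f}")
```

A value near 1 would suggest that players with better college numbers also tend to post better NFL numbers, though correlation alone says nothing about whether the relationship is predictive.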
Click to View Full Report
Introduction
In this project, I set out to explore the data behind one of my biggest interests: music. As someone who listens to “a stupid amount of music” (direct quote from a friend after Spotify’s 2017 year-end account statistics were released), I knew that this would be an area that was both compelling for me to explore and open enough to give me the freedom to pursue meaningful and interesting questions. In particular, I wanted to focus on live music, as I have recently begun to attend (and plan to continue attending) a large number of live shows. In the context of this class, I felt that I could formulate a strong project by studying the intersection between concert sales and overall music sales.
Research Questions and Hypotheses:
- What is the relationship between chart success and tour success? Are top charting artists more likely to have financially successful tours?
- Hypothesis: Top charting artists will consistently have more financially successful tours than low charting artists.
- Do certain genres tend to have more successful tours? Which genres are the most/least expensive for fans to attend?
- Hypothesis: Pop tours are the most successful and most expensive, followed by Rap/Hip-Hop.
- Based on my hypothesis that top charting artists will have the most successful tours: Are tickets for top charting artists’ shows consistently more expensive than average? Are low charting artists’ tickets consistently cheaper?
- Hypothesis: Tickets for shows featuring top charting artists will consistently be more expensive than low charting artists.
Click to View Full Report
Introduction
My motivation for this project mainly centered on a desire to better understand how neural networks work and how they can be applied to real-world business situations. Additionally, after my Reddit discussion scoring project returned less-than-desirable results, I wanted to explore some more powerful algorithms and tools for analyzing the quality of user- and consumer-produced content. After some investigation, I identified a quality data set of Amazon reviews on which I felt it would be interesting to perform sentiment analysis. Using a train/test split of 90% to 10% (as well as a validation split of 10% within the test data), I tested several neural network architectures, adjusting parameters based on the results to tune the model. In this way, I was able to achieve an accuracy of 93.96% and a loss of 0.1615 with my final model (null accuracy of 50% for the test data).
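To make the 90%/10% split and the null-accuracy baseline concrete, here is a minimal sketch using only the Python standard library. The helper names `train_test_split` and `null_accuracy` and the balanced toy labels are assumptions for illustration; this is not the project's actual pipeline.

```python
import random
from collections import Counter


def train_test_split(items, test_fraction=0.10, seed=0):
    """Shuffle a copy of items and hold out test_fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def null_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Toy sentiment labels (0 = negative, 1 = positive), balanced by construction.
labels = [0] * 500 + [1] * 500

train, test = train_test_split(labels)
print(len(train), len(test))
print(f"null accuracy on the full toy set: {null_accuracy(labels):.2f}")
```

Null accuracy is the score a model would get by always guessing the majority class, so on balanced data it is 50%. Any reported accuracy is best read against that baseline.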
Click to View GitHub Page
Welcome!
Although I do not consider myself to be a web developer, I thought it would be fun to learn a little about how it works. This website is the end result! Check out the code for this website - and all my other projects - on my GitHub (linked below and in the footer of every page).