Dan Costa

As a Data Analyst turned Data Scientist, my goal is to translate data into useful, understandable insights. I am experienced in machine learning, with both supervised and unsupervised algorithms, and I use data visualization techniques to present findings to both technical and non-technical audiences.

Areas of Interest

Here are some of the things that interest me...

  • Machine Learning

    I have built both supervised and unsupervised models to make predictions and draw insights from data. I am excited about and invested in strengthening my understanding of Machine Learning so that I can apply it effectively in real-world applications.
  • Deep Learning

    Having built a convolutional neural network to assist with early wildfire detection through the use of image recognition, I am especially interested in further exploring artificial neural networks to extract progressively deeper levels of understanding from the data.
  • Google Cloud

    I have used Google Cloud Platform to develop and host production-level machine learning environments.
  • Python

    I use Python to perform ETL, to explore data with libraries such as pandas, NumPy, and Seaborn, and to build Machine Learning models with libraries such as Keras, scikit-learn, and PyTorch.
  • SQL

    I really enjoy working with databases and extracting data for further analysis and modeling.

Recent Work

Image Recognition - Convolutional Neural Network

Over the past thirty years, the number of wildfires in the United States has decreased, yet the annual average of burned acreage has more than doubled. In response to the growing challenge of managing wildfires, I built a convolutional neural network to assist with early wildfire detection through the use of image recognition.

Click here to view this project.

Social Impact Clustering Model

How well does the US economy bounce back from economic downturns? How would you measure economic resiliency? In a group effort, our team of Data Scientists used unsupervised modeling to answer these types of questions by analyzing economic resiliency before, during, and after a crisis.


Click here to view this project.

Reddit Classification Model

How challenging is it to differentiate between the language of two similar online forums? With this project, I built a classification model (well, several actually) with the intent of differentiating between two subreddits, r/backpacking and r/ultrarunning. Leveraging methods such as Natural Language Processing and Sentiment Analysis, I was able to produce some interesting insights!

Click here to view this project.

Predicting House Costs With Linear Regression

Which features of a house are the most valuable when listing it for sale? How could you best position yourself to make the most money when listing your house? By fitting a number of different Linear Regression models on a dataset of housing features in Ames, Iowa, I attempted to answer these types of questions. The final model was then used in a Kaggle competition.

Click here to view this project.

Contact Me