Using Machine Learning techniques to predict if a startup will succeed.

Conor Reilly
3 min read · May 25, 2021

With startup funding reaching an all-time high in 2021, this post uses machine learning to predict whether a startup will be successful, based on several financial and market-based attributes.

The technologies used throughout this project are Jupyter Notebook and Python, along with the Python libraries pandas, scikit-learn and Streamlit. The finished web app can be found here: https://startup-predictr.herokuapp.com/

1. Cleaning the dataset.

The dataset used for this project comes from Kaggle and is based on financial data provided by Crunchbase. The full dataset can be found here: https://www.kaggle.com/arindam235/startup-investments-crunchbase.

We will base our machine learning classifier on the following attributes: Market, Funding Total (USD), Founded Country, and Founded Year. These attributes will be used to predict whether a company is “closed” or “acquired”.

Cleaning of our dataset.
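The cleaning step could look roughly like the sketch below. It is not the original notebook code: the column names (`market`, `funding_total_usd`, `country_code`, `founded_year`, `status`) and the tiny inline sample are assumptions made for illustration, and the real Kaggle CSV may name or format its columns differently.

```python
import pandas as pd

# Tiny made-up sample standing in for the Kaggle CSV.
# Column names and values are assumptions, not taken from the real file.
raw = pd.DataFrame({
    "market": [" Software ", "Biotechnology", None, "Games", "Software"],
    "funding_total_usd": ["1,750,000", " 4,000,000 ", "40,000", "-", "2,000,000"],
    "country_code": ["USA", "GBR", "USA", "IRL", None],
    "founded_year": [2012.0, 2008.0, 2011.0, 2010.0, 2013.0],
    "status": ["acquired", "closed", "operating", "closed", "acquired"],
})


def clean_startups(df):
    """Keep the four features plus the label and drop unusable rows."""
    cols = ["market", "funding_total_usd", "country_code", "founded_year", "status"]
    out = df[cols].copy()

    # Keep only the two outcomes the classifier is meant to separate.
    out = out[out["status"].isin(["acquired", "closed"])].copy()

    # Funding is stored as text with thousands separators; anything that
    # cannot be parsed (such as "-") becomes NaN and is dropped below.
    funding = out["funding_total_usd"].astype(str).str.replace(",", "", regex=False)
    out["funding_total_usd"] = pd.to_numeric(funding.str.strip(), errors="coerce")

    # Trim stray whitespace around market names.
    out["market"] = out["market"].str.strip()

    # Drop rows with any missing value, then store the year as an integer.
    out = out.dropna()
    out["founded_year"] = out["founded_year"].astype(int)
    return out.reset_index(drop=True)


cleaned = clean_startups(raw)
print(cleaned)
```

With the real dataset, `raw` would instead come from `pd.read_csv(...)` on the downloaded file, and the column list would need to match that file's headers.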

2. Predicting the status of a startup using machine learning.

For our prediction we are going to use the random forest classifier provided by the scikit-learn library. A random forest consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the forest produces a class prediction, and the class with the most votes becomes our model’s prediction.

Random Forest Classification example.
Splitting data into testing and training data.
Implementing Random Forest Classifier.
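As a minimal sketch of the two steps captioned above, the code below splits a dataset into training and testing sets and fits a `RandomForestClassifier`. The data is randomly generated purely so the example runs on its own; the column names, the integer encoding of the categorical columns, the 80/20 split and the hyperparameters are all assumptions rather than the settings used in the original project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for the cleaned dataset (assumption: the real
# data has these four features and a two-class "status" label).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "market": rng.choice(["Software", "Biotechnology", "Games"], size=n),
    "funding_total_usd": rng.integers(10_000, 50_000_000, size=n),
    "country_code": rng.choice(["USA", "GBR", "IRL"], size=n),
    "founded_year": rng.integers(2000, 2015, size=n),
    "status": rng.choice(["acquired", "closed"], size=n),
})

# Random forests need numeric inputs, so map each category to an integer code.
X = data.drop(columns="status").copy()
for col in ["market", "country_code"]:
    X[col] = X[col].astype("category").cat.codes
y = data["status"]

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit an ensemble of 100 decision trees on the training rows.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# The labels here are random, so the score only shows the mechanics.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Fixing `random_state` in both the split and the classifier keeps the run reproducible, which helps when comparing different feature sets or hyperparameters.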

Once we have split our data into training and testing sets and fitted our random forest classification model, we can begin passing test examples to it as NumPy arrays.
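A hedged sketch of what passing a single example to the model might look like is given here. The training rows, the feature order (`market` code, funding in USD, country code, founded year) and the integer codes are invented for illustration and do not come from the original project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up, already-encoded training rows in the assumed feature order:
# [market_code, funding_total_usd, country_code, founded_year]
X_train = np.array([
    [0, 5_000_000, 0, 2010],
    [0, 8_000_000, 0, 2009],
    [1, 12_000_000, 1, 2008],
    [1, 50_000, 1, 2012],
    [2, 20_000, 2, 2013],
    [2, 75_000, 0, 2011],
])
y_train = np.array(["acquired", "acquired", "acquired", "closed", "closed", "closed"])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# predict() expects a 2D array of shape (n_samples, n_features),
# so a single startup is wrapped in an outer list.
example = np.array([[0, 2_000_000, 0, 2011]])

prediction = model.predict(example)           # array holding one class label
probabilities = model.predict_proba(example)  # one row, one column per class

print(prediction[0], dict(zip(model.classes_, probabilities[0])))
```

The example must use the same feature order and the same category-to-integer mapping as the training data, otherwise the model will silently interpret the values incorrectly.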
