Early Disease Prediction Tool
Projects | Links: BioHack 2025's Winning Submission

In early 2025, I joined a team of friends for BioHack 2025, the first hackathon hosted by the Bioinformatics Club at the University of Calgary. In 24 hours, we built a disease prediction app driven by multiple machine learning models that estimate the likelihood of a user having certain diseases from a set of input parameters. The app was built with Python and Flask, with a front end in HTML, CSS, and JavaScript; my main contribution was the machine learning backend. This was my first hackathon, and it was an incredible learning experience, capped off by our 1st-place finish and the accompanying $400 prize.
The prediction models were built from 5 datasets: a cancer dataset, a stroke dataset, a diabetes dataset, a heart disease dataset, and a liver disease dataset. We could not find a publicly available general disease dataset of sufficient quality, so we used a separate dataset and a separate model for each disease. I wrote code that trained many candidate machine learning models, such as random forests and logistic regression, on each dataset and selected the best-performing one via 10-fold cross-validation. I also experimented with neural networks, but they proved too volatile and prone to overfitting on these datasets. The selected models were then saved to .pkl files and integrated into the app. The complete submission is available on GitHub.
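The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the submission: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the real disease datasets, and the helper name `select_best_model`, the accuracy metric, the candidate list, and the file name `example_model.pkl` are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_best_model(X, y, candidates, folds=10):
    """Score each candidate with k-fold CV and return the best one, refit on all data.

    Hypothetical helper: the metric (accuracy) is an assumption for this sketch.
    """
    scores = {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    # cross_val_score works on clones, so the chosen model still needs fitting.
    best_model = candidates[best_name].fit(X, y)
    return best_name, best_model, scores


# Synthetic stand-in for one disease dataset (assumption, not real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Two of the model families mentioned above; the real candidate list was longer.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

best_name, best_model, scores = select_best_model(X, y, candidates)

# Persist the winner to a .pkl file, then reload it as a web backend might.
model_path = Path(tempfile.gettempdir()) / "example_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(best_model, f)

with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

print(best_name, {name: round(score, 3) for name, score in scores.items()})
```

In a setup like this, the same function would be run once per dataset, producing one .pkl file per disease. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.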
A .gif showcasing the final application