Open data

We have decided to open source some of our COVID-19 cough datasets.

With the aim of benefiting humanity, we have chosen to pioneer open data in this area by creating what we believe is the first free COVID-19 cough dataset.

The COVID-19 pandemic is a global crisis, and overcoming it requires all of us to work together. AI models need large amounts of data to perform well, yet we are unaware of any publicly available COVID-19 cough datasets anywhere in the world. We want to step up and fill this gap. To this end, alongside our diagnostic algorithm, Virufy is creating a free, open source database of COVID-19 cough data. We believe this will spur innovation and collaboration across academia and industry worldwide, helping to alleviate the current crisis.

Data source

At Virufy, we are committed to acquiring data that is not only plentiful but also of high quality. Our dataset was collected in a hospital under physician supervision, following Standard Operating Procedures (SOP). The data is preprocessed and labeled with each patient's COVID-19 status (determined by PCR testing), along with demographics (age, gender) and medical history.

Get started

Our data and usage instructions are available on our GitHub page, along with our cough and text models for processing this data.


We warmly welcome reviews of our data and code, and we hope to build a community of researchers who use this data to create solutions for the pandemic. Please email us at if you are interested.