Why?

The COVID-19 pandemic has caused severe harm to society, and overcoming it requires everyone to work together. AI models need large amounts of data, yet we are not aware of any open-source COVID-19 cough dataset anywhere in the world. To benefit humanity, we have decided to open-source what is, to our knowledge, the first free COVID-19 cough dataset. We hope this will spur innovation and collaboration across academia and industry worldwide.

Data Source

Our data was collected at a hospital under physician supervision, following standard operating procedures (SOPs), which helps ensure its quality. It is preprocessed and labeled with each patient's COVID-19 status (determined by PCR testing), along with patient information (age, gender, medical history).

Get Started

The data and usage instructions are available on our GitHub page, along with our cough and textual models for processing the data.

Collaboration

We warmly welcome review of our data and code. We hope to build a community of researchers who use this data to develop solutions for the pandemic. Please email us at if you are interested.