Exploring whether vocal signals alone can be used to identify pain states.

You can find the code in the repo here.
This project investigates whether audio data can be used to classify whether an individual is experiencing pain. By analyzing vocal patterns in recorded speech and sounds, we explore the feasibility of building machine learning models that could support medical decision-making in low-information or remote settings, such as telephone-based healthcare services.
The primary goal of this project was to determine whether pain can be reliably classified from audio recordings using machine learning techniques. In particular, we aimed to understand how different modeling strategies perform under realistic constraints such as limited training data and variability across individual speakers.
We began by developing a custom neural network trained directly on audio-derived features. While this approach showed moderate success, model performance was limited by the small size of the available dataset.
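A baseline of this kind can be sketched as a small feed-forward classifier over audio-derived features. Everything below is illustrative: the feature dimensionality, labels, and data are synthetic stand-ins, not the project's actual dataset or architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for audio-derived features (e.g. summary statistics per clip):
# 200 clips, 40 features each, with binary pain / no-pain labels.
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Small feed-forward network, sized for a low-data regime.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

With only a few hundred clips, a network this small is already near the limit of what the data can support, which is consistent with the plateau described above.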
To address this limitation, we adopted a transfer learning approach using embeddings from a large pre-trained audio–text model (CLAP). These embeddings provided richer acoustic representations, which were then fine-tuned for the pain classification task.
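The transfer learning step can be sketched as a linear probe trained on top of pre-computed embeddings. This is a minimal PyTorch sketch under stated assumptions: the 512-dimensional embeddings and binary labels here are random placeholders standing in for CLAP outputs, and the single logistic layer stands in for the fine-tuned classification head.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholders for pre-computed CLAP audio embeddings (assumed 512-dim)
# and binary pain labels; in practice these come from the frozen encoder.
emb = torch.randn(200, 512)
labels = torch.randint(0, 2, (200,)).float()

# Linear probe: one logistic layer trained on top of frozen embeddings.
probe = nn.Linear(512, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(probe(emb).squeeze(1), labels)
    loss.backward()
    opt.step()

preds = (torch.sigmoid(probe(emb).squeeze(1)) > 0.5).float()
print(f"training accuracy: {(preds == labels).float().mean():.2f}")
```

Keeping the encoder frozen and training only the head is a common way to use large pre-trained models when labeled data is scarce.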
Model performance was evaluated under two data-splitting strategies:

- Random splits, where recordings from the same speaker may appear in both the training and test sets.
- Person-level splits, where all recordings from a given speaker are held out together, so the test set contains only unseen speakers.
Models evaluated using random splits achieved higher accuracy, indicating that CLAP embeddings capture meaningful signals related to pain. However, performance dropped significantly when evaluated on unseen speakers using person-level splits.
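The two evaluation protocols can be reproduced with scikit-learn: a plain random split ignores speaker identity, while a group-aware split keeps every recording from a speaker on one side of the split. The speaker IDs and features below are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit

rng = np.random.default_rng(0)

# 100 recordings from 20 speakers (5 clips each); features are placeholders.
X = rng.normal(size=(100, 16))
speakers = np.repeat(np.arange(20), 5)

# Random split: clips from one speaker can land on both sides.
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Person-level split: GroupShuffleSplit holds out whole speakers.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(gss.split(X, groups=speakers))

# No speaker appears in both partitions of the person-level split.
assert not set(speakers[train_idx]) & set(speakers[test_idx])
print(f"{len(set(speakers[test_idx]))} held-out speakers")
```

Because the random split leaks speaker identity into the test set, its accuracy overstates how well the model would do on a genuinely new caller, which is exactly the gap observed here.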
The performance gap between random and person-level splits suggests that speaker identity plays a substantial role in model predictions. Without a baseline understanding of an individual’s normal vocal characteristics, the model struggles to distinguish between voice-specific traits and changes associated with pain.
This highlights a key challenge in clinical audio modeling: generalization across speakers may require explicit normalization or personalized reference signals.
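One simple form of the normalization described above is per-speaker z-scoring: each speaker's features are centered and scaled against that speaker's own baseline statistics, so voice-specific offsets are removed before classification. This is an illustrative sketch on synthetic data, not the project's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3 speakers x 10 clips x 8 features, with a per-speaker offset that
# mimics voice-specific traits unrelated to pain.
speakers = np.repeat(np.arange(3), 10)
X = rng.normal(size=(30, 8)) + speakers[:, None] * 5.0

# Z-score each speaker's clips against that speaker's own mean and std.
X_norm = np.empty_like(X)
for s in np.unique(speakers):
    mask = speakers == s
    mu = X[mask].mean(axis=0)
    sd = X[mask].std(axis=0) + 1e-8
    X_norm[mask] = (X[mask] - mu) / sd

# After normalization, every speaker's features are centered near zero.
print(np.abs(X_norm.mean(axis=0)).max())
```

In a clinical setting this would require a few "baseline" recordings per person, which is the personalized reference signal mentioned above.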
Despite these challenges, audio-based pain detection has promising applications in remote and telephone-based medical services, where visual or physiological measurements may be unavailable. Such a system could serve as an auxiliary diagnostic tool to help clinicians flag potential pain states and prioritize follow-up care.
Several extensions could improve performance and clinical relevance, such as collecting larger and more diverse datasets, applying per-speaker normalization, and incorporating personalized baseline recordings.
This project is designed to be extensible. Future contributors could build on this work by adding new datasets, experimenting with alternative embeddings, deploying the model as an inference API, or integrating it into clinical decision-support workflows.