Researchers at the University of São Paulo (USP) in Brazil said that preliminary findings from a model they developed suggested that a person's likelihood of developing depression might be detected based solely on their social media friends and followers.
The findings are published in the journal Language Resources and Evaluation.
While multiple studies have applied natural language processing (NLP) to depression, anxiety and bipolar disorder, most of them analysed English texts and did not reflect the profiles of Brazilian users, the researchers said.
The first step in this study involved constructing a database, called SetembroBR, comprising a corpus of 47 million publicly posted Portuguese texts and the network of connections among 3,900 Twitter users. These users had reportedly been diagnosed with or treated for mental health problems before the survey. The tweets were collected during the COVID-19 pandemic.
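The article does not describe how SetembroBR is stored. As a rough illustration only, a corpus of this kind can be modelled as labelled users with their tweets plus a directed follow graph; the class names, fields and label values below are hypothetical. In Twitter's terminology, a user's “friends” are the accounts they follow.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    """One account in a SetembroBR-style corpus (hypothetical schema)."""
    user_id: str
    label: str  # illustrative values such as "diagnosed" or "control"
    tweets: list[str] = field(default_factory=list)


@dataclass
class Corpus:
    """Users plus a directed follow graph: follows[a] is the set of accounts a follows."""
    users: dict[str, User] = field(default_factory=dict)
    follows: dict[str, set[str]] = field(default_factory=dict)

    def add_user(self, user: User) -> None:
        self.users[user.user_id] = user
        self.follows.setdefault(user.user_id, set())

    def add_follow(self, follower: str, followee: str) -> None:
        # Accounts outside the labelled set (e.g. celebrities) may appear as endpoints.
        self.follows.setdefault(follower, set()).add(followee)

    def friends(self, user_id: str) -> set[str]:
        """Accounts that user_id follows (Twitter's "friends")."""
        return set(self.follows.get(user_id, ()))

    def followers(self, user_id: str) -> set[str]:
        """Accounts that follow user_id."""
        return {a for a, targets in self.follows.items() if user_id in targets}

    def total_posts(self) -> int:
        return sum(len(u.tweets) for u in self.users.values())
```

A corpus of 47 million texts would need an indexed store rather than in-memory dictionaries, and `followers` here scans every edge; the sketch only shows the shape of the data.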
“First, we collected timelines manually, analyzing tweets by some 19,000 users, equivalent to the population of a village or small town.
“We then used two datasets, one for users who reported being diagnosed with a mental health problem and another selected at random for control purposes. We wanted to distinguish between people with depression and the general population,” said Ivandré Paraboni, last author of the article and a professor at USP.

Because people with mental health problems tend to follow certain accounts, such as discussion forums, influencers and celebrities who publicly acknowledge their depression, the researchers also collected tweets from these users' friends and followers.
The second step, still in progress, has yielded preliminary findings, for example that a person's likelihood of developing depression may be detectable based solely on their social media friends and followers, without taking their own posts into account.
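One way to picture a prediction that ignores a user's own posts is to score each neighbouring account with a text classifier and then average those scores over the user's friends and followers. The sketch below assumes such per-account scores already exist and uses a plain mean; it is not the researchers' method, which the article does not detail, and the function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def neighbourhood(user_id: str, follows: dict[str, set[str]]) -> set[str]:
    """Friends (accounts the user follows) plus followers, excluding the user."""
    friends = follows.get(user_id, set())
    followers = {a for a, targets in follows.items() if user_id in targets}
    return (friends | followers) - {user_id}


def neighbour_only_score(
    user_id: str,
    follows: dict[str, set[str]],
    account_scores: dict[str, float],
) -> float | None:
    """Mean score over the user's friends and followers.

    account_scores maps an account to a score in [0, 1] from some text
    classifier run on that account's own tweets (a hypothetical input).
    The user's own score, if present, is never read. Returns None when
    no neighbour has a score.
    """
    scored = [account_scores[n] for n in neighbourhood(user_id, follows) if n in account_scores]
    return mean(scored) if scored else None
```

A mean is the simplest aggregation; weighting friends and followers differently, or learning the aggregation, would be equally plausible choices.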
After pre-processing the corpus to remove non-standard characters while otherwise preserving the original texts, the researchers applied deep learning to create four text classifiers and word embeddings (context-dependent mathematical representations of the relations between words), drawing on models based on Bidirectional Encoder Representations from Transformers (BERT), a language model widely used in NLP.
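The article does not list which characters counted as non-standard. The sketch below assumes they are Unicode control, format (such as zero-width spaces), private-use, surrogate and unassigned code points, and it keeps accented Portuguese letters, punctuation and emoji, since emoji are analysed later in the study. The function name `clean_tweet` is hypothetical.

```python
import re
import unicodedata

# Unicode categories treated as "non-standard" in this sketch (an assumption):
# control, format (e.g. zero-width space), private-use, surrogate, unassigned.
DROP_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Cn"}


def clean_tweet(text: str) -> str:
    # Compose accents so "é" is one code point rather than "e" + combining mark.
    text = unicodedata.normalize("NFC", text)
    # Turn line breaks and tabs into spaces before control characters are dropped.
    text = re.sub(r"[\r\n\t]+", " ", text)
    kept = "".join(ch for ch in text if unicodedata.category(ch) not in DROP_CATEGORIES)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", kept).strip()
```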
These models are neural networks that learn context and meaning by tracking the relationships within sequential data, such as the words in a sentence. The training input consisted of 200 tweets selected at random from each user.
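A fixed-size random sample per user is a common way to stop prolific accounts from dominating the training data. A minimal sketch, assuming users with fewer than 200 tweets keep all of them (the article does not say how such users were handled) and using hypothetical helper names:

```python
from __future__ import annotations

import random


def sample_timeline(tweets: list[str], rng: random.Random, k: int = 200) -> list[str]:
    """Up to k tweets drawn uniformly at random without replacement."""
    if len(tweets) <= k:
        # Assumption: short timelines are kept whole rather than discarded.
        return list(tweets)
    return rng.sample(tweets, k)


def build_training_input(
    timelines: dict[str, list[str]], k: int = 200, seed: int = 0
) -> dict[str, list[str]]:
    """Map each user id to a random sample of k tweets from their timeline."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return {uid: sample_timeline(tweets, rng, k) for uid, tweets in timelines.items()}
```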
The researchers found that, of the models tested, BERT performed best at predicting depression and anxiety. They said that because the models analysed sequences of words and complete sentences, it was possible to observe that people with depression, for example, tended to write about themselves, using first-person verbs and phrases, and about topics such as death, crisis and psychology.
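The article does not say which metric was used to decide that BERT performed best. Classifiers of this kind are often compared with the F1 score, the harmonic mean of precision and recall; the sketch below assumes that metric and binary labels (1 = depression). The model names and labels in the usage lines are invented for illustration and are not the study's results.

```python
from __future__ import annotations


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_model(y_true: list[int], predictions: dict[str, list[int]]) -> tuple[str, float]:
    """Name and F1 of the model with the highest score."""
    scores = {name: f1_score(y_true, preds) for name, preds in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # Invented toy labels, not data from the study.
    y_true = [1, 1, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 0, 0, 1, 1],
        "model_b": [1, 1, 0, 0, 1, 0],
    }
    print(best_model(y_true, predictions))  # ('model_b', 1.0)
```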
“The signs of depression that can be detected during a visit to the doctor aren’t necessarily the same as the ones that appear on social media,” Paraboni said.
“For example, use of the first-person singular pronouns ‘I’ and ‘me’ was very evident, and in psychology this is considered a classic sign of depression. We also observed frequent use of the heart emoji by depressive users.
“This is widely felt to be a symbol of affection and love, but maybe psychologists haven’t yet characterized it [as a sign of depression],” Paraboni said.
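Counting these two signals is straightforward once tweets are tokenised. The sketch below computes the share of words that are Portuguese first-person singular pronouns and the number of heart emoji per tweet; the pronoun list, the choice of U+2764 as “the heart emoji” and the function name are assumptions for illustration, not the features the researchers used.

```python
import re

# Portuguese first-person singular pronouns (an illustrative list, not the study's).
FIRST_PERSON = {"eu", "me", "mim", "comigo"}
HEART = "\u2764"  # red heart; the emoji form is this code point plus U+FE0F

WORD = re.compile(r"\w+")  # Unicode-aware in Python, so accented letters match


def style_features(tweets: list) -> dict:
    """First-person pronoun rate (per word) and heart emoji per tweet."""
    words = [w.lower() for t in tweets for w in WORD.findall(t)]
    first_person = sum(1 for w in words if w in FIRST_PERSON)
    hearts = sum(t.count(HEART) for t in tweets)
    return {
        "first_person_rate": first_person / len(words) if words else 0.0,
        "hearts_per_tweet": hearts / len(tweets) if tweets else 0.0,
    }
```

Features like these are much cruder than what a BERT model picks up from whole sentences, but they make the reported patterns easy to inspect.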
The researchers are now extending the database, refining their computational techniques and upgrading the models to see whether they can produce a tool that could eventually be used to screen people at risk of mental health problems and to help the families and friends of young people at risk of depression and anxiety.