The IEEE BigData 2025 Cup Challenge, titled "Suicide Risk Prediction on Social Media," is part of the annual Cup Challenge series held under the auspices of the IEEE International Conference on Big Data (https://conferences.cis.um.edu.mo/ieeebigdata2025/). The 2025 conference takes place December 8–11, 2025, in Macau SAR, China, and the challenge itself runs over the months leading up to it. Participants are tasked with predicting the level of suicide risk associated with posts made by social media users. The challenge underscores the growing importance of big data in mental health and the pivotal role predictive analytics can play in supporting early intervention strategies.
The topic of this year's IEEE BigData competition is user-level suicide risk detection on social media. The dataset contains over 10,000 Reddit posts from various users. Each user is represented by the text of their historical posts, and the task for participants is to develop a predictive model that accurately classifies users into four suicide risk levels based on their posting history. Such a model could play a crucial role in identifying individuals at risk of suicide from their sequence of social media posts, enabling more comprehensive risk assessment and providing opportunities for timely intervention and support.
The top 3 teams, ranked by model performance, will receive cash prizes. In addition, a comprehensive evaluation of both report quality and model performance will determine which teams are invited to submit their work for publication in the conference proceedings and to present their findings at the conference.
Task Overview
Given a user's post sequence containing 5 historical posts, participants must predict the suicide risk level of the subsequent post from the label set {indicator, ideation, behavior, attempt}[1].
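The four risk levels are commonly read as increasing severity (indicator < ideation < behavior < attempt), so one early design choice is whether to treat the task as plain 4-way classification or as ordinal prediction. A minimal sketch of the label encoding is shown below; the severity ordering and the variable names are illustrative assumptions, not part of the official task definition.

```python
# Illustrative label encoding for the four risk levels.
# Treating the order as a severity scale is an assumption, not an official rule.
RISK_LEVELS = ["indicator", "ideation", "behavior", "attempt"]
LABEL_TO_ID = {label: i for i, label in enumerate(RISK_LEVELS)}   # {"indicator": 0, ...}
ID_TO_LABEL = {i: label for label, i in LABEL_TO_ID.items()}
```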
Dataset
Participants will be provided with:
• A training dataset of 7,383 labeled instances for model development
• A test set containing 1,283 unlabeled instances for model evaluation with results displayed on the leaderboard
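As a starting point, a simple bag-of-words baseline can be trained on the released training data. The sketch below assumes each instance stores its 5 historical posts in columns post_1 through post_5 and its label in a label column, and that the data ships as .xlsx; these column names and the file name train.xlsx are assumptions to adjust to the actual release.

```python
# Minimal TF-IDF + logistic regression baseline for the 4-level risk task.
# File and column names ("train.xlsx", "post_1".."post_5", "label") are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

train = pd.read_excel("train.xlsx")  # 7,383 labeled instances

# Concatenate the 5 historical posts into a single document per user.
post_cols = [f"post_{i}" for i in range(1, 6)]
train["text"] = train[post_cols].fillna("").agg(" ".join, axis=1)

X_tr, X_val, y_tr, y_val = train_test_split(
    train["text"], train["label"],
    test_size=0.2, random_state=42, stratify=train["label"],
)

vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(vectorizer.fit_transform(X_tr), y_tr)

val_pred = clf.predict(vectorizer.transform(X_val))
print("validation weighted F1:", f1_score(y_val, val_pred, average="weighted"))
```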
Competition Phase – Real-time Evaluation
During the competition, participants submit prediction files and receive real-time leaderboard updates. The primary score reflects the best performance among all submissions on the public test set of 1,283 hidden-label samples, providing ongoing feedback on each team's highest-performing result.
Final Evaluation Phase
At the competition's end, participants submit their code and technical reports. We evaluate each solution's performance by running the submitted code on a new private dataset. The final score determines the ranking, with the top 3 teams receiving cash prizes.
Conference Invitation
We calculate a comprehensive ranking based on combined evaluation of report quality (50%) and final score (50%). Top-ranking teams will be invited to submit papers and present their work at the IEEE BigData Conference.
Evaluation Metrics
We use the weighted F1-score as our evaluation metric because it provides robust assessment, especially for imbalanced datasets. For report evaluation, we assess multiple criteria including the novelty and innovation of the proposed solution, clarity and quality of writing, and comprehensiveness of experimental analysis.
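Concretely, the weighted F1-score averages the per-class F1 scores with weights proportional to each class's support (its number of true instances), so every risk level contributes in proportion to how often it actually occurs. A small local check with scikit-learn might look as follows; the labels here are purely illustrative.

```python
from sklearn.metrics import f1_score

# Illustrative labels only; real evaluation uses the hidden test labels.
y_true = ["ideation", "indicator", "attempt", "ideation", "behavior", "ideation"]
y_pred = ["ideation", "ideation",  "attempt", "ideation", "behavior", "indicator"]

# Weighted F1: per-class F1 scores averaged with weights proportional
# to each class's support (number of true instances).
print(f1_score(y_true, y_pred, average="weighted"))
```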
Once you have accepted the Data Usage Agreement, please email us your team's information in the requested format; we will respond to your inquiry and provide you with the dataset.
Please submit the prediction file created by your team; multiple submissions are permitted. The file must be in .xlsx format and named YourTeamName.xlsx. Scores for uploaded predictions will be updated on the leaderboard the following day. For a detailed explanation of the content expected in your prediction file, please refer to the 'Task Overview' section.
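For illustration, a valid-looking prediction file could be produced as sketched below. The single suicide_risk column matches the 'Submission Format' section at the end of this page; the test file name and the placeholder predictions are assumptions to replace with your own pipeline's output.

```python
import pandas as pd

# Hypothetical predictions for the 1,283 test instances; replace with your
# model's output, kept in the same order as the test file.
test = pd.read_excel("test.xlsx")        # file name is an assumption
predictions = ["ideation"] * len(test)   # placeholder predictions

# Single column named "suicide_risk", saved as YourTeamName.xlsx.
pd.DataFrame({"suicide_risk": predictions}).to_excel(
    "YourTeamName.xlsx", index=False
)
```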
Top-performing teams (evaluated on model performance, approach innovation, and report quality) will be invited to publish their papers at the IEEE BigData conference.
Submission Format:
suicide_risk — the single column expected in the prediction file, containing the predicted risk level for each test instance.
Attractive cash prizes will be awarded to the top-performing teams.