The IEEE BigData 2025 Cup Challenge, titled "Suicide Risk Prediction on Social Media," is part of the annual Cup Challenge series held under the auspices of the IEEE International Conference on Big Data (https://conferences.cis.um.edu.mo/ieeebigdata2025/). The 2025 conference takes place December 8–11, 2025, in Macau SAR, China, and the challenge itself runs over the months leading up to it. Participants are tasked with predicting the level of suicide risk associated with posts made by social media users. The challenge underscores the growing importance of big data in mental health and the pivotal role predictive analytics can play in supporting early intervention strategies.
The topic of this year's IEEE BigData competition is user-level suicide risk detection on social media. The dataset contains over 10,000 Reddit posts from various users. Each user is represented by the text of their historical posts, and the task for participants is to develop a predictive model that accurately classifies users into four suicide risk levels based on their posting history. Such a model could play a crucial role in identifying individuals at risk of suicide from their sequence of social media posts, enabling more comprehensive risk assessment and providing opportunities for timely intervention and support.
The top 3 teams, ranked by model performance, will receive cash prizes. In addition, a comprehensive evaluation of both report quality and model performance will determine which teams are invited to submit their work for publication in the conference proceedings and to present their findings at the conference.
Task Overview
Given a user's post sequence containing 5 historical posts, participants must predict the suicide risk level of the subsequent post from the label set {indicator, ideation, behavior, attempt}[1].
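The four risk levels are commonly read as increasing severity (indicator < ideation < behavior < attempt), so one early design choice is whether to treat the task as plain 4-way classification or as ordinal prediction. A minimal sketch of the label encoding is shown below; the severity ordering and the variable names are illustrative assumptions, not part of the official task definition.

```python
# Illustrative label encoding for the four risk levels.
# Treating the order as a severity scale is an assumption, not an official rule.
RISK_LEVELS = ["indicator", "ideation", "behavior", "attempt"]
LABEL_TO_ID = {label: i for i, label in enumerate(RISK_LEVELS)}   # {"indicator": 0, ...}
ID_TO_LABEL = {i: label for label, i in LABEL_TO_ID.items()}
```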
Dataset
Participants will be provided with:
• A training dataset of 7,383 labeled instances for model development
• A test set containing 1,283 unlabeled instances for model evaluation with results displayed on the leaderboard
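As a starting point, a simple bag-of-words baseline can be trained on the released training data. The sketch below assumes each instance stores its 5 historical posts in columns post_1 through post_5 and its label in a label column, and that the data ships as .xlsx; these column names and the file name train.xlsx are assumptions to adjust to the actual release.

```python
# Minimal TF-IDF + logistic regression baseline for the 4-level risk task.
# File and column names ("train.xlsx", "post_1".."post_5", "label") are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

train = pd.read_excel("train.xlsx")  # 7,383 labeled instances

# Concatenate the 5 historical posts into a single document per user.
post_cols = [f"post_{i}" for i in range(1, 6)]
train["text"] = train[post_cols].fillna("").agg(" ".join, axis=1)

X_tr, X_val, y_tr, y_val = train_test_split(
    train["text"], train["label"],
    test_size=0.2, random_state=42, stratify=train["label"],
)

vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(vectorizer.fit_transform(X_tr), y_tr)

val_pred = clf.predict(vectorizer.transform(X_val))
print("validation weighted F1:", f1_score(y_val, val_pred, average="weighted"))
```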
Competition Phase – Real-time Evaluation
During the competition, participants submit prediction files and receive real-time leaderboard updates. The primary score reflects the best performance among all submissions on the public test set of 1,283 hidden-label samples, providing ongoing feedback on each team's highest-performing result.
Final Evaluation Phase
At the competition's end, participants submit their code and technical reports. We evaluate each solution's performance by running the submitted code on a new private dataset. The final score determines the ranking, with the top 3 teams receiving cash prizes.
Conference Invitation
We calculate a comprehensive ranking based on combined evaluation of report quality (50%) and final score (50%). Top-ranking teams will be invited to submit papers and present their work at the IEEE BigData Conference.
Evaluation Metrics
We use the weighted F1-score as our evaluation metric because it provides robust assessment, especially for imbalanced datasets. For report evaluation, we assess multiple criteria including the novelty and innovation of the proposed solution, clarity and quality of writing, and comprehensiveness of experimental analysis.
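Concretely, the weighted F1-score averages the per-class F1 scores with weights proportional to each class's support (its number of true instances), so every risk level contributes in proportion to how often it actually occurs. A small local check with scikit-learn might look as follows; the labels here are purely illustrative.

```python
from sklearn.metrics import f1_score

# Illustrative labels only; real evaluation uses the hidden test labels.
y_true = ["ideation", "indicator", "attempt", "ideation", "behavior", "ideation"]
y_pred = ["ideation", "ideation",  "attempt", "ideation", "behavior", "indicator"]

# Weighted F1: per-class F1 scores averaged with weights proportional
# to each class's support (number of true instances).
print(f1_score(y_true, y_pred, average="weighted"))
```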
Once you have accepted the Data Usage Agreement, please email us your team's information in the requested format; we will respond to your inquiry and provide you with the dataset.
Please submit the prediction file created by your team; multiple submissions are permitted. The file must be in .xlsx format and named YourTeamName.xlsx. Scores for uploaded predictions will be updated on the leaderboard the following day. For a detailed explanation of the content expected in your prediction file, please refer to the 'Task Overview' section.
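For illustration, a valid-looking prediction file could be produced as sketched below. The single suicide_risk column matches the 'Submission Format' section at the end of this page; the test file name and the placeholder predictions are assumptions to replace with your own pipeline's output.

```python
import pandas as pd

# Hypothetical predictions for the 1,283 test instances; replace with your
# model's output, kept in the same order as the test file.
test = pd.read_excel("test.xlsx")        # file name is an assumption
predictions = ["ideation"] * len(test)   # placeholder predictions

# Single column named "suicide_risk", saved as YourTeamName.xlsx.
pd.DataFrame({"suicide_risk": predictions}).to_excel(
    "YourTeamName.xlsx", index=False
)
```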
Top-performing teams (evaluated on model performance, approach innovation, and report quality) will be invited to publish their papers at the IEEE BigData conference.
Submission Format:
suicide_risk — the single column expected in the prediction file, containing the predicted risk level for each test instance.
Attractive cash prizes will be awarded to the top-performing teams.