Data Privacy in Biomedical Research
- Science Holic
- Jul 1
Author: James Xu
Editors: Serena Tsao, Ethan Tai
Artist: Coco Zhou

Recent advances in artificial intelligence and analytical technology, along with their growing scale, have enabled their integration into a wide range of fields. One such field is biomedical research, a branch of biology and human physiology that aims to improve overall health by investigating diseases and their causes. In support of this goal, the United States passed the Health Information Technology for Economic and Clinical Health (HITECH) Act in February 2009, which facilitated the transition from paper-based systems to electronic health records (EHRs). This shift to EHRs has generated a wealth of data that can be reviewed for biomedical research.
In the past, enormous datasets, or big data, were far too complex and extensive for conventional use because of limits on the computational speed and scalability of computers. Fortunately, exponential growth in processing power has overcome these limitations, allowing researchers to analyze such datasets far more quickly. Furthermore, AI has been used to automate this analysis, as seen in its use by researchers in Alzheimer's drug discovery. The National Library of Medicine reported that, with AI, researchers sifted through biomedical data nearly ten times faster than they would without it. This allows scientists to prioritize live gene and protein testing, where the most impactful discoveries are made.

However, concerns arise when considering the scale and sensitive nature of this information. Big data contains a great deal of personal information, carried in both the data and the metadata of each file. If this information were leaked to the public, patients might face serious consequences, including identity theft, discrimination, or loss of privacy. Genetic data is a clear example. The Harvard Gazette explains how the now-bankrupt company 23andMe experienced a breach in 2023, in which user data from its ancestry testing was accessed by a hostile party. Moreover, because 23andMe filed for bankruptcy, the company's assets must be sold. As a result, user data will likely be sold alongside those assets, which only further exposes it to leaks.
These privacy concerns also raise ethical issues for users. Specifically, many argue that consumers' consent and ownership rights should be reviewed. Historically, consent forms and notices such as the Notice of Privacy Practices (NPP) have been filled with legal jargon and ambiguous language, making them overly complex. Ideally, an NPP would use clear, accessible language that allows patients to fully understand their rights before deciding whether to disclose their data. Even with informed consent, it remains unclear who the data belongs to: the institution or the patient. Because of these complications, medical institutions' accountability for the proper use of data largely goes unchallenged.

Fortunately, effective measures exist to protect users' sensitive data. For instance, Privacy-Enhancing Technologies (PETs) serve as safeguards that allow data to be shared and analyzed. By using algorithmic and encryption techniques, PETs can help ensure the safe transfer of data. Additionally, the growth of PETs mirrors the recent rapid improvement of AI and technology: as PETs become more scalable and secure, biomedical data can be shared more effectively with less risk of leaks.
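To make the idea of an algorithmic PET more concrete, here is a minimal Python sketch of one widely used technique, differential privacy. The article does not name this technique specifically, so treat it as an illustrative choice rather than the method any particular institution uses. The sketch releases a patient count with random Laplace noise added, so that the presence or absence of any single patient is hard to infer from the result. The function name `private_count`, the toy `patients` records, and the `epsilon` value are all assumptions made for illustration.

```python
import random


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of matching records (Laplace mechanism sketch)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # A count changes by at most 1 when one patient is added or removed,
    # so its sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    rate = epsilon  # rate of each exponential draw = 1 / scale
    # The difference of two independent exponential draws with the same
    # rate follows a Laplace distribution centered at zero.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Hypothetical toy records, invented for this example only.
patients = [
    {"age": 71, "diagnosis": "alzheimers"},
    {"age": 64, "diagnosis": "diabetes"},
    {"age": 80, "diagnosis": "alzheimers"},
    {"age": 58, "diagnosis": "hypertension"},
    {"age": 77, "diagnosis": "alzheimers"},
]

rng = random.Random(0)  # seeded only so the demo is repeatable
noisy = private_count(
    patients, lambda p: p["diagnosis"] == "alzheimers", epsilon=0.5, rng=rng
)
print(f"noisy count: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy, which is one form of the usability trade-off that any PET has to balance.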
Ultimately, privacy is an immediate concern that needs to be addressed before biomedical research develops further. While PETs mitigate concerns about leaks, there is still uneasiness about their usability and effectiveness. Even so, we must find a balanced solution, as big data remains an integral part of biomedical research.
Citations:
Cheng, Feixiong, et al. “Artificial Intelligence and Open Science in Discovery of Disease-Modifying Medicines for Alzheimer’s Disease.” Cell Reports. Medicine, U.S. National Library of Medicine, 20 Feb. 2024, pmc.ncbi.nlm.nih.gov/articles/PMC10897520/.
Cho, H., Froelicher, D., Dokmai, N., Nandi, A., Sadhuka, S., Hong, M. M., & Berger, B. (2024). Privacy-Enhancing Technologies in Biomedical Data Science. Annual Review of Biomedical Data Science, 7(1), 317–343.
Ghofrani, A., & Taherdoost, H. (2025). Biomedical data analytics for better patient outcomes. Drug Discovery Today, 30(2), 104280. https://doi.org/10.1016/j.drudis.2024.104280
Mineo, Liz. “What Happens to Your Data If 23andMe Collapses?” Harvard Gazette, 22 Mar. 2025, news.harvard.edu/gazette/story/2025/03/what-happens-to-your-genetic-data-if-
Wang, S., Bonomi, L., Dai, W., Chen, F., Cheung, C., Bloss, C. S., Cheng, S., & Jiang, X. (2020). Big Data Privacy in Biomedical Research. IEEE Transactions on Big Data, 6(2),