TY - JOUR
T1 - A Systematic Approach for Developing a Corpus of Patient Reported Adverse Drug Events: A Case Study for SSRI and SNRI Medications
AU - Zolnoori, Maryam
AU - Fung, Kin Wah
AU - Patrick, Timothy B
AU - Fontelo, Paul
AU - Kharrazi, Hadi
AU - Faiola, Anthony
AU - Wu, Yi Shuan Shirley
AU - Eldredge, Christina E
AU - Luo, Jake
AU - Conway, Mike
AU - Zhu, Jiaxi
AU - Park, Soo Kyung
AU - Xu, Kelly
AU - Moayyed, Hamideh
AU - Goudarzvand, Somaieh
PY - 2019/2/1
Y1 - 2019/2/1
N2 - The "Psychiatric Treatment Adverse Reactions" (PsyTAR) corpus is an annotated corpus developed from patients' narrative data on psychiatric medications, particularly SSRIs (selective serotonin reuptake inhibitors) and SNRIs (serotonin norepinephrine reuptake inhibitors). The corpus consists of three main components: sentence classification, entity identification, and entity normalization. We split the review posts into sentences and labeled them for the presence of adverse drug reactions (ADRs) (2168 sentences), withdrawal symptoms (WDs) (438 sentences), signs/symptoms/illnesses (SSIs) (789 sentences), drug indications (DIs) (517 sentences), drug effectiveness (EF) (1087 sentences), and drug ineffectiveness (INF) (337 sentences). In the entity identification phase, we identified and extracted ADRs (4813 mentions), WDs (590 mentions), SSIs (1219 mentions), and DIs (792 mentions). In the entity normalization phase, we mapped the identified entities to the corresponding concepts in both UMLS (918 unique concepts) and SNOMED CT (755 unique concepts). Four annotators double-coded the sentences and the spans of the identified entities, strictly following the guidelines developed for this study. We used the PsyTAR sentence classification component to train a range of supervised machine learning classifiers to automatically identify text segments that mention ADRs, WDs, DIs, SSIs, EF, and INF. SVM classifiers had the highest performance, with an F-score of 0.90. We also measured the performance of cTAKES (clinical Text Analysis and Knowledge Extraction System) in identifying patients' expressions of ADRs and WDs with and without adding the PsyTAR dictionary to the core cTAKES dictionary. Augmenting the cTAKES dictionary with PsyTAR improved the F-score of cTAKES by 25%.
The findings suggest that PsyTAR has significant implications for text mining algorithms that aim to identify information about adverse drug events and drug effectiveness in patients' narrative data, by linking patients' expressions of adverse drug events to standard medical vocabularies. The corpus is publicly available at Zolnoori et al. [30].
AB - The "Psychiatric Treatment Adverse Reactions" (PsyTAR) corpus is an annotated corpus developed from patients' narrative data on psychiatric medications, particularly SSRIs (selective serotonin reuptake inhibitors) and SNRIs (serotonin norepinephrine reuptake inhibitors). The corpus consists of three main components: sentence classification, entity identification, and entity normalization. We split the review posts into sentences and labeled them for the presence of adverse drug reactions (ADRs) (2168 sentences), withdrawal symptoms (WDs) (438 sentences), signs/symptoms/illnesses (SSIs) (789 sentences), drug indications (DIs) (517 sentences), drug effectiveness (EF) (1087 sentences), and drug ineffectiveness (INF) (337 sentences). In the entity identification phase, we identified and extracted ADRs (4813 mentions), WDs (590 mentions), SSIs (1219 mentions), and DIs (792 mentions). In the entity normalization phase, we mapped the identified entities to the corresponding concepts in both UMLS (918 unique concepts) and SNOMED CT (755 unique concepts). Four annotators double-coded the sentences and the spans of the identified entities, strictly following the guidelines developed for this study. We used the PsyTAR sentence classification component to train a range of supervised machine learning classifiers to automatically identify text segments that mention ADRs, WDs, DIs, SSIs, EF, and INF. SVM classifiers had the highest performance, with an F-score of 0.90. We also measured the performance of cTAKES (clinical Text Analysis and Knowledge Extraction System) in identifying patients' expressions of ADRs and WDs with and without adding the PsyTAR dictionary to the core cTAKES dictionary. Augmenting the cTAKES dictionary with PsyTAR improved the F-score of cTAKES by 25%.
The findings suggest that PsyTAR has significant implications for text mining algorithms that aim to identify information about adverse drug events and drug effectiveness in patients' narrative data, by linking patients' expressions of adverse drug events to standard medical vocabularies. The corpus is publicly available at Zolnoori et al. [30].
KW - Annotated corpus
KW - Adverse drug events
KW - Drug effectiveness
KW - Online healthcare forums
KW - Patients narratives
KW - Psychiatric medications
KW - SSRIs
KW - SNRIs
KW - Drug safety
KW - Social media
KW - Information extraction
KW - Semantic mapping
KW - SNOMED CT
KW - UMLS
KW - Text mining
KW - Machine learning
UR - https://digitalcommons.usf.edu/si_facpub/435
UR - https://doi.org/10.1016/j.jbi.2018.12.005
U2 - 10.1016/j.jbi.2018.12.005
DO - 10.1016/j.jbi.2018.12.005
M3 - Article
VL - 90
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -