Prefix Generator for Low-resource Event Extraction, PlusLab @ UCLA
- Incorporated useful external information, such as syntax trees, into generative models, which are attractive for event extraction because of their flexibility and efficiency.
- Proposed Prefix Generator, which encodes external information (e.g., Abstract Meaning Representation graphs and robust Optimus representations) through pre-training and maps the resulting representations into prefixes for encoder-decoder models.
- Our method proved to be generally applicable and especially effective in low-resource settings.
- This work will be submitted to ACL 2023.
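The prefix-mapping idea above can be illustrated with a minimal sketch: a learned projection turns a pooled external representation (e.g., an AMR graph encoding) into a fixed number of virtual-token vectors that can be prepended to an encoder-decoder model's hidden states. All names, shapes, and the use of a single linear projection are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def external_repr_to_prefix(ext_repr, proj, prefix_len, hidden_dim):
    """Map a pooled external representation into `prefix_len` virtual
    token embeddings of size `hidden_dim` (hypothetical sketch)."""
    flat = ext_repr @ proj               # (prefix_len * hidden_dim,)
    return flat.reshape(prefix_len, hidden_dim)

rng = np.random.default_rng(0)
ext_dim, prefix_len, hidden_dim = 64, 4, 16
ext_repr = rng.standard_normal(ext_dim)                      # pooled graph encoding
proj = rng.standard_normal((ext_dim, prefix_len * hidden_dim))  # learned in practice
prefix = external_repr_to_prefix(ext_repr, proj, prefix_len, hidden_dim)
print(prefix.shape)  # (4, 16)
```

In a real system the projection would be trained jointly with (or ahead of) the downstream model, and the resulting prefix vectors would typically be injected as extra key/value states in each attention layer rather than as raw input embeddings.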
Domain Relabeling for Subpopulation Shift, IRIS Lab @ Stanford
- Analyzed spurious correlations caused by poor-quality domain labels in current approaches to addressing subpopulation shifts between training and test distributions.
- Proposed integrating candidate features from datasets’ metadata tables to obtain higher-quality domain labels.
- Built a reinforcement learning framework that uses downstream-task performance feedback for each metadata feature to iteratively optimize the domain labels.
- Our method improves worst-group performance on real-world datasets from fields such as healthcare and weather forecasting.
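The feedback loop above can be sketched as a toy epsilon-greedy bandit: each candidate metadata feature is tried as the domain label, and its running value estimate is updated from the worst-group accuracy it yields downstream. The feature names, the bandit formulation, and the feedback function are all hypothetical simplifications of the framework described.

```python
import random

def select_domain_feature(features, worst_group_acc, iters=200, eps=0.2, seed=0):
    """Epsilon-greedy selection of the metadata feature whose use as a
    domain label gives the best worst-group accuracy (toy sketch)."""
    rng = random.Random(seed)
    value = {f: 0.0 for f in features}   # running mean reward per feature
    count = {f: 0 for f in features}
    for _ in range(iters):
        if rng.random() < eps:
            f = rng.choice(features)                      # explore
        else:
            f = max(features, key=lambda x: value[x])     # exploit
        r = worst_group_acc(f)   # feedback: downstream worst-group accuracy
        count[f] += 1
        value[f] += (r - value[f]) / count[f]             # incremental mean
    return max(features, key=lambda x: value[x])

# illustrative, made-up feedback values for candidate metadata features
feedback = {"hospital": 0.72, "scanner_type": 0.55, "patient_age": 0.60}
best = select_domain_feature(list(feedback), lambda f: feedback[f])
print(best)  # 'hospital'
```

In the real setting the reward would come from retraining or fine-tuning a group-robust model under each candidate labeling, which is why an iterative, sample-efficient search over features is useful.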
Reviewing Test Protocols of Distantly Supervised Relation Extraction, THUNLP @ Tsinghua
- Examined two popular relation extraction datasets (NYT10 and Wiki20) for annotation errors introduced by the distant supervision used to build them.
- Proposed an improved relation ontology, performed data cleaning, and constructed manually annotated test sets for NYT10 and Wiki20, correcting 53% of the wrong labels in NYT10.
- Analyzed performance differences of competitive models on manually-annotated and distantly supervised datasets.
- Our conclusions underscore the importance of more accurate evaluation in relation extraction research.
- Paper “Manual Evaluation Matters: Reviewing Test Protocols of Distantly Supervised Relation Extraction” published in Findings of ACL 2021.
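The kind of audit described above boils down to measuring how often distantly supervised labels disagree with manual annotations. A minimal sketch, with made-up toy labels rather than real dataset contents:

```python
def label_error_rate(distant, manual):
    """Fraction of instances whose distantly supervised label disagrees
    with the manual annotation (toy illustration of the audit)."""
    assert len(distant) == len(manual)
    wrong = sum(d != m for d, m in zip(distant, manual))
    return wrong / len(distant)

# toy example: two of four distant labels disagree with manual annotation
distant = ["/people/person/place_of_birth", "NA", "/location/contains", "NA"]
manual  = ["/people/person/place_of_birth", "/location/contains", "NA", "NA"]
print(label_error_rate(distant, manual))  # 0.5
```

Running this disagreement measure per relation type is one way to expose where distant supervision heuristics (e.g., entity-pair matching against a knowledge base) systematically mislabel, which motivates evaluating models on manually annotated test sets.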