Discovering and Improving Robustness Issues in Information Extraction
Fudan University Natural Language Processing Lab, Tao Gui (桂韬)

Information Extraction: Tasks and Goals

Task goal: Information extraction aims to automatically extract key information from large volumes of unstructured text in order to support downstream tasks. Named entity recognition and relation extraction are its two main subtasks.

Application areas:
• Knowledge graph construction and completion;
• Internet information retrieval;
• Intelligent question-answering systems, among others.

• Named entity recognition (NER) is one subtask of information extraction. Its goal is to extract entities such as person names, place names, and organization names from unstructured text.
• Relation extraction (RE) is the other main subtask. It follows named entity recognition and aims to extract the latent semantic relations between entities.

Example sentence: Republican Trump replaces Obama as President of the United States.
NER: Republican (organization), Trump (person), Obama (person), United States (location)
RE: (Trump, Member of, Republican), (Trump, President of, United States), (Obama, Former President, United States)

General framework for entity recognition
[Figure: input features (POS, n-gram, syntax tree, ...) pass through feature extraction and classification to produce tags such as B-PER, I-PER, O.]
• Deep learning can automatically learn features that are useful to the model, whereas traditional machine learning methods require laborious feature engineering.

NER based on traditional machine learning: HMM, MEMM, CRF

NER based on deep learning
[Figure: tag sequence B-PER I-PER O ...; input granularities labeled Subword and Chinese Word.]
Cui Y, Che W, Liu T, et al. Revisiting Pre-Trained Models for Chinese Natural Language Processing. EMNLP Findings, 2020: 657-668.

Robustness Issues in NLP

Deep learning readily takes shortcuts
[Figure: stars appear top right or bottom left; moons appear top left or bottom right.]
Geirhos, Robert, et al. "Shortcut learning in deep neural networks." Nature Machine Intelligence 2.11 (2020): 665-673.

Robustness Issues in Information Extraction Tasks

Entity Coverage Ratio (ECR)
The entity coverage ratio describes the degree to which entities in the test set have been seen in the training set with the same category.
Open question: how can a model's robustness be evaluated comprehensively?
Fu et al., Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study, AAAI 2020.

Adversarial Attack
Szegedy, Christian, et al. "Intriguing properties of neural networks." arXiv preprint arXiv:1312.6199 (2013).
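The entity coverage ratio described above can be illustrated with a small sketch. This is a simplified reading of the one-sentence definition on the slide, not the exact formula from Fu et al.: it assumes entities are given as (surface form, category) pairs and simply reports the fraction of test mentions whose pair also occurs in the training set. The function name and the toy data are hypothetical.

```python
def entity_coverage_ratio(train_entities, test_entities):
    """Fraction of test mentions seen in training with the same category.

    Simplified illustration only; each entity is a (surface, category) pair.
    """
    seen = set(train_entities)
    if not test_entities:
        return 0.0
    covered = sum(1 for mention in test_entities if mention in seen)
    return covered / len(test_entities)


# Toy data (assumed for illustration, not taken from any dataset).
train = [("Trump", "PER"), ("Obama", "PER"), ("United States", "LOC")]
test = [
    ("Obama", "PER"),          # seen with the same category
    ("Republican", "ORG"),     # never seen
    ("United States", "LOC"),  # seen with the same category
    ("Trump", "ORG"),          # seen, but with a different category
]
ratio = entity_coverage_ratio(train, test)
print(ratio)
```

A low ratio on a test set suggests that good scores cannot be explained by memorized entities alone, which is one reason to look at coverage when judging generalization.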

Attack: BERT-based Adversarial Examples
[Figure: human evaluation results.]
Garg and Ramakrishnan, BAE: BERT-based Adversarial Examples for Text Classification, EMNLP 2020.

Human Evaluation on Adversarial Examples
Rating scale points shown: 5: Somewhat Agree; 6: Agree.
Valid perturbation: a perturbation that receives a human score above some threshold Th.
Valid attack: an attack consisting of valid perturbations only.
Hauser J, et al. BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification. arXiv preprint arXiv:2109.07403, 2021.

BERT is Robust?
Hauser J, et al. BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification. arXiv preprint arXiv:2109.07403, 2021.

Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

Integrity: TextFlint offers 20 general transformations, 60 task-specific transformations, and thousands of their combinations, and provides over 67,000 evaluation results generated by these transformations on 24 classic datasets from 12 tasks. It covers essentially all aspects of text transformation so that a model's robustness can be evaluated comprehensively.

Acceptability: The robustness results obtained by verification are credible only when the newly generated texts conform to human language. The transformation methods provided by TextFlint are scored for plausibility and grammaticality by human evaluation. The results of human and model evaluation can be found on the TextFlint website.

Analyzability: TextFlint can produce a standard analysis report at the lexical, syntactic, and semantic levels. All evaluation results can be displayed as visualizations and tables, so that users can accurately identify a model's shortcomings. More evaluation results and related analysis are in the paper.
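The valid perturbation and valid attack definitions quoted above from Hauser et al. reduce to a threshold check followed by an "all" condition. The sketch below is only an illustration of that reading; the function names, the example scores, and the threshold value of 5 are assumptions, since the slide leaves Th unspecified.

```python
def is_valid_perturbation(human_score, threshold):
    """A perturbation is valid if its human score is above the threshold Th."""
    return human_score > threshold


def is_valid_attack(perturbation_scores, threshold):
    """An attack is valid only if every one of its perturbations is valid.

    Note: an attack with no perturbations is vacuously valid under all().
    """
    return all(is_valid_perturbation(s, threshold) for s in perturbation_scores)


TH = 5  # hypothetical threshold; the source does not fix a value for Th

attack_a = [6, 7, 6]  # assumed human scores for each perturbed word
attack_b = [6, 5, 7]  # one perturbation is not above the threshold

print(is_valid_attack(attack_a, TH))
print(is_valid_attack(attack_b, TH))
```

Because a single rejected perturbation invalidates the whole attack, the share of valid attacks can be much smaller than the share of valid perturbations.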

Transformation: General
Synonym: "He loves NLP" → "He likes NLP"
Spelling Error: definitely → difinately
Typos: Shanghai → Shenghai
EntTypos: like → l1ke
OCR
Antonym: John lives in Ireland → John doesn't live in Ireland

Transformation: Domain Specific
NER, SwapNamedEnt: "He was born in China" → "He was born in Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogog"
CWS, SwapVerb: 看 → "看看", "看一看", "看了看", and "看了一看"
POS, SwapMultiPOS: "There is an apple on the desk" → "There is an imponderable on the desk"
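To make the NER-specific transformation above concrete, here is a minimal sketch of a SwapNamedEnt-style substitution written in plain Python. It does not use the TextFlint API; the function name, the BIO-tagged input format, and the replacement entity are all assumptions made for illustration. The point it shows is that the tags must be regenerated together with the tokens so that the labels stay aligned after the swap.

```python
def swap_named_entity(tokens, tags, new_entity_tokens):
    """Replace the first BIO-tagged entity span with new_entity_tokens.

    Returns new (tokens, tags) lists; the entity category is preserved.
    """
    if not new_entity_tokens:
        raise ValueError("new_entity_tokens must not be empty")
    tokens, tags = list(tokens), list(tags)
    # Locate the start of the first entity span.
    start = next((i for i, t in enumerate(tags) if t.startswith("B-")), None)
    if start is None:
        return tokens, tags  # no entity to swap
    ent_type = tags[start][2:]
    # Extend the span over the following I- tags of the same category.
    end = start + 1
    while end < len(tags) and tags[end] == f"I-{ent_type}":
        end += 1
    new_tags = [f"B-{ent_type}"] + [f"I-{ent_type}"] * (len(new_entity_tokens) - 1)
    return (
        tokens[:start] + list(new_entity_tokens) + tokens[end:],
        tags[:start] + new_tags + tags[end:],
    )


tokens = ["He", "was", "born", "in", "China"]
tags = ["O", "O", "O", "O", "B-LOC"]
new_tokens, new_tags = swap_named_entity(tokens, tags, ["New", "Zealand"])
print(new_tokens)
print(new_tags)
```

Evaluating a tagger on the original and the swapped sentence side by side gives a rough signal of whether it relies on context or on memorized entity strings.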

PDF document: Fudan University, "Discovering and Improving Robustness Issues in Low-Resource Information Extraction"
