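Search Google Scholar for "Retrieval Augmented" and you will turn up a pile of papers that use IR techniques to assist NLP tasks. In fact, any knowledge-intensive task can consider using IR to supply the model with additional knowledge, for example the five kinds of knowledge-intensive tasks covered by the KILT benchmark: QA, Dialogue, Fact Checking, Slot Filling, and Entity Linking: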
QA: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Dialogue: Retrieval Augmentation Reduces Hallucination in Conversation
Fact Checking: Improving Evidence Retrieval for Automated Explainable Fact-Checking
Slot Filling: Robust Retrieval Augmented Generation for Zero-shot Slot Filling
Entity Linking: Autoregressive Entity Retrieval
Similar work has also appeared for Summarization and Machine Translation:
Summarization: Retrieval Augmented Code Generation and Summarization
Machine Translation: Nearest Neighbor Machine Translation
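For language models, the best-known examples are Google's REALM: Retrieval-Augmented Language Model Pre-Training and OpenAI's WebGPT: Browser-assisted question-answering with human feedback. Heavyweight work of this kind is, for now, something only the big companies can pull off.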
In addition, a recent survey summarizes the papers on retrieval-augmented generation tasks: A Survey on Retrieval-Augmented Text Generation.
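
The shared idea behind these papers, retrieving relevant text and handing it to the model as extra knowledge, can be illustrated with a toy sketch. This is not the method of any paper listed here: the bag-of-words cosine scorer stands in for a real retriever (BM25 or a dense encoder), and the function names `retrieve` and `build_augmented_input` as well as the "Context / Question" layout are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper tokenizer.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    # Hypothetical retriever: score every passage against the query,
    # keep the top-k passages that share at least one token with it.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_augmented_input(query, passages, k=2):
    # Prepend the retrieved evidence to the query. The resulting string is
    # what a downstream model (reader or generator) would receive as input.
    evidence = retrieve(query, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(evidence))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Calling `build_augmented_input("capital of france", corpus)` on a small list of passages returns a single string in which only the passages that overlap with the query appear ahead of the question. Swapping the scorer for a stronger retriever leaves the rest of the pipeline unchanged, which is roughly why the same pattern carries over to so many different tasks.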