Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who needs to understand large quantities of information. In the medical domain, automatic summarization has the potential to make health information more accessible to people without medical expertise. However, to evaluate the quality of summaries generated by summarization algorithms, researchers first require gold standard, human-generated summaries. Unfortunately, there is no available data for the purpose of assessing summaries that help consumers of health information answer their questions. To address this issue, we present the MEDIQA-Answer Summarization dataset, the first dataset designed for question-driven, consumer-focused summarization. It contains 156 health questions asked by consumers, answers to these questions, and manually generated summaries of these answers. The dataset's unique structure allows it to be used for at least eight different types of summarization evaluations. We also benchmark the performance of baseline and state-of-the-art deep learning approaches on the dataset, demonstrating how it can be used to evaluate automatically generated summaries.

A summary is a concise description that captures the salient details from a more complex source of information 1. Summaries are regularly used as a tool to quickly understand content from a single source, such as a book or movie, or from many disparate sources, such as news stories about a recent event. Even this article began with a summary: an abstract. Summarization can be particularly useful for helping people easily understand online health information. One of the first places people turn to for answers to their health questions is the internet 2. A conventional search engine will return a set of web pages in response to a user's query, but without considerable medical knowledge the consumer is not always able to judge the correctness and relevance of the content 3. In fact, finding relevant biomedical material can be difficult even for medical experts 1. While a reliable, easy-to-understand summary of an article, such as the plain language summaries created by the health organization Cochrane 4, would likely make searching for health information easier, it is not possible to tailor a manually generated summary to every user. For this reason, a summary automatically generated in response to a user's query could be extremely beneficial, especially for users who do not have medical expertise. Recent developments in automatic text summarization, a field at the intersection of machine learning and natural language processing (NLP), have shown the potential to aid consumers in understanding health information 5. However, to develop more advanced summarization algorithms capable of reliably summarizing medical text, researchers require human-curated datasets that can be used to consistently measure the quality of machine-generated summaries. Unfortunately, there is currently a lack of question-driven and consumer-focused data available, i.e., human-generated summaries of information relevant to helping consumers answer their health questions. A dataset for this purpose must contain the following data: (1) questions asked by people without medical expertise, (2) documents containing answers to the questions, and (3) easily understood summaries that are informed by the health questions asked by consumers. There are many available summarization datasets, but none satisfy the conditions mentioned above. To address the absence of data that meets these conditions, the contribution of this paper is a new gold standard dataset, MEDIQA-Answer Summarization (MEDIQA-AnS) 6, consisting of 156 health questions asked by consumers, corresponding answers to these questions, and expert-created summaries of these answers.
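To illustrate how gold standard summaries support this kind of evaluation, the sketch below computes a simplified ROUGE-1-style unigram-overlap F1 score between a machine-generated summary and a human reference. This is a minimal illustration, not the paper's benchmarking setup: the example texts are hypothetical, and real evaluations typically use a full ROUGE implementation with stemming and multiple reference variants.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference.

    A simplified ROUGE-1 sketch: tokenization is plain lowercase
    whitespace splitting, with no stemming or stopword handling.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: a generated answer summary vs. an expert reference.
generated = "ibuprofen can relieve mild pain and reduce fever"
reference = "ibuprofen relieves mild pain and can reduce fever"
score = rouge1_f1(generated, reference)  # 7 of 8 unigrams match: 0.875
```

Overlap metrics like this reward lexical agreement with the expert-created summaries; they are only meaningful when trustworthy human references, such as those in MEDIQA-AnS, are available.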