MSParS (V1.0): A Multi-Perspective Semantic Parsing Dataset for Knowledge-Based Question Answering

Data size: 4.94M

README.md

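MSParS is a large-scale dataset for open-domain semantic parsing. The full dataset consists of 81,826 samples, all annotated by native English speakers. We randomly shuffled the samples and use roughly 80% (63,826) as the training set, 10% (9,000) as the validation set, and the remaining 10% (9,000) as the test set. Because the dataset currently supports open evaluation, only the unannotated questions of the test set are released.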
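The percentages above are rounded: 63,826 of 81,826 samples is about 78%, and each 9,000-sample split is about 11%. The following minimal Python sketch shows a shuffle-then-slice split with those exact sizes. The function name `shuffle_and_split` and the default seed are hypothetical; this is not the official split procedure and will not reproduce the released partition.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Split sizes stated in the README (63,826 + 9,000 + 9,000 = 81,826).
TRAIN_SIZE, DEV_SIZE, TEST_SIZE = 63_826, 9_000, 9_000


def shuffle_and_split(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle samples with a seeded RNG, then cut train/dev/test by fixed sizes.

    The seed is an arbitrary illustrative choice, not the one used for MSParS.
    """
    expected = TRAIN_SIZE + DEV_SIZE + TEST_SIZE
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # in-place shuffle of the copy
    train = shuffled[:TRAIN_SIZE]
    dev = shuffled[TRAIN_SIZE:TRAIN_SIZE + DEV_SIZE]
    test = shuffled[TRAIN_SIZE + DEV_SIZE:]
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = shuffle_and_split(range(81_826))
    print(len(train), len(dev), len(test))  # 63826 9000 9000
    print(f"{len(train) / 81_826:.1%}")  # 78.0%
```

Slicing by absolute counts rather than by percentages avoids rounding drift, which is why the sketch hard-codes the three sizes.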

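Each sample is a quadruple consisting of:

1. A question (or multiple questions in a multi-round QA scenario)

2. A logical form representing the meaning of the question

3. Parameters extracted from the question (entity/type/value)

4. The question type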
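A minimal sketch of the quadruple as a Python dataclass, assuming each field is kept as the raw annotation string and that, as in the multi-round example under Format and Examples, successive rounds are joined with `|||` in both the question and logical form fields. The class name `MSParSSample` and the `rounds` helper are hypothetical. The parameters field is deliberately not split per round, because in the single-round example `|||` separates two individual parameters rather than rounds.

```python
from dataclasses import dataclass
from typing import List, Tuple

SEP = "|||"  # separator between rounds, as seen in the README examples (assumption)


@dataclass
class MSParSSample:
    """One MSParS quadruple, each field kept as its raw annotation string."""
    question: str       # a single question, or several rounds joined by SEP
    logical_form: str   # one logical form per question round, joined by SEP
    parameters: str     # entity/type/value mentions; SEP here separates parameters
    question_type: str  # one of the 12 question types

    def rounds(self) -> List[Tuple[str, str]]:
        """Pair each question round with the logical form of that round."""
        questions = [q.strip() for q in self.question.split(SEP)]
        forms = [f.strip() for f in self.logical_form.split(SEP)]
        if len(questions) != len(forms):
            raise ValueError(f"{len(questions)} questions but {len(forms)} logical forms")
        return list(zip(questions, forms))
```

A single-round sample simply yields a one-element list from `rounds()`, so downstream code can treat both scenarios uniformly.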

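Most existing semantic parsing datasets are either limited in size or biased toward single-relation questions. MSParS covers 9 types of single-round questions (single-relation, multi-hop, multi-constraint, superlative, aggregation, comparison, yesno, cvt, and multi-choice) and 3 types of multi-round questions (multi-round entity, multi-round predicate, and multi-round answer), for a total of 12 question types.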
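The taxonomy can be written down as a lookup table, which also makes the 9 + 3 = 12 breakdown checkable. This is a hypothetical sketch: the label spellings follow the English names above and may not match the exact strings in the released files.

```python
from enum import Enum
from typing import Dict


class Rounds(Enum):
    SINGLE = "single-round"
    MULTI = "multi-round"


# Hypothetical label strings mirroring the English type names in the text.
QUESTION_TYPES: Dict[str, Rounds] = {
    "single-relation": Rounds.SINGLE,
    "multi-hop": Rounds.SINGLE,
    "multi-constraint": Rounds.SINGLE,
    "superlative": Rounds.SINGLE,
    "aggregation": Rounds.SINGLE,
    "comparison": Rounds.SINGLE,
    "yesno": Rounds.SINGLE,
    "cvt": Rounds.SINGLE,
    "multi-choice": Rounds.SINGLE,
    "multi-round entity": Rounds.MULTI,
    "multi-round predicate": Rounds.MULTI,
    "multi-round answer": Rounds.MULTI,
}


def is_multi_round(label: str) -> bool:
    """True for the three multi-round types; raises KeyError for unknown labels."""
    return QUESTION_TYPES[label] is Rounds.MULTI


def count_by_rounds() -> Dict[Rounds, int]:
    """Count how many question types fall into each round category."""
    counts = {r: 0 for r in Rounds}
    for r in QUESTION_TYPES.values():
        counts[r] += 1
    return counts
```

Raising `KeyError` on an unknown label, rather than returning `False`, surfaces spelling mismatches between this table and the actual data early.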

MSParS is annotated based on Satori, Microsoft's open-domain knowledge graph. Entities, predicates, and types in MSParS follow Satori's standard format. The corresponding Satori fragments will be released later.

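In the future, we plan to extend the dataset to V2.0 by including more question types, such as clarification questions, adding adversarial examples, and annotating non-English questions for cross-lingual semantic parsing.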

Format and Examples

<question id=409> computer game with the latest release date
<logical form id=409> ( argmax ( lambda ?x ( isa ?x mso:cvg.computer_videogame ) ) ( lambda ?x lambda ?y ( mso:cvg.computer_videogame.release_date ?x ?y ) ) 1 )
<parameters id=409> mso:cvg.computer_videogame (type) [0,1] ||| 1 (value) [-1,-1]
<question type id=409> superlative
==================================================
<question id=8995> What is the name of the creative work about Project Mercury? ||| When did this spaceflight first start?
<logical form id=8995> ( lambda ?x ( mso:media_common.subject.creative_work project_mercury ?x ) ) ||| ( lambda ?x ( mso:spaceflight.space_program.started project_mercury ?x ) )
<parameters id=8995> project_mercury (entity) [7,8] @Q1 ||| project_mercury (entity) [7,8] @Q1
<question type id=8995> multi-round entity
==================================================
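A minimal reader for the tagged layout shown above, assuming one `<field id=N>` tag per line, records separated by `=` lines, and parameter entries of the form `name (kind) [start,end]` joined by `|||`, with an optional `@Qk` suffix that presumably names the source question round. It further assumes that `[-1,-1]` marks a parameter with no span in the question text, as with the value `1` in example 409. The names `read_records`, `parse_parameters`, and `Parameter` are hypothetical and not part of any official MSParS tooling.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A tag line such as "<logical form id=409>\t( argmax ... )".
LINE_RE = re.compile(r"^<(?P<field>[a-z ]+) id=(?P<id>\d+)>\s*(?P<value>.*)$")

# One parameter such as "project_mercury (entity) [7,8] @Q1".
PARAM_RE = re.compile(
    r"^(?P<name>.+?) \((?P<kind>entity|type|value)\) "
    r"\[(?P<start>-?\d+),(?P<end>-?\d+)\](?: @Q(?P<round>\d+))?$"
)


@dataclass
class Parameter:
    name: str
    kind: str                        # "entity", "type" or "value"
    span: Optional[Tuple[int, int]]  # token span; None when written as [-1,-1]
    source_round: Optional[int]      # k from a trailing "@Qk", if present


def read_records(text: str) -> Dict[int, Dict[str, str]]:
    """Group tag lines by sample id, e.g. {409: {"question": ..., "parameters": ...}}."""
    records: Dict[int, Dict[str, str]] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:  # skips "=====" separators and blank lines
            continue
        records.setdefault(int(match["id"]), {})[match["field"]] = match["value"]
    return records


def parse_parameters(field: str) -> List[Parameter]:
    """Split a parameters field on "|||" and parse each entry."""
    params = []
    for chunk in field.split("|||"):
        match = PARAM_RE.match(chunk.strip())
        if match is None:
            raise ValueError(f"unrecognized parameter: {chunk.strip()!r}")
        start, end = int(match["start"]), int(match["end"])
        span = None if (start, end) == (-1, -1) else (start, end)
        source = int(match["round"]) if match["round"] is not None else None
        params.append(Parameter(match["name"], match["kind"], span, source))
    return params
```

For a whole file, something like `read_records(open("train.txt", encoding="utf-8").read())` would return a dict keyed by sample id; the file name is a placeholder.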

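The logical forms are parenthesized prefix expressions, so a generic S-expression reader is enough to turn them into trees. The sketch below is an illustration under that assumption, not an official MSParS parser: it only builds nested lists and does not interpret operators such as `argmax` or `lambda`. In example 409 the curried `lambda ?x lambda ?y` appears without inner parentheses, so it parses as a flat run of tokens. The helper names `parse` and `predicates` are hypothetical.

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def tokenize(logical_form: str) -> List[str]:
    """Split on whitespace, padding parentheses so they become separate tokens."""
    return logical_form.replace("(", " ( ").replace(")", " ) ").split()


def parse(logical_form: str) -> SExpr:
    """Parse a parenthesized logical form into nested Python lists."""
    tokens = tokenize(logical_form)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of logical form")
        token = tokens[pos]
        pos += 1
        if token == "(":
            children: List[SExpr] = []
            while pos < len(tokens) and tokens[pos] != ")":
                children.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing ")"
            return children
        if token == ")":
            raise ValueError("unexpected ')'")
        return token  # an atom: operator, variable, symbol, or value

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after logical form")
    return tree


def predicates(tree: SExpr) -> List[str]:
    """Collect every mso:-prefixed atom (types and relations) in order."""
    if isinstance(tree, str):
        return [tree] if tree.startswith("mso:") else []
    return [p for child in tree for p in predicates(child)]
```

Collecting `mso:` atoms this way is one plausible starting point for checking which Satori types and predicates a question relies on.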
Citation

If you use the MSParS dataset in your research, please cite the following (the paper reference will be updated later):

@inproceedings{duan2019overview,
  title={Overview of the NLPCC 2019 Shared Task: Open Domain Semantic Parsing},
  author={Nan Duan},
  booktitle={NLPCC},
  year={2019}
}

NLPCC 2019

The MSParS dataset supports Shared Task 2 (Open Domain Semantic Parsing) of NLPCC 2019.

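The CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC) is the annual conference of CCF TCCI (Technical Committee on Chinese Information Technology of the China Computer Federation). NLPCC has been successfully held in Beijing (2012), Chongqing (2013), Shenzhen (2014), Nanchang (2015), Kunming (2016), Dalian (2017), and Hohhot (2018). This year's NLPCC will be held in Dunhuang from October 9 to 14.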

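Important dates:

2019/03/15: Announcement of shared tasks and call for participation;
2019/04/01: Release of detailed task guidelines and training data;
2019/05/15: Release of test data;
2019/05/20: Deadline for participants' result submission;
2019/05/30: Release of evaluation results and call for system reports and conference papers;
2019/06/30: Conference paper submission deadline (shared tasks only);
2019/07/30: Notification of conference paper acceptance/rejection;
2019/08/10: Camera-ready paper submission deadline;
2019/10/12-14: NLPCC 2019 main conference;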

Data usage instructions:

I. Data Source and Display Explanation:

1. The data originates from internet data collection or is provided by service providers, and this platform offers users the ability to view and browse datasets.

2. This platform serves only as a basic information display for datasets, including but not limited to image, text, video, and audio file types.

3. Basic dataset information comes from the original data source or the information provided by the data provider. If there are discrepancies in the dataset description, please refer to the original data source or service provider's address.

II. Ownership Explanation:

1. All datasets on this site are copyrighted by their original publishers or data providers.

III. Data Reposting Explanation:

1. If you need to repost data from this site, please retain the original data source URL and related copyright notices.

IV. Infringement and Handling Explanation:

1. If any data on this site involves infringement, please contact us promptly, and we will arrange for the data to be taken offline.