Google AI Natural Language Dialogue Dataset CCPE

README.md

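CCPE stands for Coached Conversational Preference Elicitation, a new method we propose for obtaining user preferences through dialogue. It enables the collection of conversational preferences that are natural yet structured. By studying dialogues in a single domain, we carried out a simple quantitative analysis of how people describe their movie preferences. We have also released the CCPE-M dataset to the community. It contains more than 500 movie-preference dialogues in which over 10,000 preferences are expressed.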

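Specifically, the dataset consists of 502 dialogues containing 12,000 annotated utterances in which a user and an assistant discuss movie preferences in natural language. It was collected from conversations between two paid crowd workers: one worker plays the role of the "assistant" and the other plays the role of the "user". The "assistant" follows the CCPE method to elicit the "user's" movie preferences.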

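The questions the assistant asks are designed to minimize bias in the terminology the "user" employs to convey as many of their preferences as possible, and to capture those preferences in natural language. Each dialogue is annotated with entity mentions, preferences expressed about entities, descriptions of entities provided, and other statements about entities.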
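The annotation scheme described above can be explored with a small tally script. The following is a minimal sketch only: the field names (`utterances`, `segments`, `annotations`, `annotationType`, `entityType`), the label strings, and the sample dialogue are assumptions made for illustration, not a confirmed schema of the released files, so check them against the actual data before use.

```python
from collections import Counter

# Hypothetical sample dialogue. The field names and label strings are
# assumptions for illustration, not a confirmed schema of the dataset.
SAMPLE_DIALOGUES = [
    {
        "conversationId": "demo-1",
        "utterances": [
            {"speaker": "ASSISTANT", "text": "What kind of movies do you like?"},
            {
                "speaker": "USER",
                "text": "I really like thrillers.",
                "segments": [
                    {
                        "text": "thrillers",
                        "annotations": [
                            {
                                "annotationType": "ENTITY_NAME",
                                "entityType": "MOVIE_GENRE_OR_CATEGORY",
                            }
                        ],
                    },
                    {
                        "text": "I really like thrillers",
                        "annotations": [
                            {
                                "annotationType": "ENTITY_PREFERENCE",
                                "entityType": "MOVIE_GENRE_OR_CATEGORY",
                            }
                        ],
                    },
                ],
            },
        ],
    }
]


def count_annotation_types(dialogues):
    """Tally how often each annotation type occurs across all dialogues."""
    counts = Counter()
    for dialogue in dialogues:
        for utterance in dialogue.get("utterances", []):
            # Utterances without annotated spans simply have no "segments".
            for segment in utterance.get("segments", []):
                for annotation in segment.get("annotations", []):
                    counts[annotation["annotationType"]] += 1
    return counts


def count_utterances_by_speaker(dialogues):
    """Tally utterances per speaker role (for example user vs. assistant)."""
    counts = Counter()
    for dialogue in dialogues:
        for utterance in dialogue.get("utterances", []):
            counts[utterance["speaker"]] += 1
    return counts


if __name__ == "__main__":
    print(count_annotation_types(SAMPLE_DIALOGUES))
    print(count_utterances_by_speaker(SAMPLE_DIALOGUES))
```

To run this on real data, replace `SAMPLE_DIALOGUES` with the parsed contents of the dataset file (for instance via `json.load`), adjusting the keys if the actual layout differs.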

Preference Elicitation

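In the CCPE dataset for movies, the individual posing as the user speaks into a microphone, and the audio is played directly to the person posing as the digital assistant. The "assistant" types out their response, which is then played back to the user via text-to-speech.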

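These natural two-person dialogues contain disfluencies and errors that arise spontaneously between the two parties and are difficult to reproduce in synthetic dialogues. The result is a collection of natural yet structured conversations about people's movie preferences.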

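Examining this dataset, we found that the ways people describe their preferences are remarkably rich. This is the first dataset to characterize that richness at scale. We also found that preferences do not always match the way smart assistants, or recommendation sites, characterize options. In other words, the filters on your favorite movie site or service may not match the language you would use to describe different movies when asking a person for a recommendation.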

For more details on the CCPE dataset, see our research paper (https://ai.google/research/pubs/pub48414), which will be published at the 2019 Annual Meeting of the Special Interest Group on Discourse and Dialogue (https://www.aclweb.org/portal/content/sigdial-2019-annual-meeting-special-interest-group-discourse-and-dialogue-call-special).
