
AI Corpus - The Ubuntu Dialogue Corpus contains 26 million dialogue turns

Size: 2.7G


README.md

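Building dialogue systems, in which a human can hold a natural conversation with a virtual agent, is a challenging task in natural language processing and the focus of much ongoing research. The challenges include linking references to the same entity over time, tracking what happened earlier in the conversation, and generating appropriate responses. A corpus of naturally occurring dialogues such as this one is useful for building and evaluating dialogue systems.

Content overview:

The new Ubuntu Dialogue Corpus consists of almost one million two-person conversations extracted from the Ubuntu chat logs, in which users seek technical support for various Ubuntu-related problems. Each conversation has 8 turns on average and at least 3 turns. All conversations are carried out in text form, not audio.

The full dataset contains 930,000 dialogues and over 100 million words. This dataset contains a sample of that dataset, spread across .csv files. It contains more than 269 million words of text, spread over 26 million turns.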

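folder: The folder that a dialogue comes from. Each file contains the dialogues from one folder.
dialogID: An ID number for a specific dialogue. Dialogue IDs are reused across folders.
date: A timestamp of the time this line of dialogue was sent.
from: The user who sent that line of dialogue.
to: The user to whom they were replying. On the first turn of a dialogue, this field is blank.
text: The text of that turn of dialogue, enclosed in double quotes ("). Newline characters (\n) have been removed.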
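
Because dialogue IDs are reused across folders, a dialogue is only identified by the pair of folder and dialogue ID. The following is a minimal sketch of grouping turns into dialogues on that pair, using Python's standard `csv` module. The header names (`folder`, `dialogID`, `date`, `from`, `to`, `text`), the timestamp format, and all sample rows are assumptions made for illustration; check the header of the actual .csv files before relying on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. The header names and timestamp format are
# assumptions based on the field list, not copied from the real files.
SAMPLE_CSV = '''\
folder,dialogID,date,from,to,text
3,1.tsv,2010-01-01T10:00:00.000Z,alice,,"my wifi stopped working after the upgrade"
3,1.tsv,2010-01-01T10:01:00.000Z,bob,alice,"which release are you on?"
3,1.tsv,2010-01-01T10:02:00.000Z,alice,bob,"the latest one, installed today"
4,1.tsv,2010-02-05T09:30:00.000Z,carol,,"how do I list installed packages?"
4,1.tsv,2010-02-05T09:31:00.000Z,dave,carol,"try dpkg -l"
4,1.tsv,2010-02-05T09:32:00.000Z,carol,dave,"thanks, that worked"
'''


def group_dialogues(csv_file):
    """Group rows into dialogues keyed by (folder, dialogID).

    The key includes the folder because the same dialogue ID can
    appear in more than one folder.
    """
    dialogues = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["folder"], row["dialogID"])
        dialogues[key].append(row)
    return dict(dialogues)


dialogues = group_dialogues(io.StringIO(SAMPLE_CSV))

for (folder, dialog_id), turns in dialogues.items():
    opener = turns[0]
    # On the first turn of a dialogue the "to" field is blank.
    print(folder, dialog_id, len(turns), "turns, opened by", opener["from"])
```

Keying on the dialogue ID alone would merge the two sample dialogues into one, which is why the sketch uses the pair. For the real files, pass an open file object (opened with `newline=""`, as the `csv` module recommends) instead of the in-memory sample.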

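Citation:

This dataset was collected by Ryan Lowe, Nissan Pow, Iulian V. Serban and Joelle Pineau. It is made available under the Apache License 2.0. If you use this data in your work, please include the following citation:

Ryan Lowe, Nissan Pow, Iulian V. Serban and Joelle Pineau, "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems", SIGDial 2015.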


