
Text Datasets for NLP

Size: 6.49 GB


NLP, Earth and Nature, Education Classification




README.md

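This is a bundle of three text datasets for use in natural language processing (NLP) research.

Dialog System Technology Challenge 7 (DSTC7)

Ubuntu
Advising

WikiText-103

An implementation of a transformer network using this data can be found here.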

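Content

This dataset contains both preprocessed and raw versions of the three datasets.
data.7z contains three folders, one for each of three NLP tasks:

CL, for classification on the DSTC datasets
LM-DSTC, for building language models on the DSTC datasets
LM-WIKI103, for building language models on the WikiText-103 dataset

The .npy files can be loaded with NumPy's np.load() function, and the .pkl files can be loaded with Python's pickle module.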
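The loading step described above can be sketched as follows. This is a minimal illustration, not code from the dataset author: the helper name `load_array_or_pickle` and the file names `example_ids.npy` and `example_vocab.pkl` are hypothetical, and the demo writes its own small files to a temporary directory because the actual file names inside data.7z are not listed here.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_array_or_pickle(path):
    """Load a .npy file with np.load() or a .pkl file with pickle.load()."""
    path = Path(path)
    if path.suffix == ".npy":
        # allow_pickle keeps its default (False); arrays of Python objects
        # would need allow_pickle=True, which should only be used on trusted files.
        return np.load(path)
    if path.suffix == ".pkl":
        # Unpickling can execute arbitrary code, so only open trusted files.
        with path.open("rb") as f:
            return pickle.load(f)
    raise ValueError(f"unsupported file type: {path.suffix!r}")


# Demo with hypothetical stand-in files (not the real dataset contents).
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)
    np.save(tmp / "example_ids.npy", np.arange(6).reshape(2, 3))
    with (tmp / "example_vocab.pkl").open("wb") as f:
        pickle.dump({"<pad>": 0, "hello": 1}, f)

    ids = load_array_or_pickle(tmp / "example_ids.npy")
    vocab = load_array_or_pickle(tmp / "example_vocab.pkl")

print(ids.shape)
print(vocab)
```

To use it on the real data, extract data.7z first and pass the path of any .npy or .pkl file from the CL, LM-DSTC, or LM-WIKI103 folder to the helper.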

The test, train, and validation files contain the Ubuntu and Advising datasets, along with wikitext-103-raw, the raw WikiText-103 data.

Acknowledgements

I did not create these datasets, nor did I develop any of the methods used to preprocess them. I only preprocessed them and organized them into the format presented here.


