A dataset of jokes – Collection of over 200,000 short jokes for humour research

LINK

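It is published as a dataset on Kaggle, the site well known for its data analysis competitions.
About 230,000 jokes are collected in a CSV file with the columns ["id", "phrase"], and just browsing through them is quite entertaining (lol).
The dataset would lend itself to more applications if information about the situations in which each phrase is used were published alongside it. Even so, some people have published notebooks analyzing it on the Kaggle project page.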

Kaggle (published February 2017)

Generating humor is a complex task in the domain of machine learning, and it requires the models to understand the deep semantic meaning of a joke in order to generate new ones. Such problems, however, are difficult to solve due to a number of reasons, one of which is the lack of a database that gives an elaborate list of jokes. Thus, a large corpus of over 0.2 million jokes has been collected by scraping several websites containing funny and short jokes.
This dataset is in the form of a csv file containing 231,657 jokes. Length of jokes ranges from 10 to 200 characters. Each line in the file contains a unique ID and joke.
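To check the figures in the description above against a local copy, a minimal sketch along the following lines may help. It assumes the file has a header row followed by one ID and one joke per row, and it reads the columns by position rather than by name. The file name `shortjokes.csv` and the helper names `load_jokes` and `summarize` are assumptions made for illustration, not part of the dataset description.

```python
import csv
import io


def load_jokes(csv_file):
    """Read (id, joke) pairs from an open CSV file, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # assumption: the first row is a header
    # Columns are read by position: first = ID, second = joke text.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def summarize(jokes):
    """Return basic figures to compare with the dataset description."""
    lengths = [len(text) for _, text in jokes]
    return {
        "count": len(jokes),
        "unique_ids": len({joke_id for joke_id, _ in jokes}),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
    }


# Tiny in-memory stand-in for the real file, so the sketch runs without a download.
sample = io.StringIO(
    'id,phrase\n'
    '1,"Why did the chicken cross the road?"\n'
    '2,"I told a joke, nobody laughed."\n'
)
stats = summarize(load_jokes(sample))
print(stats)

# With a downloaded copy (the path is hypothetical):
# with open("shortjokes.csv", newline="", encoding="utf-8") as f:
#     print(summarize(load_jokes(f)))
```

If the description holds for the real file, `count` and `unique_ids` should both come out at 231,657, and the length range should fall between 10 and 200 characters.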

人工知能と表現の今 (The Present of AI and Expression)
