Dataset
class Manteia.Dataset.Dataset(name='20newsgroups', train=True, test=False, dev=False, classe=True, desc=False, path='./dataset', verbose=True)
Loads one of the supported datasets. Each dataset has a dedicated load_* method, documented below.
Parameters:
name - name of the dataset (str).
train - load the training split. Default: True.
test - load the test split. Default: False.
dev - load the dev split. Default: False.
desc - load the dataset description. Default: False.
verbose - display explanatory messages while loading. Default: True.
path - path to the dataset directory. Default: './dataset'.
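The name argument selects which load_* method runs. The sketch below illustrates that selection step. It pairs the name strings used in the examples on this page with the loaders they document. The LOADERS table and the loader_for helper are hypothetical and not part of Manteia's API. Short Jokes and TREC are left out because their example name strings are not confirmed here.

```python
# Hypothetical sketch: map dataset name strings (taken from the examples on
# this page) to the loader method each one documents. This is NOT Manteia's
# actual dispatch code, only an illustration of selecting a loader by name.
LOADERS = {
    '20newsgroups': 'load_20newsgroups',
    'Amazon Review Full': 'load_Amazon_Review_Full',
    'Amazon Review Polarity': 'load_Amazon_Review_Polarity',
    'DBPedia': 'load_DBPedia',
    'SST-2': 'load_SST_2',
    'SST-5': 'load_SST_5',
    'Tweeter Airline Sentiment': 'load_Tweeter_Airline_Sentiment',
    'Yahoo! Answers': 'load_Yahoo_Answers',
    'Yelp Review Full': 'load_Yelp_Review_Full',
    'Yelp Review Polarity': 'load_Yelp_Review_Polarity',
    'agnews': 'load_agnews',
    'drugscom': 'load_drugscom',
    'pubmed_rct20k': 'load_pubmed_rct20k',
}


def loader_for(name: str) -> str:
    """Return the loader method name for a dataset name, or raise ValueError."""
    try:
        return LOADERS[name]
    except KeyError:
        raise ValueError(f"unknown dataset name: {name!r}") from None
```

With a table like this, an unknown name fails early with a clear error instead of silently loading nothing.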
del_dir(name)
Delete the files of the named dataset.
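This page documents del_dir only as deleting the dataset's files. The sketch below is a minimal version of that behavior. It assumes each dataset lives in its own subdirectory <path>/<name>; that layout is an assumption, and Manteia's real one may differ. The function name del_dir_sketch is hypothetical.

```python
import os
import shutil


def del_dir_sketch(path: str, name: str) -> bool:
    """Hypothetical stand-in for del_dir: remove <path>/<name> if it exists.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    target = os.path.join(path, name)
    if os.path.isdir(target):
        shutil.rmtree(target)  # deletes the directory and everything inside it
        return True
    return False
```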
load_20newsgroups()
Defines the 20newsgroups dataset.
The labels include:
0 : sci.crypt.
1 : sci.electronics.
2 : sci.med.
3 : sci.space.
4 : rec.autos.
5 : rec.sport.baseball.
6 : rec.sport.hockey.
7 : talk.politics.guns.
8 : talk.politics.mideast.
9 : talk.politics.misc.
10 : talk.religion.misc.
    from Manteia.Dataset import Dataset

    ds = Dataset('20newsgroups')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
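Suppose ds.labels_train holds the integer ids listed above. That is an assumption, and the ids could also arrive as digit strings, which this sketch accepts too. A lookup table then turns the ids into readable topic names. NEWSGROUPS_LABELS and decode_labels are hypothetical helpers, not part of Manteia.

```python
# Label ids 0-10 and topic names, in the order listed for load_20newsgroups.
NEWSGROUPS_LABELS = [
    'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'rec.autos',
    'rec.sport.baseball', 'rec.sport.hockey', 'talk.politics.guns',
    'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc',
]


def decode_labels(ids):
    """Map label ids (ints or digit strings) to newsgroup names."""
    return [NEWSGROUPS_LABELS[int(i)] for i in ids]
```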
load_Amazon_Review_Full()
Defines the Amazon Review Full Star dataset.
The labels include:
1 - 5 : rating classes (5 is the highest rating).
    from Manteia.Dataset import Dataset

    ds = Dataset('Amazon Review Full', test=True, desc=True)
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
    print('Test : ')
    print(ds.documents_test[:5])
    print(ds.labels_test[:5])
    print('Description :')
    print(ds.description)
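Some experiments need a coarse sentiment task built from the 1 - 5 star labels. One common convention treats 1-2 stars as negative and 4-5 as positive, and drops the neutral 3. That convention is an assumption here, not something this page specifies. The hypothetical helper stars_to_polarity returns the kept row indices alongside the new labels, so the matching documents can be selected.

```python
def stars_to_polarity(stars):
    """Collapse 1-5 star labels to 'negative'/'positive', dropping 3-star rows.

    Returns (kept_indices, polarity_labels).
    """
    kept, polarity = [], []
    for i, s in enumerate(stars):
        s = int(s)  # accept ints or digit strings
        if s <= 2:
            kept.append(i)
            polarity.append('negative')
        elif s >= 4:
            kept.append(i)
            polarity.append('positive')
        # s == 3 is treated as neutral and skipped
    return kept, polarity
```

Usage: `kept, y = stars_to_polarity(ds.labels_train)` followed by `X = [ds.documents_train[i] for i in kept]`.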
load_Amazon_Review_Polarity()
Defines the Amazon Review Polarity dataset.
The labels include:
1 : Negative polarity.
2 : Positive polarity.
    from Manteia.Dataset import Dataset

    ds = Dataset('Amazon Review Polarity', test=True, desc=True)
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
    print(ds.documents_test[:5])
    print(ds.labels_test[:5])
    print(ds.description)
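Many binary classifiers and loss functions expect targets 0 and 1 rather than the 1 (negative) and 2 (positive) listed above. The sketch below shifts the labels and rejects anything unexpected. It assumes the labels come back as 1/2 values, either as ints or digit strings. The name to_binary is hypothetical.

```python
def to_binary(labels):
    """Shift polarity labels 1 (negative) / 2 (positive) to 0 / 1."""
    out = []
    for lab in labels:
        v = int(lab)
        if v not in (1, 2):
            raise ValueError(f"unexpected polarity label: {lab!r}")
        out.append(v - 1)
    return out
```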
load_DBPedia()
Defines the DBPedia dataset.
The labels include:
Company
EducationalInstitution
Artist
Athlete
OfficeHolder
MeanOfTransportation
Building
NaturalPlace
Village
Animal
Plant
Album
Film
WrittenWork
    from Manteia.Dataset import Dataset

    ds = Dataset('DBPedia', test=True, desc=True, classe=True)
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
    print('Test : ')
    print(ds.documents_test[:5])
    print(ds.labels_test[:5])
    print('Description :')
    print(ds.description)
    print('List labels :')
    print(ds.list_labels)
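Suppose the DBPedia labels are class names such as 'Company' or 'Film', with ds.list_labels enumerating them. That is an assumption based on the example above. A stable name-to-id table then lets you feed integer targets to a model. build_label_index and encode are hypothetical helpers.

```python
def build_label_index(list_labels):
    """Assign each distinct class name an integer id, in first-seen order."""
    return {label: i for i, label in enumerate(dict.fromkeys(list_labels))}


def encode(labels, index):
    """Convert class names to their integer ids using the given index."""
    return [index[label] for label in labels]
```

Usage: `index = build_label_index(ds.list_labels)` then `y_train = encode(ds.labels_train, index)`. Building the index from list_labels, rather than from the training labels, keeps ids identical across the train and test splits.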
load_SST_2()
Defines the SST-2 dataset.
The labels include:
Negative polarity.
Positive polarity.
    from Manteia.Dataset import Dataset

    ds = Dataset('SST-2')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
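For a two-class dataset such as SST-2, a majority-class baseline is a quick sanity check. It scores the accuracy of always predicting the most frequent training label, and any trained model should beat it. The sketch works with whatever label values ds.labels_* contains. The name majority_baseline is hypothetical.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Return (majority_label, accuracy of always predicting it on test_labels)."""
    if not test_labels:
        raise ValueError("test_labels must not be empty")
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for lab in test_labels if lab == majority)
    return majority, hits / len(test_labels)
```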
load_SST_5()
Defines the SST-5 dataset.
The labels include:
very negative.
negative.
neutral.
positive.
very positive.
    from Manteia.Dataset import Dataset

    ds = Dataset('SST-5', dev=True)
    print('Dev : ')
    print(ds.documents_dev[:5])
    print(ds.labels_dev[:5])
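The five SST-5 classes are ordered from very negative to very positive. Mapping them to ranks 0-4 preserves that order for regression-style models or ordinal metrics. The sketch assumes the labels arrive as the class names listed above; if they are already integers, this step is unnecessary. SST5_ORDER and sst5_to_rank are hypothetical names.

```python
SST5_ORDER = ['very negative', 'negative', 'neutral', 'positive', 'very positive']
SST5_RANK = {name: i for i, name in enumerate(SST5_ORDER)}


def sst5_to_rank(labels):
    """Map SST-5 class names to ranks 0 (very negative) .. 4 (very positive)."""
    # Normalize whitespace and case before the lookup.
    return [SST5_RANK[str(lab).strip().lower()] for lab in labels]
```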
load_Short_Jokes()
Defines the Short Jokes dataset.
    from Manteia.Dataset import Dataset

    ds = Dataset('Short_Jokes')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
load_Tweeter_Airline_Sentiment()
Defines the Tweeter Airline Sentiment dataset.
The labels include:
positive.
neutral.
negative.
    from Manteia.Dataset import Dataset

    ds = Dataset('Tweeter Airline Sentiment')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
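Before training on a three-class sentiment dataset, it is worth checking how balanced the positive, neutral, and negative classes are. The sketch reports each label's share, most frequent first, and works on whatever values ds.labels_train holds. The name class_distribution is hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return [(label, share), ...] sorted from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(lab, n / total) for lab, n in counts.most_common()]
```

Labels with equal counts keep the order in which they were first seen, because Counter.most_common preserves that order for ties.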
load_Yahoo_Answers()
Defines the Yahoo! Answers dataset.
The labels include:
Society & Culture
Science & Mathematics
Health
Education & Reference
Computers & Internet
Sports
Business & Finance
Entertainment & Music
Family & Relationships
Politics & Government
    from Manteia.Dataset import Dataset

    ds = Dataset('Yahoo! Answers', test=True, desc=True)
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
    print('Test : ')
    print(ds.documents_test[:5])
    print(ds.labels_test[:5])
    print('Description :')
    print(ds.description)
    print('List labels :')
    print(ds.list_labels)
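The example above loads only the train and test splits. If a validation set is needed for a ten-class dataset like this one, one option is a stratified split of the training data. That approach keeps each topic's share roughly the same in both parts. The sketch below is a plain-Python illustration, and stratified_split is a hypothetical helper, not a Manteia function.

```python
import random
from collections import defaultdict


def stratified_split(documents, labels, dev_fraction=0.1, seed=0):
    """Split parallel lists into (train, dev), keeping each class's proportion.

    Returns ((train_docs, train_labels), (dev_docs, dev_labels)).
    """
    if len(documents) != len(labels):
        raise ValueError("documents and labels must have the same length")
    by_class = defaultdict(list)  # class label -> list of row indices
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    train_idx, dev_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_dev = round(len(idx) * dev_fraction)
        dev_idx.extend(idx[:n_dev])
        train_idx.extend(idx[n_dev:])

    def select(ids):
        ids = sorted(ids)  # restore the original row order within each split
        return [documents[i] for i in ids], [labels[i] for i in ids]

    return select(train_idx), select(dev_idx)
```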
load_Yelp_Review_Full()
Defines the Yelp Review Full Star dataset.
The labels include:
1 - 5 : rating classes (5 is the highest rating).
    from Manteia.Dataset import Dataset

    ds = Dataset('Yelp Review Full', test=True, desc=True)
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
    print('Test : ')
    print(ds.documents_test[:5])
    print(ds.labels_test[:5])
    print('Description :')
    print(ds.description)
load_Yelp_Review_Polarity()
Defines the Yelp Review Polarity dataset.
The labels include:
1 : Negative polarity.
2 : Positive polarity.
    from Manteia.Dataset import Dataset

    ds = Dataset('Yelp Review Polarity', test=True, desc=True)
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
    print(ds.documents_test[:5])
    print(ds.labels_test[:5])
    print(ds.description)
load_agnews()
Defines the AG News dataset.
The labels include:
0 : World
1 : Sports
2 : Business
3 : Sci/Tech
    from Manteia.Dataset import Dataset

    ds = Dataset('agnews')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
load_drugscom()
Defines the Drugs.com dataset.
The labels include:
0 - 9 : rating classes (9 is the highest rating).
    from Manteia.Dataset import Dataset

    ds = Dataset('drugscom')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
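Because the 0 - 9 labels are ordered ratings, plain accuracy treats predicting 8 for a true 9 as just as wrong as predicting 0. Mean absolute error, which measures how many rating steps off a prediction is, is often reported alongside accuracy for tasks like this. The sketch assumes the labels and predictions are ints or digit strings. ordinal_mae is a hypothetical name.

```python
def ordinal_mae(y_true, y_pred):
    """Mean absolute difference between true and predicted rating classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(int(t) - int(p)) for t, p in zip(y_true, y_pred)) / len(y_true)
```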
load_pubmed_rct20k()
Defines the PubMed RCT20k dataset.
The labels include:
BACKGROUND.
CONCLUSIONS.
METHODS.
OBJECTIVE.
RESULTS.
    from Manteia.Dataset import Dataset

    ds = Dataset('pubmed_rct20k')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
load_trec()
Defines the TREC dataset.
The labels include:
ABBREVIATION
ENTITY
DESCRIPTION
HUMAN
LOCATION
NUMERIC
    from Manteia.Dataset import Dataset

    ds = Dataset('trec')
    print('Train : ')
    print(ds.documents_train[:5])
    print(ds.labels_train[:5])
Manteia.Dataset.clear_folder(dir)
Delete the directory dir and its content.
Manteia.Dataset.download_and_extract(url, data_dir)
Download the dataset archive from url and extract it into data_dir.
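This page does not show how download_and_extract works internally. The sketch below is one plausible version using only the standard library: it fetches the archive with urllib.request.urlretrieve and unpacks zip or tar files. The function name download_and_extract_sketch and the supported archive types are assumptions. Since archive extraction can write outside data_dir, use it only with trusted sources.

```python
import os
import tarfile
import urllib.parse
import urllib.request
import zipfile


def download_and_extract_sketch(url, data_dir):
    """Hypothetical stand-in: fetch an archive from url and unpack it into data_dir.

    Supports .zip and tar archives (.tar, .tar.gz, .tgz, ...); returns the
    path of the downloaded archive.
    """
    os.makedirs(data_dir, exist_ok=True)
    # Name the local file after the last path component of the URL.
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    if not filename:
        raise ValueError(f"cannot derive a file name from {url!r}")
    archive = os.path.join(data_dir, filename)
    urllib.request.urlretrieve(url, archive)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:  # compression is auto-detected
            tf.extractall(data_dir)  # only extract archives from trusted sources
    else:
        raise ValueError(f"not a zip or tar archive: {archive}")
    return archive
```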