Models

Word2VecHelper

A wrapper around Gensim Word2Vec

class datawords.models.Word2VecHelper(parser_conf: ParserConf, phrases_model=None, size: int = 100, window: int = 5, min_count: int = 1, workers: int = 1, epoch: int = 5, model: Word2Vec | None = None, using_kv=False, loaded_from=None, stopw: StopWords | None = None)
__init__(parser_conf: ParserConf, phrases_model=None, size: int = 100, window: int = 5, min_count: int = 1, workers: int = 1, epoch: int = 5, model: Word2Vec | None = None, using_kv=False, loaded_from=None, stopw: StopWords | None = None)

A wrapper around the original Word2Vec implementation from the Gensim library. It stores and tracks the model's training parameters, including the configuration of the parser used during training.

property vector_size: int
property wv: Word2Vec | KeyedVectors
fit(X: Iterable)

Trains the model on an iterable of plain texts.

Parameters:

X (Iterable) – An iterable which returns plain texts.
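
The following is a minimal sketch of an iterable that could be passed to fit. It assumes the corpus may be traversed more than once, since Gensim's own Word2Vec makes one pass to build the vocabulary and further passes for each epoch. Whether Word2VecHelper materializes X internally is not stated here, so a restartable iterable is the safer choice. LineCorpus is a hypothetical name and is not part of datawords.

```python
from pathlib import Path
from typing import Iterator


class LineCorpus:
    """Restartable iterable yielding one plain text per non-empty line.

    Hypothetical helper for illustration; not part of datawords.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def __iter__(self) -> Iterator[str]:
        # Re-open the file on every iteration so the corpus can be
        # traversed several times, unlike a one-shot generator.
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                text = line.strip()
                if text:
                    yield text
```

An instance such as LineCorpus("corpus.txt") could then be handed to fit in place of an in-memory list.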

parse(sentence: str) → List[str]

Parses a single text.

Parameters:

sentence (str) – The plain text to parse.

Returns:

A list of words.

Return type:

List[str]

encode(sentence: str) → ndarray

Takes a sentence in plain text and encodes it as a vector.

vectorize(sentence: List[str]) → ndarray

Gets a vector from a list of words. Words that are not found in the Word2Vec model's vocabulary are filled with zeros.
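
The zero-filling behaviour can be illustrated with a small standalone sketch. It is not the library's implementation. It uses plain Python lists instead of ndarray, and it assumes the per-word vectors are combined by averaging, which this section does not state. The names vectorize_sketch and vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def vectorize_sketch(
    words: Sequence[str],
    vectors: Dict[str, List[float]],
    size: int,
) -> List[float]:
    """Average word vectors, using zeros for out-of-vocabulary words.

    Assumption: averaging is only one plausible way to combine the rows;
    the actual Word2VecHelper.vectorize may combine them differently.
    """
    if not words:
        return [0.0] * size
    zero = [0.0] * size
    # Unknown words contribute a row of zeros instead of raising KeyError.
    rows = [vectors.get(word, zero) for word in words]
    # Column-wise mean over all rows, including the zero-filled ones.
    return [sum(column) / len(rows) for column in zip(*rows)]
```

With this design an unknown word pulls the result towards the origin rather than being skipped, so a sentence made up entirely of unknown words yields an all-zero vector.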

export_conf() → W2VecMeta
save(fp: str | PathLike)
classmethod load(fp: str | PathLike, keyed_vectors=False) → Word2VecHelper
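
A hedged usage sketch of the train, save, and load round trip, based only on the signatures listed above. The function name train_and_reload is hypothetical. The caller must supply a ready-made ParserConf, since its constructor is not covered in this section. Whether fp should be a file or a directory is also not stated here.

```python
from os import PathLike
from typing import Iterable, Union


def train_and_reload(parser_conf, texts: Iterable[str], fp: Union[str, PathLike]):
    """Train a Word2VecHelper, save it to fp, and load it back.

    Hypothetical helper for illustration; parser_conf must be a
    datawords ParserConf instance built by the caller.
    """
    # Imported lazily so the sketch can be defined without datawords installed.
    from datawords.models import Word2VecHelper

    # Keyword values mirror the documented defaults.
    helper = Word2VecHelper(parser_conf, size=100, window=5, min_count=1, epoch=5)
    helper.fit(texts)
    helper.save(fp)
    # keyed_vectors is left at its documented default (False).
    return Word2VecHelper.load(fp)
```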
class datawords.models.W2VecMeta(name: str, lang: str, parser_conf: ParserConf, phrases_model_path: str | None = None, epoch: int = 5, size: int = 100, window: int = 5, min_count: int = 1, version: str = '0.7.3', path: str | None = None)
name: str
lang: str
parser_conf: ParserConf
phrases_model_path: str | None
epoch: int
size: int
window: int
min_count: int
version: str
path: str | None
__init__(name: str, lang: str, parser_conf: ParserConf, phrases_model_path: str | None = None, epoch: int = 5, size: int = 100, window: int = 5, min_count: int = 1, version: str = '0.7.3', path: str | None = None) → None

Method generated by attrs for class W2VecMeta.