Welcome to datawords!

Datawords is a library for common and less common NLP tasks.

Datawords grew out of two years of work on projects that required NLP techniques such as training and saving Word2Vec models (Gensim), finding entities in text (spaCy), ranking texts (scikit-network), indexing them (Spotify Annoy), and translating them (Hugging Face).

Using those libraries also required pre-processing, post-processing, and transformation steps, and that is why datawords exists. Sometimes it is very opinionated (for example, indexing works over texts, not over raw vectors as Annoy allows); at other times it gives you freedom and abstract classes to extend its functionality.

Another way to see this library is as an aggregator of the excellent libraries mentioned above.

In a nutshell, Datawords lets you:

  • Train Word2Vec models (Gensim)

  • Build Indexes for texts (Annoy, SQLite)

  • Translate texts (Transformers)

  • Rank texts (PageRank)
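To give a feel for the last item, here is a minimal sketch of ranking texts with PageRank using only the Python standard library. It is not the datawords API and does not use scikit-network: the names `tokenize`, `cosine` and `rank_texts` are hypothetical, and the bag-of-words cosine similarity used to build the graph is an assumption made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_texts(texts, damping=0.85, iterations=50):
    # Hypothetical helper: PageRank by power iteration over a
    # similarity graph in which each text is a node.
    n = len(texts)
    if n == 0:
        return []
    bags = [Counter(tokenize(t)) for t in texts]
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = [(1.0 - damping) / n] * n
        for i in range(n):
            total = sum(weights[i])
            for j in range(n):
                # A text with no similar neighbours spreads its score evenly.
                share = weights[i][j] / total if total else 1.0 / n
                new_scores[j] += damping * scores[i] * share
        scores = new_scores
    # Highest-scoring (most central) texts first.
    return sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)


ranked = rank_texts([
    "cats chase mice",
    "dogs chase cats",
    "mice eat cheese",
    "quantum physics lecture",
])
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

Texts that share vocabulary with many others end up near the top, while a text unrelated to the rest ends up at the bottom.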
