Hugging Face: What you need to know!

When working on a Machine Learning problem, reusing and improving an existing solution can get you to a high-performing result much faster than starting from scratch.

Using existing models benefits not only data scientists but also companies, allowing them to save on computational costs and time.

Today, several companies provide open-source libraries containing pre-trained models, and Hugging Face is one of them.

Hugging Face is a startup founded by French entrepreneurs that became known for the NLP infrastructure it developed. Today, it is reshaping the field of Machine Learning and natural language processing.

In this article, we will present Hugging Face, detail the basic tasks its libraries can perform, and list its advantages and alternatives.


- Hugging Face: What is it? -


At its core, Hugging Face offers an open-source NLP library that provides an API for accessing many pre-trained models.

It facilitates learning and experimentation, since the models are already trained and ready to use. It also offers tools to manage data and models, as well as to develop and train new ones.
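For example, querying a pre-trained model takes only a few lines through the high-level pipeline API. A minimal sketch, assuming the transformers package is installed (a default model for the task is downloaded on first use, and the example sentence is ours):

```python
# Minimal sketch of the high-level pipeline API; a default pre-trained
# model for the task is downloaded automatically on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("Hugging Face makes NLP experimentation much easier!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```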

The startup was founded in 2016 in New York by Clément Delangue, Julien Chaumond, and Thomas Wolf, with a mission to make artificial intelligence accessible to everyone. In 2019, the company raised $15 million in Series A funding led by Lux Capital.

A pioneer in AI, it was named one of the world's most innovative companies in 2020 by MIT Technology Review.

Hugging Face has developed a range of AI-based products, including an open-source deep learning library called Transformers. It also offers an online collaboration platform where users can manage, share, and develop their AI models.

The Hugging Face library has many advantages. Here are some of them:

  1. It offers a wide variety of pre-trained models for different NLP tasks, including text classification, named entity recognition, text generation, etc.
  2. It is easy to use and can be integrated with other frameworks and applications.
  3. It offers good documentation and an active community.
  4. The platform also offers tools to simplify the deployment of NLP models on servers and mobile devices.


- Hugging Face products -


In recent years, Hugging Face has launched several products, including:

+ Chatbots

One of the company's first products was a set of chatbot applications that let users converse with an artificial intelligence developed in house. To that end, Hugging Face built its own natural language processing (NLP) model, Hierarchical Multi-Task Learning (HMTL), and maintained a library of pre-trained NLP models, then distributed as PyTorch-Transformers. The apps themselves were available only on iOS.

These applications include Chatty, Talking Dog, Talking Egg, and Boloss. The AI is meant to be a digital companion that entertains its users.

+ Python libraries

The startup has also developed a set of tools for its developer community to manage, share and develop their own machine learning models:

  1. Transformers: an open-source Python library for training and sharing NLP models, which provides an API for many popular transformer architectures such as BERT, RoBERTa, GPT-2, and DistilBERT (see the first sketch after this list).

    These architectures, pre-trained with different weights, achieve state-of-the-art results on a variety of NLP tasks such as text classification, information extraction, question answering, and text generation.

  2. Datasets: an open-source library for accessing over 100 NLP datasets.

  3. Tokenizers: an open-source library for fast text tokenization, covering over 40 languages.

  4. Accelerate: a simple API for running PyTorch training scripts, with optional mixed precision, in any kind of distributed configuration (multi-GPU, TPU, etc.), while letting you write your own training loops (see the second sketch after this list). The same code can easily be run on a local machine for debugging or in a full training environment.

    Accelerate also provides a CLI tool to quickly set up and test training environments and launch scripts.
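To make the division of labor among these libraries concrete, here is a minimal sketch combining the first three. It assumes transformers, datasets, and torch are installed; the IMDB dataset and the distilbert-base-uncased-finetuned-sst-2-english checkpoint are public examples from the Hub, chosen purely for illustration:

```python
# Sketch: Datasets loads a public corpus, Transformers loads a pre-trained
# checkpoint, and Tokenizers does the fast tokenization under the hood.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Datasets: pull a small slice of the public IMDB dataset from the Hub.
reviews = load_dataset("imdb", split="test[:4]")

# Transformers: a public sentiment-analysis checkpoint and its tokenizer.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenize a few reviews and classify them.
inputs = tokenizer(reviews["text"], truncation=True, padding=True, return_tensors="pt")
predictions = model(**inputs).logits.argmax(dim=-1)
print(predictions)  # 0 = negative, 1 = positive for this checkpoint
```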
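As for Accelerate, the core pattern is a single prepare() call plus accelerator.backward() wrapped around an ordinary PyTorch training loop. A self-contained toy sketch (the linear model, random data, and hyperparameters are placeholders of our own):

```python
# Sketch of the standard Accelerate training-loop pattern on a toy model.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Placeholder model and data, just to make the sketch runnable.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
data = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
loader = DataLoader(data, batch_size=8)
loss_fn = torch.nn.CrossEntropyLoss()

accelerator = Accelerator()  # detects CPU, multi-GPU, TPU, mixed precision...
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for features, labels in loader:     # your own loop, written as usual
    loss = loss_fn(model(features), labels)
    accelerator.backward(loss)      # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The same script can then be configured and launched from the command line with accelerate config and accelerate launch.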

+ BLOOM

BLOOM is an open-source large language model (LLM) trained on a massive corpus of text.

It is capable of producing coherent text in 46 languages, including Spanish, French, and Arabic.

BLOOM can also perform text tasks it has not been explicitly trained on, by casting them as text generation tasks.
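Since BLOOM checkpoints are published on the Hugging Face Hub, this behavior can be tried through the same Transformers API. A hedged sketch using the small bigscience/bloom-560m variant (a lightweight public checkpoint, chosen so the example runs on modest hardware; the prompt is ours):

```python
# Sketch: casting a question-answering task as plain text generation
# with a small public BLOOM checkpoint from the Hub.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = "Question: What is the capital of France?\nAnswer:"
print(generator(prompt, max_new_tokens=10)[0]["generated_text"])
```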

With 176 billion parameters, BLOOM was one of the first language models of this scale to be released openly, trained in public by the BigScience research collaboration.

+ NLP Training

In addition to its documentation, Hugging Face offers NLP training built around the ecosystem's libraries, such as Transformers, Datasets, Tokenizers, and Accelerate, as well as the Hugging Face Hub.

The training is completely free and ad-free.


- Limitations and alternatives of Hugging Face -


Although Hugging Face offers good models and rich functionality, it has some notable shortcomings.

The main one is that it is a community-driven open-source project, which means there is no official support.

In addition, the documentation is sometimes unclear. This can be a problem for some users, especially those who are not comfortable digging through source code.

Moreover, Transformers is still relatively new, and some features are missing.

For example, there is no support for recurrent neural network (RNN) architectures. If your NLP project requires an RNN, you will probably have to use another library.

Finally, there are a few alternatives to Hugging Face Transformers that might be better suited to your needs. TensorFlow, for example, is an open-source machine learning library that also supports RNNs.

Other alternatives to Hugging Face include Google's AutoML, IBM Watson, and Microsoft Azure.

Through its open-source repositories, the Hugging Face platform lets anyone get started on NLP problems. The website also hosts several comprehensive tutorials to guide community members in using its libraries.

So if you're a developer looking to get started with NLP, Hugging Face is a great option, as it simplifies the learning and experimentation process.