Erick Rocha Fonseca

I am a postdoc researcher in Natural Language Processing at Instituto de Telecomunicações. I received my Master's degree and PhD from the Institute of Mathematics and Computer Science at the University of São Paulo, where I was a member of NILC. Between April 2016 and April 2017, I was an intern in the NLP group of Fondazione Bruno Kessler.

My research is centered on Natural Language Processing, a field I greatly enjoy. I currently work on structured learning (i.e., models that learn to predict complex structures) within a deep learning framework.

I occasionally write posts about NLP for a wider audience on my Medium blog.

Google Scholar Profile
Currículo Lattes (in Portuguese)

Quality Estimation

I was one of the organizers of the Machine Translation Quality Estimation shared task held at WMT 2019.

nlpnet

My best-known project is nlpnet, a Python library (with accompanying standalone scripts) for training and running NLP tagging tools based on neural networks and distributed word representations, also known as word embeddings. Documentation can be found here, along with pre-trained models.

Currently, my version performs part-of-speech (POS) tagging and semantic role labeling (SRL). I plan to add named entity recognition (NER), which at least one fork has already implemented, and parsing in the coming months.

ASSIN

In 2016, I released ASSIN, a corpus for Recognizing Textual Entailment (or Natural Language Inference) and Semantic Textual Similarity in Portuguese. Many colleagues and professors helped get ASSIN ready in various ways. It was used in a shared task held alongside the PROPOR 2016 conference.

Mac-Morpho

In 2013 and 2014, I revised the Mac-Morpho corpus, a collection of newswire texts in Brazilian Portuguese manually annotated with POS tags. Duplicated sentences and sentences with missing words were removed, and a few other corrections were made. Contractions were joined into single tokens, as they appear in actual text, instead of being left split. Both the original and the revised versions are available at the link.