Erick Rocha Fonseca

I am a Data Scientist located in Cologne, Germany.

I have been a post-doc researcher at Instituto de Telecomunicações.

I received my Master's degree and PhD from the University of São Paulo, at the Institute for Mathematics and Computer Science, where I was a member of NILC. During my PhD, I was an intern in the NLP group of Fondazione Bruno Kessler.

My research and work are centered on Natural Language Processing, a field that I greatly enjoy, and modern machine learning techniques.

I try to write some posts on NLP for a wider audience on my Medium blog.

Email: erickrfonseca123@abc.gmail.com
Google Scholar Profile
Currículo Lattes (in Portuguese)

MT Quality Estimation

I was one of the organizers of the Machine Translation Quality Estimation shared task at WMT 2019.

NLPnet

My best-known project is nlpnet. It is a Python library (together with standalone scripts) for training and running NLP tagging tools based on neural networks and distributed word representations, also known as word embeddings. Documentation can be found here, along with pre-trained models.

Currently, my version performs POS tagging and SRL, and I plan to add NER (at least one fork has implemented it) and parsing in the coming months.

ASSIN

In 2016, I released ASSIN, a corpus for Recognizing Textual Entailment (or Natural Language Inference) and Semantic Textual Similarity in Portuguese. A good number of colleagues and professors helped in many ways to get ASSIN ready. It was used in a shared task held alongside the PROPOR 2016 conference.

Mac-Morpho

In 2013 and 2014, I revised the Mac-Morpho corpus, a collection of newswire texts in Brazilian Portuguese manually annotated with POS tags. Duplicated sentences and sentences with missing words were removed, and a few other corrections were made. Word contractions were joined into a single token, as they appear in actual text, instead of being left separated. Both the original and the revised versions can be found at the link.