Erick Rocha Fonseca

MT Quality Estimation

I was one of the organizers of the Machine Translation Quality Estimation shared task held at WMT 2019.

NLPnet

My best-known project is nlpnet, a Python library (with accompanying standalone scripts) for training and running NLP tagging tools based on neural networks and distributed word representations, also known as word embeddings. Documentation can be found here, along with pre-trained models.

Currently, my version performs part-of-speech (POS) tagging and semantic role labeling (SRL), and I plan to add named entity recognition (NER; at least one fork has implemented it) and parsing in the coming months.

ASSIN

In 2016, I released ASSIN, a corpus for Recognizing Textual Entailment (or Natural Language Inference) and Semantic Textual Similarity in Portuguese. A good number of colleagues and professors helped in many ways to get ASSIN ready. It was used in a shared task held in parallel with the PROPOR 2016 conference.

Mac-Morpho

In 2013 and 2014, I revised the Mac-Morpho corpus, a collection of newswire texts in Brazilian Portuguese manually annotated with POS tags. Duplicated sentences and sentences with missing words were removed, and a few other corrections were made. Word contractions were joined into a single token, as they appear in actual text, instead of being left separated. Both the original and the revised versions can be found at the link.
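
To give an idea of what joining contractions involves, here is a minimal Python sketch. It is only an illustration, not the script used in the revision: the function name join_contractions and the small CONTRACTIONS table are hypothetical, and the sketch assumes every adjacent pair listed in the table should be merged, whereas the actual corpus annotation indicates which tokens were originally contracted.

```python
# Hypothetical contraction table: (first token, second token) -> joined form.
# Only a few illustrative entries; this is not the list used for the corpus.
CONTRACTIONS = {
    ("de", "o"): "do",
    ("de", "a"): "da",
    ("em", "o"): "no",
    ("em", "a"): "na",
    ("por", "o"): "pelo",
}


def join_contractions(tokens):
    """Merge adjacent tokens that form a known contraction into one token."""
    joined = []
    i = 0
    while i < len(tokens):
        # Look at the current token and the next one, ignoring case.
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if len(pair) == 2 and pair in CONTRACTIONS:
            merged = CONTRACTIONS[pair]
            # Keep the capitalization of the first token (e.g. "Em o" -> "No").
            if tokens[i][:1].isupper():
                merged = merged.capitalize()
            joined.append(merged)
            i += 2  # both tokens were consumed
        else:
            joined.append(tokens[i])
            i += 1
    return joined


if __name__ == "__main__":
    print(join_contractions(["O", "livro", "de", "o", "menino"]))
```

In a tagged corpus the merged token also needs a tag of its own; how the two original tags are combined is a design decision that this sketch leaves out.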