PhD in Natural Language Processing
• Interested in natural language processing, computational linguistics, machine learning,
deep learning, neural networks, numerical reasoning, conversation modelling, and forecasting.
• PhD in Computer Science at the Machine Reading Group of University College London (UCL), supervised by Sebastian Riedel.
• Investigating how to improve language models by including numerical attributes as input features and by explicitly modelling the output of numerical tokens.
• Funded by the Farr Institute of Health Informatics Research.
• PhD in Natural Language Processing, UCL.
Thesis: “Numeracy of Language Models: Joint Modelling of Words and Numbers”.
• MSc in Computational Statistics and Machine Learning, UCL.
• BSc/MSc diploma in Electrical and Computer Engineering, National Technical University of Athens (NTUA).
• Machine Learning Engineer at PolyAI (since 2018).
• Teaching Assistant for NLP and web/mobile app development modules at the Department of Computer Science at UCL (2013-2017).
• Research Scientist Intern at Amazon Cambridge (8 months, 2016) and Microsoft Research (3 months, 2015), where I worked on question answering for dialogue systems and grounded conversation modelling, respectively.
• Research Associate / Research Assistant / Software Engineer at the Forecasting and Strategy Unit at NTUA (2009-2012).
Text often contains numbers to convey specific information in various domains,
e.g. from everyday life ("John is 1.75 meters tall")
to scientific and clinical documents ("severe dilation of the left ventricle with EDV=355ml").
There is a relation between the words and numbers we use, as seen in the figure above.
In this example, we have extracted from clinical reports pairs of numbers (clinical measurements) and words (descriptions of severity of a clinical condition: "non", "mild", "severe") and estimated the distribution of words given numbers and that of numbers given words.
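As a rough illustration of this kind of estimate, the sketch below computes both conditional distributions by relative frequency. The pairs, the fixed-width binning, and the names `bin_of` and `conditional` are assumptions made up for the example; they are not the project's data or code.

```python
from collections import Counter, defaultdict

# Hypothetical (measurement, severity word) pairs, invented for illustration.
pairs = [
    (100, "non"), (120, "non"),
    (180, "mild"), (190, "mild"), (210, "mild"),
    (300, "severe"), (355, "severe"),
]

def bin_of(value, width=100):
    """Map a number to the lower edge of its fixed-width bin (an assumed discretisation)."""
    return int(value // width) * width

def conditional(pair_iter):
    """Estimate P(b | a) by relative frequency from an iterable of (a, b) pairs."""
    counts = defaultdict(Counter)
    for a, b in pair_iter:
        counts[a][b] += 1
    return {
        a: {b: n / sum(c.values()) for b, n in c.items()}
        for a, c in counts.items()
    }

# Distribution of words given (binned) numbers, and of (binned) numbers given words.
word_given_number = conditional((bin_of(v), w) for v, w in pairs)
number_given_word = conditional((w, bin_of(v)) for v, w in pairs)

print(word_given_number)
print(number_given_word)
```

Binning is only one way to make the numbers discrete; a density estimate per word would be another option.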
In language modelling, numbers are often treated as out-of-vocabulary words (or masked under an "UNKNOWN NUMBER" category), so their informational content is lost. The goal of this project is to investigate and evaluate extensions to language models that allow them to incorporate numeric information. We find that extending the input of language models to include the magnitude of numeric tokens can lead to improvements in perplexity and in the downstream tasks of semantic error correction (Spithourakis et al., 2016a) and text prediction (Spithourakis et al., 2016b).
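To make the idea of a magnitude input concrete, here is a minimal sketch of one possible preprocessing step: numeric tokens are masked with a shared symbol, but their value is kept as an extra feature instead of being discarded. The `<NUM>` symbol, the regular expression, and the `featurise` helper are assumptions for illustration only, not the feature construction used in the cited papers.

```python
import re

# Assumed pattern for plain integers and decimals; real text would need more cases.
NUMBER_RE = re.compile(r"[+-]?\d+(\.\d+)?")

def featurise(tokens, num_symbol="<NUM>"):
    """Pair each token with a magnitude feature.

    Words keep their surface form and get magnitude 0.0.
    Numbers are masked with `num_symbol` but keep their value as the magnitude.
    """
    features = []
    for token in tokens:
        if NUMBER_RE.fullmatch(token):
            features.append((num_symbol, float(token)))
        else:
            features.append((token, 0.0))
    return features

# Pre-tokenised version of the clinical example from the text.
print(featurise(["EDV", "=", "355", "ml"]))
```

A model could then look up an embedding for the symbol and feed the magnitude alongside it as an additional input dimension.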
Together with poet Zena Edwards,
we have sought to explore the spectrum between artificial and human creativity.
We have already organised events
where participants have created poems through traditional poetry writing exercises (e.g. freeflow, ekphrasis)
and through an AI-inspired interactive simulation,
where participants pretended to be neurons in a poetry-generating artificial neural network.
The events have also included invited talks and performances by
improv theatre human/AI duet Piotr Mirowski and A.L.E.X.,
tech poet Dan Simpson, and
academic and language expert Mandana Seyfeddinipur.
This project has been supported by Apples and Snakes (a big thanks to Daniela Paolucci!) and UCL's public engagement "Train and Engage" programme. More information can be found on Zena's Tumblr, blog, and website.
90 High Holborn
London, WC1V 6LJ
University College London
Dept. of Computer Science (1ES)
London WC1E 6BT