Georgios Spithourakis

PhD candidate in Natural Language Processing



I am a PhD candidate in Computer Science at the Machine Reading Group of University College London (UCL), supervised by Sebastian Riedel. I am investigating how to improve language models by including numerical attributes as input features and by explicitly modelling the output of numerical tokens. I am interested in natural language processing, machine learning, neural networks, semantics, and structured prediction. My research is supported by the Farr Institute of Health Informatics Research.


I hold an MSc in Computational Statistics and Machine Learning from UCL and a diploma in Electrical and Computer Engineering (equivalent to an MSc in Telecommunications Engineering) from the National Technical University of Athens (NTUA).


Since 2013 I have been working as a Teaching Assistant in the Department of Computer Science at UCL. In the summer of 2016 I interned as an applied scientist at Amazon Cambridge, and in the summer of 2015 at Microsoft Research. From 2009 to 2012 I worked as a Research Assistant and Software Engineer at the Forecasting and Strategy Unit at NTUA.

Projects & News

Numbers in Language Modelling

Text often contains numbers that convey specific information across domains, from everyday life ("John is 1.75 meters tall") to scientific and clinical documents ("severe dilation of the left ventricle with EDV=355ml"). There is a relation between the words and numbers we use, as seen in the figure above. In this example, we extracted pairs of numbers (clinical measurements) and words (descriptions of the severity of a clinical condition: "non", "mild", "severe") from clinical reports and estimated the distribution of words given numbers and that of numbers given words.
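As a rough illustration of how the word-given-number and number-given-word distributions described above could be estimated, here is a minimal counting sketch. It is not the code used in the project: the function name, the fixed-width binning of measurements, and the toy pairs are all assumptions made for this example, and the numbers are made up rather than taken from clinical reports.

```python
from collections import Counter, defaultdict


def conditional_distributions(pairs, bin_width):
    """Estimate P(word | bin) and P(bin | word) from (number, word) pairs by counting.

    Numbers are grouped into fixed-width bins (an assumption of this sketch);
    each bin is identified by its lower edge.
    """
    joint = Counter()
    for number, word in pairs:
        bin_start = int(number // bin_width) * bin_width
        joint[(bin_start, word)] += 1

    # Marginal counts for bins and for words.
    bin_totals = Counter()
    word_totals = Counter()
    for (bin_start, word), count in joint.items():
        bin_totals[bin_start] += count
        word_totals[word] += count

    # Normalise the joint counts in both directions.
    word_given_bin = defaultdict(dict)
    bin_given_word = defaultdict(dict)
    for (bin_start, word), count in joint.items():
        word_given_bin[bin_start][word] = count / bin_totals[bin_start]
        bin_given_word[word][bin_start] = count / word_totals[word]
    return dict(word_given_bin), dict(bin_given_word)


# Hypothetical toy data: (measurement, severity word). Not real clinical values.
toy_pairs = [
    (120, "non"), (140, "non"), (150, "mild"), (180, "mild"),
    (190, "mild"), (210, "mild"), (260, "severe"), (355, "severe"),
]
word_given_bin, bin_given_word = conditional_distributions(toy_pairs, bin_width=100)
```

With these toy pairs, `word_given_bin[100]` gives the estimated distribution over severity words for measurements between 100 and 199, and `bin_given_word["severe"]` gives the estimated distribution over bins for the word "severe". A real analysis would likely need smoothing or a continuous density estimate instead of hard bins.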
In language modelling, numbers are often treated as out-of-vocabulary words (or masked under an "UNKNOWN NUMBER" category) and, thus, their informational content is lost. The goal of this project is to investigate and evaluate extensions to language models that allow them to incorporate numeric information. We find that extending the input of language models to include the magnitude of numeric tokens can lead to improvements in perplexity and in the downstream tasks of semantic error correction (Spithourakis et al., 2016a) and text prediction (Spithourakis et al., 2016b).
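To make the idea of extending the input with numeric magnitude concrete, the sketch below shows one possible way to prepare inputs: each token is mapped to a vocabulary id paired with its numeric value, so that a number masked as unknown still carries its magnitude. This is an illustrative assumption, not the architecture from the papers cited above; the special symbols `<unk>` and `<unk_num>`, the function names, and the choice of 0.0 as the value for non-numeric tokens are all hypothetical.

```python
import math


def parse_number(token):
    """Return the token's finite float value, or None if it is not a number."""
    try:
        value = float(token)
    except ValueError:
        return None
    # float() also accepts strings such as "nan" and "inf"; treat those as words.
    return value if math.isfinite(value) else None


def numerically_grounded_inputs(tokens, vocab, unk_word="<unk>", unk_num="<unk_num>"):
    """Pair each token id with a numeric feature holding the token's magnitude.

    Out-of-vocabulary numbers share one id (unk_num), but their value is kept
    in the second element of the pair instead of being discarded.
    """
    inputs = []
    for token in tokens:
        value = parse_number(token)
        if token in vocab:
            token_id = vocab[token]
        elif value is not None:
            token_id = vocab[unk_num]
        else:
            token_id = vocab[unk_word]
        # Assumption of this sketch: non-numeric tokens get the value 0.0.
        inputs.append((token_id, value if value is not None else 0.0))
    return inputs


# Hypothetical toy vocabulary and sentence.
vocab = {"<unk>": 0, "<unk_num>": 1, "EDV": 2, "=": 3, "ml": 4}
features = numerically_grounded_inputs("EDV = 355 ml".split(), vocab)
```

In a neural language model, the id would typically index an embedding and the numeric feature would be fed alongside it; how the two are combined is a modelling choice that this sketch leaves open.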

Human Code
Computer Tongue

Together with poet Zena Edwards we have sought to explore the spectrum between artificial and human creativity. We have already organised two masterclasses, where participants have created poems through traditional poetry writing exercises (e.g. freeflow, ekphrasis) and through an AI-inspired interactive simulation, where participants pretended to be neurons in a poetry-generating artificial neural network. The events have also included invited talks and performances by improv theatre human/AI duet Piotr Mirowski and A.L.E.X., musician Xana, tech poet Dan Simpson, and academic and language expert Mandana Seyfeddinipur.
This project has been supported by Apples and Snakes (a big thanks to Daniela Paolucci!) and UCL's public engagement "Train and Engage" programme. More information can be found on Zena's Tumblr, blog, and website.


  • B. Riedel, I. Augenstein, G. Spithourakis, S. Riedel. A simple but tough-to-beat baseline for the Fake News Challenge stance detection task. arXiv, 2017. [paper]
  • N. Mostafazadeh, C. Brockett, B. Dolan, M. Galley, J. Gao, G. Spithourakis, L. Vanderwende. Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation. IJCNLP, 2017. [paper]
  • G. Spithourakis, I. Augenstein, S. Riedel. Numerically Grounded Language Models for Semantic Error Correction. EMNLP, 2016. [paper]
  • G. Spithourakis, S. E. Petersen, S. Riedel. Clinical Text Prediction with Numerically Grounded Conditional Language Models. EMNLP workshop, 2016. [paper]
  • J. Li, M. Galley, C. Brockett, G. Spithourakis, J. Gao, B. Dolan. A Persona-Based Neural Conversation Model. ACL, 2016. [paper]
  • G. Spithourakis, F. Petropoulos, K. Nikolopoulos and V. Assimakopoulos. Amplifying the learning effect via a forecasting and foresight support system. International Journal of Forecasting, 31(1):20-32, 2015. [paper]
  • G. Spithourakis, S. Petersen, and S. Riedel. Harnessing the predictive power of clinical narrative to resolve inconsistencies and omissions in EHRs. Short paper and poster presentation in 2nd Workshop on Machine Learning for Clinical Data Analysis, Healthcare and Genomics, NIPS 2014.
  • G. Spithourakis, F. Petropoulos, K. Nikolopoulos and V. Assimakopoulos. A systemic view of the ADIDA framework. IMA Journal of Management Mathematics, 25(2):125-137, 2014. [paper]
  • F. Petropoulos, K. Nikolopoulos, G. Spithourakis and V. Assimakopoulos. Empirical heuristics for improving intermittent demand forecasting. Industrial Management & Data Systems, 113(5):683-696, 2013. [paper]
  • G. Spithourakis, F. Petropoulos, M.Z. Babai, K. Nikolopoulos and V. Assimakopoulos. Improving the performance of popular supply chain forecasting techniques: an empirical investigation. Supply Chain Forum: an International Journal, 12(4):16-25, 2012.


Email Address


Visiting Address

1st Floor
90 High Holborn
London, WC1V 6LJ
United Kingdom

Postal Address

University College London
Dept. of Computer Science (1ES)
Gower Street
London WC1E 6BT
United Kingdom
