Dr Di Cai

d.cai@hud.ac.uk | 01484 472340


After completing a first (Hons) degree in applied mathematics, Dr Di Cai was appointed lecturer/senior lecturer in the Department of Mathematics at Tianjin University of Science and Technology, P.R. China. She taught a wide range of subjects in applied mathematics, engineering mathematics and advanced engineering mathematics. She was a course leader for engineering mathematics and a member of the University/Department Teaching and Learning Committee. She was involved in the organization of national conferences and workshops on the teaching and learning of engineering mathematics.

Di was awarded her PhD in Information Retrieval (IR) in 2004, under the supervision of Professor Keith van Rijsbergen, leader of a world-leading IR research group in the Department of Computing Science (DCS) at the University of Glasgow (UG). Since then Di has worked as a research fellow on several projects in areas of IR, Text Mining, Document Classification and Sentiment Analysis. She commenced work on an EPSRC funded project: “XML technologies for the acceleration of cancer drug target discovery” in DCS in 2004. Soon after, she applied successfully for funding from Microsoft Research, Cambridge, for her own research project: “A discrimination information model for automatic query reformulation” in the same Department. In late 2006, she worked on a BBSRC funded project: “A taxonomically intelligent phylogenetic database” in the Institute of Biological & Life Sciences at UG. She moved to the School of Computing and IT at the University of Wolverhampton in 2009, working on a large EU funded project: “Collective emotions in cyberspace”. Di joined the University of Huddersfield in 2011.

Di is currently a member of IEEE (The Institute of Electrical and Electronics Engineers), ACM (The Association for Computing Machinery) and BCS-IRSG (The BCS Information Retrieval Specialist Group).

Research and Scholarship

Dr Di Cai’s research interests centre on any type of information processing that can be represented mathematically, including formal modelling, quantitative method development, problem solving, algorithm design and large-scale data analysis.

Di’s current research focuses on fundamental issues relevant to many areas of science: statistical semantic analysis of features (concepts, terms, phrases, words, etc.), representation of objects (documents, abstracts, sentences, queries, etc.) and detection of unreliable samples (obtained from web users). The work draws on a variety of theories (probability theory, information theory, theory of evidence and rough set theory) as well as machine learning methods. Some specific topics are:

  • measurement of discrimination information of features
  • measurement of semantic relatedness/association between features
  • identification of informative terms and sentiment-bearing terms
  • extraction of key terms and taxonomic names
  • thesaurus simplification and normalization
  • key term modelling and term classification
  • representation and modelling of objects
  • measurement of similarity between objects, between features, and between objects and features
  • query formulation and reformulation (automatic/semi-automatic/interactive)
  • algorithm development for system design and implementation
  • data analysis and corpus processing
  • detection of outlying ratings and identification of unreliable samples (when data is gathered from social websites)

Publications and Other Research Outputs


Cai, D. and McCluskey, T. (2014) ‘A General Framework of Generating Estimation Functions for Computing the Mutual Information of Terms’. International Journal of Advanced Computer Science and Applications, 4 (11), pp. 198-208. ISSN 2158-107X


Cai, D. and Wade, S. (2012) ‘A Rule-Based Method for Outlying Rating Detection’. International Journal of Computer and Communication Engineering, 1 (4), pp. 466-471. ISSN 2010-3743

Cai, D. and McCluskey, T. (2012) ‘A Simple Method for Estimating Term Mutual Information’. Journal of Computing, 4 (6), pp. 1-6. ISSN 2151-9617

Kowalska, K., Cai, D. and Wade, S. (2012) ‘Sentiment Analysis of Polish Texts’. International Journal of Computer and Communication Engineering, 1 (1), pp. 39-42. ISSN 2010-3743


Thelwall, M., Buckley, K., Paltoglou, G., Cai, D. and Kappas, A. (2010) ‘Sentiment Strength Detection in Short Informal Text’. Journal of the American Society for Information Science and Technology, 61 (12), pp. 2544-2558. ISSN 0002-8231

Cai, D. (2010) ‘An Information-Theoretic Foundation for the Measurement of Discrimination Information’. IEEE Transactions on Knowledge and Data Engineering, 22 (9), pp. 1262-1273. ISSN 1041-4347


Cai, D. (2009) ‘Determining Semantic Relatedness through the Measurement of Discrimination Information Using Jensen Difference’. International Journal of Intelligent Systems, 24 (5), pp. 477-503. ISSN 0884-8173

Cai, D. and van Rijsbergen, C. (2009) ‘Learning semantic relatedness from term discrimination information’. Expert Systems With Applications, 36 (2), pp. 1860-1875. ISSN 0957-4174


Cai, D. and van Rijsbergen, C. (2008) ‘An Algorithm for Modelling Key Terms’. International Journal of Intelligent Systems, 23 (1), pp. 50-81. ISSN 0884-8173


Cai, D. and van Rijsbergen, C. (2005) ‘Semantic Relations and Information Discovery’. In: Intelligent Data Mining: Techniques and Applications. London, UK: Springer. pp. 79-102. ISBN 9783540262565


Cai, D. (2001) ‘Extension and applications of evidence theory’. In: Soft Computing for Risk Evaluation Management: Applications in Technology, Environment & Finance. Springer. pp. 73-93. ISBN 9783790814064

Cai, D. (2001) ‘Data mining based on evidence theory’. In: Soft Computing for Risk Evaluation Management: Applications in Technology, Environment & Finance. Springer. pp. 97-120. ISBN 9783790814064

Research Degree Supervision

Dr Di Cai’s research interests are in Data Mining and Artificial Intelligence in general, and in the following areas in particular:

  • Information Retrieval and Extraction
  • Text Mining and Analytics
  • Document Classification and Summarization
  • Sentiment Analysis and Opinion Mining

Current opportunities

  • Please contact this member of staff to discuss possible opportunities.