Head: Patrice BELLOT

The "Data, Information & Content Management Group" (DIMAG) participates in several collaborative projects (ANR CONTINT, Equipment of Excellence, Investissements d’Avenir Fonds pour la Société Numérique “Inter-Textes”, ...) and in international information retrieval challenges (TREC, CLEF, INEX).
Keywords: information systems, information retrieval, information extraction, text mining, natural language processing, data mining, web services, multi-agent modeling, ontologies, business process modeling (BPM), serious games & simulation

DIMAG is composed of 13 faculty members (5 full professors, 8 associate professors) and 19 contract researchers (17 PhD students, one postdoctoral researcher). Its research activities are dedicated to developing models and algorithms at the heart of large-scale information systems (IS for BI, Web, Digital Libraries and Data Warehouses).

Our activities concern the design and adaptation of information systems (architecture, services, customization) and the processing of large collections of documents, Web content (information retrieval, information extraction, document classification) and data (mining, integration).

The aims of our work are:

  • to develop models and algorithms for information retrieval, information extraction and data mining, applicable to large collections of documents (digital libraries), web pages and big data;
  • to propose architectures for information systems (distributed models, multi-agent simulation, guided by process models), process modeling (BPM) and approaches for integrating Web services.

Artificial intelligence is at the heart of our activities: knowledge engineering, information retrieval, natural language processing, intelligent agents, big data, machine learning.

  • Topic A - Designing Decisional and Adaptive Information Systems. Our aim is to define methods, architectures and approaches for designing and implementing information systems that integrate usages and users and meet criteria such as flexibility, reliability or openness. We follow process-oriented and/or multi-agent-oriented approaches.
  • Topic B - Information Retrieval and Information Extraction. A major scientific and societal challenge lies in the development of robust computational approaches that cope with the variable quality and ever-increasing amount of information available on networks and on the Internet. Our goal is to develop new information retrieval methods that can be applied to large collections of textual documents (web pages, blogs, articles, books). These methods combine Natural Language Processing and Machine Learning (statistical learning or inductive logic programming). A strong focus is given to the evaluation of our proposals on actual data (international evaluation campaigns and conferences such as TREC, CLEF, INEX) and to their integration into real operational systems (e.g. Equipment of Excellence). Indeed, our work is the subject of large-scale implementations evaluated in the context of collaborative projects (in the areas of Digital Libraries, e-commerce, health, Web searching and mining). Recently, we proposed approaches for:
    • filtering large streams of Web content for new content about named entities,
    • identifying and annotating citations in scholarly papers,
    • classifying texts by employing general and domain-oriented semantic resources (medical and social sciences domains) or by selecting appropriate features for detecting reviews and opinionated content (supervised and unsupervised approaches),
    • modeling query-oriented topic models for improving information retrieval,
    • integrating several sources of knowledge for expanding queries and improving information retrieval,
    • extracting information with inductive logic programming, which makes it possible to induce symbolic predicates.
  • Topic C - Data Mining and Data Integration. The central issue of this research topic is to develop algorithms and methods for processing data from multiple and heterogeneous sources. Specifically, our work focuses on the foundations and applications of data searching and data integration. Originally developed within a "database" framework, the methods, algorithms and architectures for data integration must be redesigned to take into account the actual nature of the data, now that database functions (components) are often performed by services. The data are varied in nature, increasingly large, of variable quality and available in a distributed context. We propose (semantic) approaches for discovering Web services (SWS) or learning objects (e-Learning). These approaches deal with approximation, emergence and ignorance in data mining and machine learning: pattern approximation, Boolean functions and search spaces.

DIMAG is involved in several collaborative projects:
- 2012-2020: Equipment of Excellence (EQUIPEX) (DILOH - Digital Library for Open Humanities) through the creation of the OpenEdition Lab (text mining, book searching, information extraction, citation analysis);
- 2012-2016: Inter-textes (automatic text linking, recommendation, citation network analysis, multi-label and faceted classification of texts);
- 2012-2015: Agoraweb (sentiment analysis, opinion mining, natural language processing, reading recommendation);
- 2013-2015: EU Cost Action "Keystone" (keyword search, keyword interpretation, databases) - EU RTD Framework Programme;
- 2010-2014: CAAS (Contextual Analysis and Adaptive Search) - French National Agency ANR.

Past projects:
- 2011-2013: Project BILBO (Google Digital Humanities Awards)