The use case addressing science policy support is an integrative experiment that combines state-of-the-art text mining, scientometrics (bibliometrics) and cybermetrics. The main aim is to detect and analyze the dynamics of the relationships between science policy, R&D developments and research financing. TEXTREND is intended to provide effective tools for the simultaneous analysis of developments in scientific communication (e.g. research fronts) and in R&D policy (with special emphasis on trends in research funding). The system is designed to mine the potentially latent relations between these aspects, supporting the decision-making process with respect to the R&D sector.
The application utilizes the concept of "domain analysis", which combines knowledge discovery techniques from both structured and unstructured information mining. The former is typically instantiated by the mining of scholarly databases (ISI WoS, Scopus etc.) or funding databases (e.g. CORDIS), while the latter includes the analysis of full-text corpora harvested from the scholarly web. The use case developed for TEXTREND is to integrate these methods in a complementary manner, to gain a multi-faceted yet well-integrated picture informed by many sources and methods (social network analysis, topic modelling and tracking, bibliometrics etc.). The prototypical components of this approach, ranging from structured to highly unstructured data processing, are demonstrated in the table below.
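As an illustration of the structured end of this spectrum, the following minimal sketch derives a co-authorship network from bibliographic records, a common starting point for social network analysis in bibliometrics. The record layout (a dictionary with an `authors` list) and the function names are assumptions made for the example; they do not reflect the TEXTREND implementation or the export format of any particular database.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coauthor_edges(records):
    """Count how many papers each pair of authors has written together."""
    edges = Counter()
    for record in records:
        # Sort and de-duplicate so each pair has one canonical (a, b) key.
        authors = sorted(set(record["authors"]))
        for a, b in combinations(authors, 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct co-authors per author (unweighted degree)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {author: len(others) for author, others in neighbours.items()}


# Hypothetical records, standing in for an export from a scholarly database.
records = [
    {"authors": ["A", "B", "C"]},
    {"authors": ["A", "B"]},
    {"authors": ["C", "D"]},
]
edges = coauthor_edges(records)
print(edges[("A", "B")])  # number of joint papers of A and B
print(degree(edges))
```

The edge weights record the strength of collaboration, while the degree gives a simple measure of how central an author is; either could feed the network visualizations mentioned in this section.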
The toolkit is to allow the user to conduct cross-sectional and comparative analyses as well as longitudinal, dynamics-centered ones. Emphasis is placed on the concept of informative visualization, i.e. the output is intended to be readily interpretable and to convey a rich set of information to the analyst ("visual analytics"). The types of visualization are demonstrated below, grouped by the technical method of analysis. (The particular figures come from ongoing research on a currently popular topic in supraindividual biology, viz. phenotype plasticity.)
The main goal of the use case (UC) is to make the reviewing and analysis of economic and social topics more effective for analysts, policy makers and other stakeholders. The application is designed to support decision making on economic and public policy in both the governmental and the private sector.
The use case sets up a framework in which the on-line content of popular and scholarly journals and of think tanks is harvested and filtered, yielding documents relevant to the topic under study. These corpora are then subjected to information extraction with state-of-the-art natural language processing techniques, including named entity recognition, keyword extraction, etc. Combined with a diverse set of metadata, the interrelations and dynamics of topics are to be tracked and analyzed.
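The filtering and keyword extraction steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, with TF-IDF weighting standing in for the (unspecified) keyword extraction method of the use case. The stopword list, the function names and the sample documents are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "by"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def filter_relevant(docs, topic_terms):
    """Keep only the documents that mention at least one topic term."""
    topic = {t.lower() for t in topic_terms}
    return [d for d in docs if topic & set(tokenize(d))]


def tfidf_keywords(docs, top_n=3):
    """Return the top_n terms of each document ranked by TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    keywords = []
    for tokens in tokenized:
        if not tokens:
            keywords.append([])
            continue
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_n])
    return keywords


# Hypothetical harvested headlines.
docs = [
    "Corruption scandal in the ministry",
    "Football results of the weekend",
    "Court ruling on corruption and bribery",
]
relevant = filter_relevant(docs, ["corruption"])
print(tfidf_keywords(relevant, top_n=2))
```

Because the topic term itself occurs in every retained document, its inverse document frequency is low, so the extracted keywords tend to be the terms that distinguish one document from the others within the topic.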
The use case supports the joint research conducted by the UC owner GVI and the Corvinus University of Budapest, which addresses the phenomenon of corruption as reported in the national media. The figures below show some of the results and illustrate the text mining approach.
In recent years, developments in on-line communication have fundamentally reshaped
Beyond searching and navigation, the application is designed to serve the representation, analysis and confrontation of opinions that are present in the blogosphere. For a given topic of interest, the application is to be capable of structuring and classifying the related textual information. One of the main goals is to implement sentiment analysis of blogs, i.e. the evaluation of opinions in terms of attitudes (positive, negative, neutral) towards a topic, and to provide customizable representations of the relationships between the queried contents.
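The three-way classification of attitudes towards a queried topic can be illustrated with a deliberately simple lexicon-based sketch. The word lists, the function names and the sample posts are assumptions made for the example; the source does not specify the sentiment analysis method used by the application, which would need to handle negation, irony and domain vocabulary.

```python
import re
from collections import Counter

# Toy opinion lexicons, assumed for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "useful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "useless"}


def sentences_about(text, topic):
    """Split a post into sentences and keep those mentioning the topic."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if topic.lower() in s.lower()]


def classify(sentence):
    """Label a sentence by comparing positive and negative word counts."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def topic_sentiment(posts, topic):
    """Tally the attitude labels of all sentences that mention the topic."""
    tally = Counter({"positive": 0, "negative": 0, "neutral": 0})
    for post in posts:
        for sentence in sentences_about(post, topic):
            tally[classify(sentence)] += 1
    return dict(tally)


# Hypothetical blog posts.
posts = [
    "The new tram line is great. The weather was bad.",
    "I hate the tram delays.",
    "The tram runs every ten minutes.",
]
print(topic_sentiment(posts, "tram"))
```

Restricting the scoring to sentences that mention the topic keeps unrelated opinions in the same post (here, the remark about the weather) from affecting the result.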
The application is utilized in three areas: (1) an on-line service capable of searching blogs and highlighting evaluations and opinions with respect to a queried topic; (2) an expert service that meets custom requirements and provides analyses based upon the software implementation (analysis of the discourse about political questions, trends of public opinion on particular products, etc.); (3) a blog marketing service (for blog service providers), including the detection of target groups interested in a particular topic and the utilization of the potential business opportunities induced by the interest in those topics.