Extracting Insights from Search Logs with KConnect

We created an application that analyses search logs containing medical queries. The application uses the KConnect annotation service to annotate the queries, so the analysis can be based on medical concepts as well as on the terms actually typed in.

For the examples below, we analysed an extract of the Trip medical search engine query logs (https://www.tripdatabase.com). The extract contains entries from both anonymised registered users (380 000 query log entries from 2010 to 2015) and unregistered users (916 000 query log entries for a period of one year from January 2014 to February 2015). Below are the log fields, followed by an example of a single log entry for an unregistered user:

Session ID, Timestamp, Query, Document ID, URL clicked, URL title

P0pqhw45edag, 2014-03-09 07:35:07.443, pregnancy corticosteroids congenital malformations, 5008517, http://www.uktis.org//docs/Corticosteroids.pdf, Corticosteroids
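As a minimal sketch of how such an entry could be read into a structured record, the Python below splits a line into the six fields listed above. The `LogEntry` class and `parse_log_line` function are hypothetical names used for illustration only, and the sketch assumes that fields are separated by a comma and a space and that only the last field (the URL title) may itself contain commas.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    """One row of the query log (hypothetical structure for illustration)."""
    session_id: str
    timestamp: datetime
    query: str
    document_id: str
    url: str
    url_title: str


def parse_log_line(line: str) -> LogEntry:
    # Split at most five times so that commas inside the final field
    # (the URL title) are kept. Assumes no other field contains ", ".
    fields = line.strip().split(", ", 5)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    session_id, ts, query, document_id, url, url_title = fields
    # Timestamps in the example carry millisecond precision.
    timestamp = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
    return LogEntry(session_id, timestamp, query, document_id, url, url_title)


entry = parse_log_line(
    "P0pqhw45edag, 2014-03-09 07:35:07.443, "
    "pregnancy corticosteroids congenital malformations, 5008517, "
    "http://www.uktis.org//docs/Corticosteroids.pdf, Corticosteroids"
)
print(entry.query)
```

The query text in `entry.query` is the part that is then sent for annotation.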

We expand each query in the search log with semantic annotations of the query text using the KConnect annotation service. Concepts found in the text are annotated with their UMLS UID, semantic type, and one of the following semantic classes: Anatomy, Disease, Drug, and Investigation. The example above results in the following annotations:


Once the query logs are analysed, the user is provided with various interactive visualisations:


We now describe two of the available visualisations.

Most searched medical concept

We implemented visualisations based on medical concepts, which are useful because they group synonymous terms into a single concept. Moreover, we used the semantic classes of the medical concepts to group the visualisations further. The visualisation tool therefore allows us to submit complex queries, such as identifying the Anatomy keywords most commonly searched by dentists (where the profession comes from data stored for registered users):
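The counting behind such a query could look roughly like the Python sketch below. It is not the application's actual code: the `AnnotatedQuery` record, the `profession` field, the concept identifiers, and the sample rows are all made up for illustration. Each concept is counted once per query, so repeating a term within one query does not inflate its rank.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class AnnotatedQuery(NamedTuple):
    """Hypothetical record: one query plus its concept annotations."""
    profession: str                    # assumed to come from the user profile
    concepts: List[Tuple[str, str]]    # (concept identifier, semantic class)


def most_searched(queries: Iterable[AnnotatedQuery],
                  semantic_class: str,
                  profession: str,
                  top_n: int = 10) -> List[Tuple[str, int]]:
    counts: Counter = Counter()
    for q in queries:
        if q.profession != profession:
            continue
        # Use a set so a concept is counted at most once per query.
        for cid in {cid for cid, cls in q.concepts if cls == semantic_class}:
            counts[cid] += 1
    return counts.most_common(top_n)


# Made-up sample data, for illustration only.
sample = [
    AnnotatedQuery("dentist", [("concept:tooth", "Anatomy"),
                               ("concept:caries", "Disease")]),
    AnnotatedQuery("dentist", [("concept:tooth", "Anatomy"),
                               ("concept:jaw", "Anatomy")]),
    AnnotatedQuery("nurse", [("concept:jaw", "Anatomy")]),
]
print(most_searched(sample, "Anatomy", "dentist"))
```

Because the counts are keyed on concept identifiers rather than on the typed terms, synonyms that map to the same concept end up in the same bar of the chart.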


Associated concepts

We can also investigate how concepts interact with each other, for example, by visualising the most commonly searched treatments entered with a given keyword based on the query logs. Below we show the most common concepts of type Disease related to the keyword “pregnancy.”


To generate this visualisation, we take advantage of the semantic annotations of concepts. The figure below shows how, in a single query, the keyword “pregnancy” is related to the treatment “corticosteroids” and the disease “congenital malformations”.
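One way to picture this step is as a co-occurrence count over annotated queries: keep the queries that contain the keyword, then count the annotated concepts of the requested class. The Python sketch below illustrates that idea only. The function name `associated_concepts` and the (label, class) annotation format are hypothetical, and mapping the treatment “corticosteroids” to the Drug class is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (query text, list of (concept label, semantic class)) -- hypothetical format
Annotated = Tuple[str, List[Tuple[str, str]]]


def associated_concepts(annotated_queries: Iterable[Annotated],
                        keyword: str,
                        semantic_class: str,
                        top_n: int = 10) -> List[Tuple[str, int]]:
    counts: Counter = Counter()
    keyword = keyword.lower()
    for query_text, annotations in annotated_queries:
        # Keep only queries in which the keyword appears as a whole word.
        if keyword not in query_text.lower().split():
            continue
        # Count each matching concept once per query.
        for label in {lbl for lbl, cls in annotations if cls == semantic_class}:
            counts[label] += 1
    return counts.most_common(top_n)


# The query from the log entry shown earlier; assigning "Drug" to the
# treatment is an assumption made for this sketch.
queries = [
    ("pregnancy corticosteroids congenital malformations",
     [("corticosteroids", "Drug"),
      ("congenital malformations", "Disease")]),
]
print(associated_concepts(queries, "pregnancy", "Disease"))
```

Asking for the Drug class instead of Disease would give the treatments most often searched together with the keyword.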


Further languages

We extended the query annotation to four further languages simply by replacing the KConnect English annotation pipeline used above with the KConnect annotation pipelines available for other languages: French, Hungarian, Swedish, and Czech.


For medical search log analysis, an interactive visualisation interface built on the KConnect annotation service gives decision makers a rapid overview of the queries submitted to a search engine, and lets them drill deeper into the results for further insight.

