Search result diversification aims to retrieve diverse results to satisfy as many different information needs as possible. Supervised methods have been proposed recently to learn ranking functions, and they have been shown to produce superior results to unsupervised methods. However, these methods use implicit approaches based on the principle of Maximal Marginal Relevance (MMR). In this paper, we propose a learning framework for explicit result diversification where subtopics are explicitly modeled. Based on the information contained in the sequence of selected documents, we use an attention mechanism to capture the subtopics to be focused on while selecting the next document, which naturally fits our task of document selection for diversification. As a preliminary attempt, we employ recurrent neural networks and max pooling to instantiate the framework. We use both distributed representations and traditional relevance features to model documents in the implementation. The framework is flexible enough to model query intents as either a flat list or a hierarchy. Experimental results show that the proposed method significantly outperforms all the existing search result diversification approaches.
• Traditional approaches to search result diversification are usually unsupervised and adopt manually defined functions with empirically tuned parameters. Depending on whether the underlying intents (or subtopics) are explicitly modeled, they can be categorized into implicit and explicit approaches.
• Implicit approaches do not model intents explicitly. They emphasize novelty, i.e., each newly selected document should be “different” from the previously selected ones according to some similarity measure.
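The MMR-style greedy selection underlying implicit approaches can be sketched as follows. This is a minimal illustration, not the paper's formulation; `relevance`, `similarity`, and `lam` are assumed inputs:

```python
def mmr_rerank(relevance, similarity, lam=0.5, k=3):
    """Greedy Maximal Marginal Relevance (MMR) re-ranking (sketch).

    relevance: list of query-relevance scores, one per candidate document.
    similarity: similarity[i][j] is the similarity of documents i and j.
    lam: trade-off between relevance and novelty (1.0 = pure relevance).
    Returns the indices of the k selected documents, in selection order.
    """
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy penalty: maximum similarity to any selected document.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam = 0.5`, a highly relevant document that nearly duplicates one already chosen can lose to a slightly less relevant but novel one.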
• In contrast, explicit approaches model intents (or subtopics) explicitly. They aim to improve intent coverage, i.e., each subsequent document should cover the intents not yet satisfied by the previous ones. Intents or subtopics can be determined by techniques such as query reformulation and query clustering, based on query logs and other types of information. Moreover, most similarity measures used in implicit approaches, e.g., those based on the language model or the vector space model, are computed globally over whole documents, regardless of the possible search intents.
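Coverage-driven explicit selection can be sketched in the style of xQuAD. The intent probabilities and per-intent relevance estimates below are illustrative assumptions, not values from the original text:

```python
def explicit_select(p_intent, rel_q, rel_t, lam=0.7, k=2):
    """Greedy explicit diversification (xQuAD-style sketch).

    p_intent: dict mapping intent -> P(intent | query).
    rel_q: dict mapping doc -> query-level relevance.
    rel_t: dict mapping doc -> {intent: relevance to that intent}.
    """
    selected, candidates = [], list(rel_q)
    # coverage[t] starts at 1.0 and shrinks as selected documents cover t.
    coverage = {t: 1.0 for t in p_intent}
    while candidates and len(selected) < k:
        def score(d):
            gain = sum(p_intent[t] * rel_t[d].get(t, 0.0) * coverage[t]
                       for t in p_intent)
            return (1 - lam) * rel_q[d] + lam * gain
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
        for t in p_intent:
            coverage[t] *= 1.0 - rel_t[best].get(t, 0.0)
    return selected
```

Once an intent is well covered, its `coverage` factor drops toward zero, so documents serving still-uncovered intents win the next positions.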
• The existing approaches therefore do not provide sufficiently diverse results.
• We propose a general learning framework, DSSA, to model subtopics explicitly for search result diversification. Based on the sequence of selected documents, unequal and varying subtopic attention weights are calculated, driving the model to emphasize different subtopics at different ranking positions.
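As an illustration of this idea (not the paper's exact formulation), subtopic attention can be computed so that subtopics less covered by the already-selected documents receive more weight:

```python
import math

def subtopic_attention(coverage):
    """Toy softmax attention over subtopics.

    coverage[t] measures how well subtopic t is already covered by the
    selected documents; less-covered subtopics get higher attention.
    """
    logits = [-c for c in coverage]           # uncovered -> high logit
    m = max(logits)                           # numerically stable softmax
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

At the start of the ranking all subtopics receive equal attention; as documents covering subtopic t are selected, its weight decays in favor of the remaining subtopics.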
• To the best of our knowledge, this is the first time the attention mechanism has been used to model this document selection process.
• We further instantiate DSSA using RNNs and max pooling to handle both distributed representations and traditional relevance features; the resulting model significantly outperforms the existing approaches.
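A toy sketch of how such an instantiation might combine the two signal types follows. The architecture and all weights here are illustrative assumptions, not the authors' exact model:

```python
import numpy as np

def rnn_state(doc_embs, W, U):
    """Plain tanh RNN summarizing the sequence of selected-document embeddings."""
    h = np.zeros(W.shape[0])
    for x in doc_embs:
        h = np.tanh(W @ h + U @ x)
    return h

def score_candidates(h, cand_embs, cand_feats, w_feat=1.0):
    """Toy scorer combining an embedding term with a feature term.

    h: RNN hidden state summarizing the documents selected so far.
    cand_embs: (n, d) distributed representations of the candidates.
    cand_feats: (n, s) traditional relevance features, one column per subtopic.
    """
    emb_score = cand_embs @ h                     # distributed-representation term
    feat_score = w_feat * cand_feats.max(axis=1)  # max pooling over subtopics
    return emb_score + feat_score
```

The RNN state carries what the selected documents have already expressed, while max pooling keeps each candidate's strongest per-subtopic relevance signal.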
• The proposed model contains a number of parameters to be learned, which requires a large amount of training data. Collecting more training data to fully unlock the potential of the model is another direction for future work.
• Finally, this work only deals with learning the ranking function, assuming that subtopics have been obtained in advance and that document and query representations have already been created.
A List-Pairwise Approach for Optimization
System: Pentium IV 2.4 GHz
Hard Disk: 40 GB
Monitor: 14″ Colour Monitor
Mouse: Optical Mouse
Operating System: Windows 7
Coding Language: ASP.Net with C# (Service Pack 1)
Database: SQL Server 2008