An introduction to the main subtasks of text-based information retrieval. This presentation gives an overview of the methods used to search, compare, cluster, and classify papers based on their text alone. You may want to watch my short introduction to the core concepts of biomedical text mining first: https://youtu.be/NcntH0WYp1M
0:00 Introduction: definition and subtasks
0:39 Ad hoc retrieval: indexing, stemming, automatic query expansion, and subject headings
2:20 Document similarity: bag of words, tf-idf weighting, and cosine similarity
4:00 Document clustering: all-against-all similarity and clustering algorithms
4:44 Document classification: labelled corpus, vector representations, and machine learning
6:07 Active learning: systematic reviews and iterative screening/training