Auto Classification / Text Categorization

This reading list was compiled during 2004-2005. Links to text categorization (TC) resources will be added later.

  1. Allwein, E. L., Schapire, R. E., & Singer, Y. (2001). Reducing multiclass to binary: a unifying approach for margin classifiers. The Journal of Machine Learning Research, 1, 113-141.
  2. Baker, L. D., & McCallum, A. K. (1998). Distributional clustering of words for text classification. Paper presented at the Annual ACM Conference on Research and Development in Information Retrieval, Melbourne, Australia.
  3. Batista, G. E. A. P. A., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20-29.
  4. Bilenko, M., Basu, S., & Mooney, R. J. (2004). Integrating constraints and metric learning in semi-supervised clustering. Paper presented at the Twenty-first international conference on Machine learning, Banff, Alberta, Canada.
  5. Cohen, W. W., & Singer, Y. (1999). Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems (TOIS), 17(2), 141-173.
  6. Collobert, R., & Bengio, S. (2001). SVMTorch: support vector machines for large-scale regression problems. The Journal of Machine Learning Research, 1, 143-160.
  7. Cristianini, N., Shawe-Taylor, J., & Lodhi, H. (2002). Latent Semantic Kernels. Journal of Intelligent Information Systems, 18(2-3), 127-147.
  8. Daelemans, W., Berck, P., & Gillis, S. (1996). Unsupervised discovery of phonological categories through supervised learning of morphological rules. Paper presented at the International Conference on Computational Linguistics, Copenhagen, Denmark.
  9. Domingos, P., & Pazzani, M. (1997). On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 29(2-3), 103-130.
  10. Dutton, D. M., & Conroy, G. V. (1997). A review of machine learning. The Knowledge Engineering Review, 12(4), 341-367.
  11. Dy, J. G., & Brodley, C. E. (2004). Feature Selection for Unsupervised Learning. The Journal of Machine Learning Research, 5, 845-889.
  12. Elder, J. F. (1994). A review of Machine Learning, Neural and Statistical Classification. Journal of the American Statistical Association, 91(433), 436-437.
  13. Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian Network Classifiers. Machine Learning, 29(2-3), 131-163.
  14. Fung, G., & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. Paper presented at the Conference on Knowledge Discovery in Data, San Francisco, California.
  15. Greiner, R., Silver, B., Becker, S., & Grüninger, M. (1988). A Review of Machine Learning at AAAI-87. Machine Learning, 3(1), 79-92.
  16. Herbrich, R., Graepel, T., & Campbell, C. (2001). Bayes point machines. The Journal of Machine Learning Research, 1, 245-279.
  17. Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264-323.
  18. Johnson, F. C. (1994). A classification of ellipsis based on a corpus of information seeking dialogues. Information Processing & Management, 30(3), 315-325.
  19. Kalton, A., Langley, P., Wagstaff, K., & Yoo, J. (2001). Generalized clustering, supervised learning, and data assignment. Paper presented at the Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, San Francisco, California.
  20. Kehagias, A., Petridis, V., Kaburlasos, V. G., & Fragkou, P. (2003). A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms. Journal of Intelligent Information Systems, 21(3), 227-247.
  21. Lee, S. S., Shishibori, M., Sumitomo, T., & Aoe, J.-I. (2002). Extraction of field-coherent passages. Information Processing & Management, 38(2), 173-207.
  22. Lewis, D. D., Schapire, R. E., Callan, J. P., & Papka, R. (1996). Training algorithms for linear text classifiers. Paper presented at the Annual ACM Conference on Research and Development in Information Retrieval, Zurich, Switzerland.
  23. McCallum, A., & Nigam, K. (1998). A Comparison of Event Models for Naive Bayes Text Classification. Paper presented at the AAAI-98 Workshop on "Learning for Text Categorization".
  24. Meretakis, D., & Wüthrich, B. (1999). Extending naive Bayes classifiers using long itemsets. Paper presented at the Conference on Knowledge Discovery in Data, San Diego, California, United States.
  25. Park, S.-B., & Zhang, B.-T. (2004). Co-trained support vector machines for large scale unstructured document classification using unlabeled data and syntactic information. Information Processing & Management, 40(3), 421-439.
  26. Pavlov, D., Balasubramanyan, R., Dom, B., Kapur, S., & Parikh, J. (2004). Document preprocessing for naive Bayes classification and clustering with mixture of multinomials. Paper presented at the Conference on Knowledge Discovery in Data, Seattle, WA, USA.
  27. Peña, J. M., Lozano, J. A., & Larrañaga, P. (2002). Learning Recursive Bayesian Multinets for Data Clustering by Means of Constructive Induction. Machine Learning, 47(1), 63-89.
  28. Provost, F., & Fawcett, T. (2001). Robust Classification for Imprecise Environments. Machine Learning, 42(3), 203-231.
  29. Qin, J. (2000). How Classifications Work: Problems and Challenges in an Electronic Age; G.C. Bowker, S.L. Star (Eds), Library Trends, 47(2): 185-340 (Fall 1998), 155 pp. ISSN: 0024-2594, $18.50. Information Processing & Management, 36(2), 331-333.
  30. Rogati, M., & Yang, Y. (2002). High-performing feature selection for text classification. Paper presented at the eleventh international conference on Information and knowledge management, McLean, Virginia, USA.
  31. Ruocco, A. S., & Frieder, O. (1998). Clustering and classification of large document bases in a parallel environment. Journal of the American Society for Information Science, 48(10), 932-943.
  32. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1-47.
  33. Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research, 1, 211-244.
  34. Tsay, J.-J., & Wang, J.-D. (2004). Improving linear classifier for Chinese text categorization. Information Processing & Management, 40(2), 223-237.
  35. Valentini, G., & Dietterich, T. G. (2004). Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods. The Journal of Machine Learning Research, 5, 725-775.
  36. van Gestel, T., Suykens, J. A. K., Baesens, B., Viaene, S., Vanthienen, J., Dedene, G., et al. (2004). Benchmarking Least Squares Support Vector Machine Classifiers. Machine Learning, 54(1), 5-32.
  37. Yang, Y. (1999). An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval, 1(1-2), 69-90.
  38. Yang, Y., & Pedersen, J. O. (1997). A Comparative Study on Feature Selection in Text Categorization. Paper presented at the Fourteenth International Conference on Machine Learning.
  39. Zhang, Z. (2004). Weakly-supervised relation classification for information extraction. Paper presented at the conference on Information and knowledge management, Washington, D.C., USA.