Comparison of Text Summarization Algorithms for Processing Editorials and News in Spanish

López-Trujillo, Sebastián; Torres-Madroñero, María C.

Comparación de algoritmos de resumen de texto para el procesamiento de editoriales y noticias en español

dc.creator	López-Trujillo, Sebastián
dc.creator	Torres-Madroñero, María C.
dc.date	2021-06-11
dc.date.accessioned	2021-08-19T16:21:46Z
dc.date.available	2021-08-19T16:21:46Z
dc.identifier	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1816
dc.identifier	10.22430/22565337.1816
dc.identifier.uri	http://test.repositoriodigital.com:8080/handle/123456789/12072
dc.description	Language is shaped not only by grammatical rules but also by context and sociocultural differences. Automatic text summarization, an area of interest in natural language processing (NLP), therefore faces challenges such as identifying the essential fragments according to the context and the type of text under analysis. Previous literature has described several automatic summarization methods; however, no studies so far have examined their effectiveness in specific contexts or on Spanish texts. In this paper, we compare three automatic summarization algorithms using news articles and editorials in Spanish. The three algorithms are extractive methods that estimate the importance of a phrase or word based on similarity or word frequency metrics. A document database was built with 33 editorials and 27 news articles, and a manual summary was obtained for each text. The algorithms were quantitatively compared using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric. We also analyzed the algorithms' potential to identify the main components of a text. In the case of editorials, the automatic summary should include a problem and the author's opinion. In the case of news articles, the summary should describe the temporal and spatial characteristics of an event. In terms of word reduction percentage and accuracy, the method based on the similarity matrix produced the best results, achieving a 70 % reduction in text length for both news articles and editorials. However, semantics and context should be incorporated into the algorithms to improve their performance in terms of accuracy and sensitivity.	en-US
dc.description	El lenguaje se ve afectado, no solo por las reglas gramaticales, sino también por el contexto y las diversidades socioculturales, por lo cual, el resumen automático de textos (un área de interés en el procesamiento de lenguaje natural - PLN), enfrenta desafíos como la identificación de fragmentos importantes según el contexto y el tipo de texto analizado. Trabajos anteriores describen diferentes métodos de resúmenes automáticos, sin embargo, no existen estudios sobre su efectividad en contextos específicos y tampoco en textos en español. En este artículo se presenta la comparación de tres algoritmos de resumen automático usando noticias y editoriales en español. Los tres algoritmos son métodos extractivos que buscan estimar la importancia de una frase o palabra a partir de métricas de similitud o frecuencia de palabras. Para esto se construyó una base de datos de documentos donde se incluyeron 33 editoriales y 27 noticias, obteniéndose un resumen manual para cada texto. La comparación de los algoritmos se realizó cuantitativamente, empleando la métrica Recall-Oriented Understudy for Gisting Evaluation. Asimismo, se analizó el potencial de los algoritmos seleccionados para identificar los componentes principales del texto. En el caso de las editoriales, el resumen automático debía incluir un problema y la opinión del autor, mientras que, en las noticias, el resumen debía describir las características temporales y espaciales de un suceso. En términos de porcentaje de reducción de palabras y precisión, el método que permite obtener los mejores resultados, tanto para noticias como para editoriales, es el basado en la matriz de similitud. Este método permite reducir en un 70 % los textos, tanto editoriales como noticiosos. No obstante, es necesario incluir la semántica y el contexto en los algoritmos para mejorar su desempeño en cuanto a precisión y sensibilidad.	es-ES
dc.format	application/pdf
dc.format	application/zip
dc.format	text/xml
dc.format	text/html
dc.language	spa
dc.publisher	Instituto Tecnológico Metropolitano (ITM)	en-US
dc.relation	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1816/2022
dc.relation	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1816/2053
dc.relation	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1816/2054
dc.relation	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1816/2067
dc.rights	Copyright (c) 2021 TecnoLógicas	en-US
dc.rights	http://creativecommons.org/licenses/by-nc-sa/4.0	en-US
dc.source	TecnoLógicas; Vol. 24 No. 51 (2021); e1816	en-US
dc.source	TecnoLógicas; Vol. 24 Núm. 51 (2021); e1816	es-ES
dc.source	2256-5337
dc.source	0123-7799
dc.subject	Natural language processing	en-US
dc.subject	Recall-Oriented Understudy for Gisting Evaluation	en-US
dc.subject	Text Analysis	en-US
dc.subject	Text Mining	en-US
dc.subject	Automatic Summarization	en-US
dc.subject	Procesamiento de lenguaje natural	es-ES
dc.subject	Recall-Oriented Understudy for Gisting Evaluation	es-ES
dc.subject	análisis de textos	es-ES
dc.subject	minería de textos	es-ES
dc.subject	resumen automático	es-ES
dc.title	Comparison of Text Summarization Algorithms for Processing Editorials and News in Spanish	en-US
dc.title	Comparación de algoritmos de resumen de texto para el procesamiento de editoriales y noticias en español	es-ES
dc.type	info:eu-repo/semantics/article
dc.type	info:eu-repo/semantics/publishedVersion
dc.type	Research Papers	en-US
dc.type	Artículos de investigación	es-ES
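
The abstract above compares summaries with the ROUGE metric and reports a word reduction percentage. As a minimal sketch of how such figures can be computed, the Python below implements unigram overlap (ROUGE-1) and a simple word-count reduction. The tokenizer, the function names (`tokenize`, `rouge_1`, `word_reduction`), and the choice of the ROUGE-1 variant are assumptions made for illustration; the record does not state which ROUGE variant or preprocessing the authors used.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into word tokens.

    Assumption: a plain regex tokenizer. In Python 3, \\w is
    Unicode-aware, so accented Spanish letters stay inside tokens.
    """
    return re.findall(r"\w+", text.lower())


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap recall, precision, and F1 (ROUGE-1 style)."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Clipped overlap: each word counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def word_reduction(original: str, summary: str) -> float:
    """Percentage of words removed when going from original to summary."""
    n_original = len(tokenize(original))
    n_summary = len(tokenize(summary))
    if n_original == 0:
        return 0.0
    return 100.0 * (n_original - n_summary) / n_original


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the study's corpus.
    manual = "el gobierno anunció nuevas medidas económicas ayer"
    automatic = "el gobierno anunció medidas"
    print(rouge_1(automatic, manual))
    print(word_reduction(manual, automatic))
```

In this sketch the manual summary plays the role of the reference and the algorithm's output is the candidate, so recall measures how much of the manual summary the automatic one recovers.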

This item appears in the following collection(s)

tecnologia [520]
