Visualization and Multiclass Classification of Complaints to Official Organisms on Twitter

Hernández-Pajares, Beatriz; Pérez-Marín, Diana; Frías-Martínez, Vanessa

Clasificación multiclase y visualización de quejas de organismos oficiales en Twitter

dc.creator	Hernández-Pajares, Beatriz
dc.creator	Pérez-Marín, Diana
dc.creator	Frías-Martínez, Vanessa
dc.date	2020-01-30
dc.identifier	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1454
dc.identifier	10.22430/22565337.1454
dc.description	Social networks generate massive amounts of information. Current Natural Language Processing techniques allow that information to be processed automatically, and Data Mining enables the automatic extraction of useful information. However, a state-of-the-art review reveals that many classification methods only distinguish two classes. This paper presents a procedure to automatically classify tweets into more than two classes. The steps of the procedure are described in detail so that any researcher can follow them. The accuracy and coverage (instead of only coverage, as is usual in the literature) of two automatic classifiers (SVM and Random Forests) were analyzed in a comparative study. The procedure was applied to automatically identify nine types of complaint from 190,000 tweets. According to the results, Random Forests should be used because they achieve an average accuracy of 81.46 % and an average coverage of 59.88 %.	en-US
dc.description	Las redes sociales acumulan gran cantidad de información. Las actuales técnicas de Procesamiento de Lenguaje Natural permiten su procesamiento automático y las técnicas de Minería de Datos permiten extraer datos útiles a partir de la información recopilada y procesada. Sin embargo, de la revisión del estado del arte, se observa que la mayoría de los métodos de clasificación de los datos identificados y extraídos de redes sociales son biclase. Esto no es suficiente para algunas áreas de clasificación, en las que hay más de dos clases a considerar. En este artículo, se aporta un estudio comparativo de los métodos SVM y Random Forests, para la identificación automática de n-clases en microblogging de redes sociales. Los datos recopilados automáticamente para el estudio están conformados por 190 000 tweets de cuatro organismos oficiales: Metro, Protección Civil, Policía, y Gobierno de México. De los resultados obtenidos, se recomienda el uso de Random Forests, ya que se consigue una precisión media del 81.46 % y una cobertura media del 59.88 %, con nueve tipos de quejas identificadas automáticamente.	es-ES
dc.format	application/pdf
dc.format	text/html
dc.format	text/xml
dc.language	spa
dc.publisher	Instituto Tecnológico Metropolitano (ITM)	en-US
dc.relation	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1454/1521
dc.relation	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1454/1607
dc.relation	https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1454/1588
dc.relation	/ref/S. Galeano, “Cuáles son las redes sociales con más usuarios del mundo (2019),” M4rketing Ecommerce, 2019. Disponible en: https://marketing4ecommerce.net/cuales-redes-sociales-con-mas-usuarios-mundo-2019-top/, [Accedido: 27-Jan-2020].
dc.relation	/ref/K. Smith, “44 estadísticas de Twitter,” Brandwatch, 2016. Disponible en: URL [Accedido: 27-Jan-2020].
dc.relation	/ref/C. D. Manning y H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999. Disponible en: https://www.cs.vassar.edu/~cs366/docs/Manning_Schuetze_StatisticalNLP.pdf
dc.relation	/ref/M. Vallez y R. Pedraza-Jimenez, “El Procesamiento del Lenguaje Natural en la Recuperación de Información Textual y áreas afines,” Hipertext.net, vol. 5, 2007. Disponible en: https://www.raco.cat/index.php/Hipertext/article/view/59496
dc.relation	/ref/tf-idf, “What does tf-idf mean?”. Disponible en: http://www.tfidf.com/. [Accedido: 27-Jan-2020].
dc.relation	/ref/C. C. Aggarwal y C. Zhai, Mining Text Data. Boston, MA: Springer US, 2012. https://doi.org/10.1007/978-1-4614-3223-4
dc.relation	/ref/Z. Malkani y E. Gillie, “Supervised Multi-Class Classification of Tweets,” pp. 1–6, Dec. 2012. Disponible en: https://pdfs.semanticscholar.org/bc78/1a147a3fe8477ade06ccf22a3aabe12236ea.pdf
dc.relation	/ref/Twitter, “What The Trend,” 2009. Disponible en: https://twitter.com/whatthetrend
dc.relation	/ref/K. Lee, D. Palsetia, R. Narayanan, M. M. A. Patwary, A. Agrawal, y A. Choudhary, “Twitter Trending Topic Classification,” en 2011 IEEE 11th International Conference on Data Mining Workshops, Vancouver 2011. pp. 251–258. https://doi.org/10.1109/ICDMW.2011.171
dc.relation	/ref/Y. Zhu, X. Shen, y W. Pan, “Network-based support vector machine for classification of microarray samples,” BMC Bioinformatics, vol. 10, no. S1, p. S21, Jan. 2009. https://doi.org/10.1186/1471-2105-10-S1-S21
dc.relation	/ref/J. Ramos, “Using tf-idf to determine word relevance in document queries,” en Proceedings of the first instructional conference on machine learning, Piscataway, 2003, pp. 133–142. Disponible en: https://0bc297c6-a-62cb3a1a-s-sites.googlegroups.com/site/caonmsu/ir/UsingTFIDFtoDetermineWordRelevanceinDocumentQueries.pdf?attachauth=ANoY7cqkto1wDdp6Jn46PedfG7tGhGuYmcCduJwLGMhNpvI-5c7t18UboKTmHi_pT-azS_yYTWmZIytOQSEh56v29LLcG8vrrTwNbjXg0c49O-oE2ZpJail3QOfHci1bk-m4oDISHj2AZ9IdBIB3s5Vklxd06ZGZbf-tg3HMDWG3WVoyAEAOR7CU6UQuvJdm1rye6v1KH4fEF29zCvfMigps7R31YDkTepj8GZWeuOUX77R_nUX4E32OeQklG26umoedBM08ee-HmZIm0RNzHg76DslSGl-eiA%3D%3D&attredirects=0
dc.relation	/ref/I. Rish, “An empirical study of the naive Bayes classifier,” en IJCAI 2001 workshop on empirical methods in artificial intelligence, 2001, pp. 41–46. Disponible en: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.330.2788
dc.relation	/ref/E. Anguiano-Hernández, Naive Bayes Multinomial para clasificación de texto usando un esquema de pesado por clases, pp. 1–8, Apr. 2009. Disponible en: http://ccc.inaoep.mx/~esucar/Clases-mgp/Proyectos/MGP_RepProy_Abr_29.pdf
dc.relation	/ref/N. Cristianini y J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge: Cambridge University Press, 2000. https://doi.org/10.1017/CBO9780511801389
dc.relation	/ref/RuleQuest Research, “About us,” 2018. Disponible en: https://rulequest.com/about-us.html. [Accedido: 21-Sep-2019].
dc.relation	/ref/B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, y M. Demirbas, “Short text classification in twitter to improve information filtering,” en Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’10, Geneva, 2010, pp. 841–842. https://doi.org/10.1145/1835449.1835643
dc.relation	/ref/J. Nazura y B. L. Muralidhara, “Semantic classification of tweets: A contextual knowledge based approach for tweet classification,” en 2017 8th International Conference on Information, Intelligence, Systems & Applications (IISA), Larnaca, 2017, pp. 1–6. https://doi.org/10.1109/IISA.2017.8316358
dc.relation	/ref/P. Selvaperumal y A. Suruliandi, “A short message classification algorithm for tweet classification,” en 2014 International Conference on Recent Trends in Information Technology, Chennai, 2014. pp. 1–3. https://doi.org/10.1109/ICRTIT.2014.6996189
dc.relation	/ref/R. C. Balabantaray, M. Mohammad, y N. Sharma, “Multi-Class Twitter Emotion Classification: A New Approach,” Int. J. Appl. Inf. Syst., vol. 4, no. 1, pp. 48–53, Sep. 2012. https://doi.org/10.5120/ijais12-450651
dc.relation	/ref/E. D’Andrea, P. Ducange, A. Bechini, A. Renda, y F. Marcelloni, “Monitoring the public opinion about the vaccination topic from tweets analysis,” Expert Syst. Appl., vol. 116, pp. 209–226, Feb. 2019. https://doi.org/10.1016/j.eswa.2018.09.009
dc.relation	/ref/M. Habdank, N. Rodehutskors, y R. Koch, “Relevancy assessment of tweets using supervised learning techniques: Mining emergency related tweets for automated relevancy classification,” en 2017 4th International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), Münster, 2017, pp. 1–8. https://doi.org/10.1109/ICT-DM.2017.8275670
dc.relation	/ref/J. F. Franco-Bermúdez y W. L. Ruiz-Castañeda, “Análisis de redes sociales para un sistema de innovación generado a partir de un modelo de simulación basado en agentes,” TecnoLógicas, vol. 22, no. 44, pp. 21–44, Jan. 2019. https://doi.org/10.22430/22565337.1183
dc.relation	/ref/R. S. Ghaly, E. Elabd, y M. A. Mostafa, “Tweets classification, hashtags suggestion and tweets linking in social semantic web,” en 2016 SAI Computing Conference (SAI), London, 2016. pp. 1140–1146. https://doi.org/10.1109/SAI.2016.7556121
dc.relation	/ref/E. Yar, I. Delibalta, L. Baruh, y S. S. Kozat, “Online text classification for real life tweet analysis,” en 2016 24th Signal Processing and Communication Application Conference (SIU), Zonguldak, 2016. pp. 1609–1612. https://doi.org/10.1109/SIU.2016.7496063
dc.relation	/ref/J. M. Rodriguez, D. Godoy, C. Mateos, y A. Zunino, “A multi-core computing approach for large-scale multi-label classification,” Intell. Data Anal., vol. 21, no. 2, pp. 329–352, Mar. 2017. https://doi.org/10.3233/IDA-150375
dc.relation	/ref/Twitter4J.org, “Overview”. Disponible en: http://twitter4j.org/javadoc/index.html
dc.relation	/ref/R. Longadge, S. Dongre, y L. Malik, “Class Imbalance Problem in Data Mining Review,” Int. J. Comput. Sci. Netw., vol. 2, no. 1, pp. 83–87, May 2013. Disponible en: http://journaldatabase.info/articles/class_imbalance_problem_data_mining.html
dc.relation	/ref/B. Hernández-Pajares, “Clasificación Automática Multiclase de Tweets y su Representación Gráfica,” (Tesis de Maestría), Facultad de Ingeniería, Madrid, Universidad Rey Juan Carlos, 2013. Disponible en: https://eciencia.urjc.es/handle/10115/11914
dc.relation	/ref/B. Hernández-Pajares, D. Pérez-Marín y V. Frías-Martínez, “TFM_code”, 2013. Disponible en: https://tinyurl.com/y4mnwotv.
dc.rights	Copyright (c) 2020 TecnoLógicas	en-US
dc.rights	http://creativecommons.org/licenses/by-nc-sa/4.0	en-US
dc.source	TecnoLógicas; Vol. 23 No. 47 (2020); 109-120	en-US
dc.source	TecnoLógicas; Vol. 23 Núm. 47 (2020); 109-120	es-ES
dc.source	2256-5337
dc.source	0123-7799
dc.subject	Text Mining	en-US
dc.subject	Multiclass Classification	en-US
dc.subject	Social Networks	en-US
dc.subject	Twitter	en-US
dc.subject	Minería de texto	es-ES
dc.subject	clasificación multiclase	es-ES
dc.subject	redes sociales	es-ES
dc.subject	Twitter	es-ES
dc.title	Visualization and Multiclass Classification of Complaints to Official Organisms on Twitter	en-US
dc.title	Clasificación multiclase y visualización de quejas de organismos oficiales en Twitter	es-ES
dc.type	info:eu-repo/semantics/article
dc.type	info:eu-repo/semantics/publishedVersion
dc.type	Research Papers	en-US
dc.type	Artículos de investigación	es-ES

This item appears in the following collection(s)

tecnologia [520]