Show simple item record

Multi-modal RGB-D Image Segmentation from Appearance and Geometric Depth Maps

dc.creator: Salazar, Isail
dc.creator: Pertuz, Said
dc.creator: Martínez, Fabio
dc.date: 2020-05-15
dc.identifier: https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1538
dc.identifier: 10.22430/22565337.1538
dc.description [en-US]: Classical image segmentation algorithms exploit the detection of similarities and discontinuities of different visual cues to define and differentiate multiple regions of interest in images. However, due to the high variability and uncertainty of image data, producing accurate results is difficult. In other words, segmentation based just on color is often insufficient for a large percentage of real-life scenes. This work presents a novel multi-modal segmentation strategy that integrates depth and appearance cues from RGB-D images by building a hierarchical region-based representation, i.e., a multi-modal segmentation tree (MM-tree). For this purpose, RGB-D image pairs are represented in a complementary fashion by different segmentation maps. Based on color images, a color segmentation tree (C-tree) is created to obtain segmented and over-segmented maps. From depth images, two independent segmentation maps are derived by computing planar and 3D edge primitives. An iterative region-merging process then locally groups the previously obtained maps into the MM-tree. Finally, the top emerging MM-tree level coherently integrates the available information from the depth and appearance maps. Experiments on the NYU-Depth V2 RGB-D dataset demonstrate that our strategy is competitive with state-of-the-art segmentation methods: on the test images, it reached average scores of 0.56 in Segmentation Covering and 2.13 in Variation of Information.
dc.description [es-ES]: Los algoritmos clásicos de segmentación de imágenes explotan la detección de similitudes y discontinuidades en diferentes señales visuales para definir regiones de interés en imágenes. Sin embargo, debido a la alta variabilidad e incertidumbre en los datos de imagen, se dificulta generar resultados acertados. En otras palabras, la segmentación basada solo en color a menudo no es suficiente para un gran porcentaje de escenas reales. Este trabajo presenta una nueva estrategia de segmentación multi-modal que integra señales de profundidad y apariencia desde imágenes RGB-D, por medio de una representación jerárquica basada en regiones, es decir, un árbol de segmentación multi-modal (MM-tree). Para ello, la imagen RGB-D es descrita de manera complementaria por diferentes mapas de segmentación. A partir de la imagen de color, se implementa un árbol de segmentación de color (C-tree) para obtener mapas de segmentación y sobre-segmentación. A partir de la imagen de profundidad, se derivan dos mapas de segmentación independientes, los cuales se basan en el cálculo de primitivas de planos y de bordes 3D. Seguidamente, un proceso jerárquico de fusión de regiones permite agrupar de manera local los mapas obtenidos anteriormente en el MM-tree. Por último, el nivel superior emergente del MM-tree integra coherentemente la información disponible en los mapas de profundidad y apariencia. Los experimentos se realizaron con el conjunto de imágenes RGB-D del NYU-Depth V2, evidenciando resultados competitivos con respecto a los métodos de segmentación del estado del arte. Específicamente, en las imágenes de prueba se obtuvieron puntajes promedio de 0.56 en la medida de Segmentation Covering y 2.13 en Variation of Information.
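As context for the scores quoted in the abstract, the sketch below shows one conventional way to compute the two reported evaluation metrics, following their standard definitions: Variation of Information (Meilă, 2005) and Segmentation Covering (Arbelaez et al., 2011), both cited in the reference list of this record. This is a minimal illustration, not the authors' implementation; it assumes the two segmentations are given as same-shaped NumPy arrays of non-negative integer region labels, and all function and variable names are ours.

```python
import numpy as np

def contingency_table(seg, gt):
    """Region-overlap counts: C[i, j] = number of pixels labeled i in `seg`
    and j in `gt`. Assumes same-shaped arrays of non-negative int labels."""
    n_s, n_g = seg.max() + 1, gt.max() + 1
    ids = seg.ravel().astype(np.int64) * n_g + gt.ravel()
    return np.bincount(ids, minlength=n_s * n_g).reshape(n_s, n_g).astype(float)

def _entropy(q):
    """Shannon entropy (in nats) of a discrete distribution, ignoring zeros."""
    q = q[q > 0]
    return float(-np.sum(q * np.log(q)))

def variation_of_information(seg, gt):
    """VI(S, G) = H(S) + H(G) - 2 I(S; G)  (Meila, 2005)."""
    p = contingency_table(seg, gt) / seg.size          # joint distribution
    p_s, p_g = p.sum(axis=1), p.sum(axis=0)            # marginals
    mutual_info = _entropy(p_s) + _entropy(p_g) - _entropy(p.ravel())
    return _entropy(p_s) + _entropy(p_g) - 2.0 * mutual_info

def segmentation_covering(seg, gt):
    """Covering of ground truth G by segmentation S (Arbelaez et al., 2011):
    (1/N) * sum over regions R in G of |R| * max_{R' in S} IoU(R, R')."""
    c = contingency_table(seg, gt)                     # |S_i ∩ G_j|
    size_s = c.sum(axis=1, keepdims=True)              # |S_i|, column vector
    size_g = c.sum(axis=0)                             # |G_j|, row vector
    iou = c / (size_s + size_g - c + 1e-12)            # pairwise IoU; epsilon guards unused label ids
    return float(np.sum(size_g * iou.max(axis=0)) / gt.size)
```

For identical label maps these functions return 0.0 (VI) and 1.0 (covering); lower VI and higher covering indicate closer agreement with the ground truth, which is the sense in which the abstract's averages of 0.56 (covering) and 2.13 (VI) on the NYU-Depth V2 test images are read.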
dc.format: application/pdf
dc.format: text/xml
dc.format: text/html
dc.language: spa
dc.language: eng
dc.publisher [en-US]: Instituto Tecnológico Metropolitano (ITM)
dc.relation: https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1538/1634
dc.relation: https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1538/1669
dc.relation: https://revistas.itm.edu.co/index.php/tecnologicas/article/view/1538/1724
dc.relation [ref]: P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour Detection and Hierarchical Image Segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, May 2011. https://doi.org/10.1109/TPAMI.2010.161
dc.relation [ref]: X. Wang, Y. Tang, S. Masnou, and L. Chen, “A Global/Local Affinity Graph for Image Segmentation,” IEEE Trans. Image Process., vol. 24, no. 4, pp. 1399–1411, Apr. 2015. https://doi.org/10.1109/TIP.2015.2397313
dc.relation [ref]: J. Han, L. Shao, D. Xu, and J. Shotton, “Enhanced Computer Vision With Microsoft Kinect Sensor: A Review,” IEEE Trans. Cybern., vol. 43, no. 5, pp. 1318–1334, Oct. 2013. https://doi.org/10.1109/TCYB.2013.2265378
dc.relation [ref]: N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in Computer Vision – ECCV 2012, Berlin: Springer, 2012, pp. 746–760. https://doi.org/10.1007/978-3-642-33715-4_54
dc.relation [ref]: X. Ren, L. Bo, and D. Fox, “RGB-(D) scene labeling: Features and algorithms,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, 2012, pp. 2759–2766. https://doi.org/10.1109/CVPR.2012.6247999
dc.relation [ref]: S. Gupta, P. Arbelaez, and J. Malik, “Perceptual organization and recognition of indoor scenes from RGB-D images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013, pp. 564–571. https://doi.org/10.1109/CVPR.2013.79
dc.relation [ref]: Z. Li, X. M. Wu, and S. F. Chang, “Segmentation using superpixels: A bipartite graph partitioning approach,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, 2012, pp. 789–796. https://doi.org/10.1109/CVPR.2012.6247750
dc.relation [ref]: R. Nock and F. Nielsen, “Statistical region merging,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1452–1458, Nov. 2004. https://doi.org/10.1109/TPAMI.2004.110
dc.relation [ref]: J. Yang, Z. Gan, K. Li, and C. Hou, “Graph-Based Segmentation for RGB-D Data Using 3-D Geometry Enhanced Superpixels,” IEEE Trans. Cybern., vol. 45, no. 5, pp. 927–940, May 2015. https://doi.org/10.1109/TCYB.2014.2340032
dc.relation [ref]: A. Richtsfeld, T. Mörwald, J. Prankl, M. Zillich, and M. Vincze, “Learning of perceptual grouping for object segmentation on RGB-D data,” J. Vis. Commun. Image Represent., vol. 25, no. 1, pp. 64–73, Jan. 2014. https://doi.org/10.1016/j.jvcir.2013.04.006
dc.relation [ref]: L. Cruz, D. Lucio, and L. Velho, “Kinect and RGBD images: Challenges and applications,” in 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T), Ouro Preto, 2012, pp. 36–49. https://doi.org/10.1109/SIBGRAPI-T.2012.13
dc.relation [ref]: K. Chen, Y.-K. Lai, and S.-M. Hu, “3D indoor scene modeling from RGB-D data: a survey,” Comput. Vis. Media, vol. 1, no. 4, pp. 267–278, Dec. 2015. https://doi.org/10.1007/s41095-015-0029-x
dc.relation [ref]: D. Lin, G. Chen, D. Cohen-Or, P. A. Heng, and H. Huang, “Cascaded Feature Network for Semantic Segmentation of RGB-D Images,” in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 1320–1328. https://doi.org/10.1109/ICCV.2017.147
dc.relation [ref]: J. McCormac, A. Handa, S. Leutenegger, and A. J. Davison, “SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?,” in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 2697–2706. https://doi.org/10.1109/ICCV.2017.292
dc.relation [ref]: W. Wang and U. Neumann, “Depth-aware CNN for RGB-D segmentation,” in Proceedings of the European Conference on Computer Vision (ECCV), Munich, 2018, pp. 135–150. https://doi.org/10.1007/978-3-030-01252-6_9
dc.relation [ref]: Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew, “A review of semantic segmentation using deep neural networks,” Int. J. Multimed. Inf. Retr., vol. 7, no. 2, pp. 87–93, Jun. 2018. https://doi.org/10.1007/s13735-017-0141-z
dc.relation [ref]: D. Huang, J.-H. Lai, C.-D. Wang, and P. C. Yuen, “Ensembling over-segmentations: From weak evidence to strong segmentation,” Neurocomputing, vol. 207, pp. 416–427, Sep. 2016. https://doi.org/10.1016/j.neucom.2016.05.028
dc.relation [ref]: J. Smisek, M. Jancosek, and T. Pajdla, “3D with Kinect,” in Consumer Depth Cameras for Computer Vision, London: Springer, 2013, pp. 3–25. https://doi.org/10.1007/978-1-4471-4640-7_1
dc.relation [ref]: M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik, “Using contours to detect and localize junctions in natural images,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 2008, pp. 1–8. https://doi.org/10.1109/CVPR.2008.4587420
dc.relation [ref]: P. Arbelaez, “Boundary extraction in natural images using ultrametric contour maps,” in 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06), New York, 2006, p. 182. https://doi.org/10.1109/CVPRW.2006.48
dc.relation [ref]: C. Feng, Y. Taguchi, and V. R. Kamat, “Fast plane extraction in organized point clouds using agglomerative hierarchical clustering,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, 2014, pp. 6218–6225. https://doi.org/10.1109/ICRA.2014.6907776
dc.relation [ref]: R. Hulik, M. Spanel, P. Smrz, and Z. Materna, “Continuous plane detection in point-cloud data based on 3D Hough Transform,” J. Vis. Commun. Image Represent., vol. 25, no. 1, pp. 86–97, Jan. 2014. https://doi.org/10.1016/j.jvcir.2013.04.001
dc.relation [ref]: T. H. Kim, K. M. Lee, and S. U. Lee, “Learning full pairwise affinities for spectral segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 7, pp. 1690–1703, Jul. 2013. https://doi.org/10.1109/TPAMI.2012.237
dc.relation [ref]: P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “From contours to regions: An empirical evaluation,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, 2009, pp. 2294–2301. https://doi.org/10.1109/CVPR.2009.5206707
dc.relation [ref]: R. Unnikrishnan, C. Pantofaru, and M. Hebert, “Toward Objective Evaluation of Image Segmentation Algorithms,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 929–944, Jun. 2007. https://doi.org/10.1109/TPAMI.2007.1046
dc.relation [ref]: M. Meilă, “Comparing clusterings: an axiomatic view,” in Proceedings of the 22nd International Conference on Machine Learning, Aug. 2005, pp. 577–584. https://doi.org/10.1145/1102351.1102424
dc.relation [ref]: A. Goder and V. Filkov, “Consensus clustering algorithms: Comparison and refinement,” in Proceedings of the Meeting on Algorithm Engineering & Experiments, Jan. 2008, pp. 109–117. http://dl.acm.org/citation.cfm?id=2791204.2791215
dc.rights [en-US]: Copyright (c) 2020 TecnoLógicas
dc.rights [en-US]: http://creativecommons.org/licenses/by-nc-sa/4.0
dc.source [en-US]: TecnoLógicas; Vol. 23 No. 48 (2020); 143-161
dc.source [es-ES]: TecnoLógicas; Vol. 23 Núm. 48 (2020); 143-161
dc.source: 2256-5337
dc.source: 0123-7799
dc.subject [en-US]: Image segmentation
dc.subject [en-US]: over-segmentation
dc.subject [en-US]: RGB-D images
dc.subject [en-US]: depth information
dc.subject [en-US]: multi-modal segmentation
dc.subject [es-ES]: Segmentación de imágenes
dc.subject [es-ES]: sobre-segmentación
dc.subject [es-ES]: imágenes RGB-D
dc.subject [es-ES]: información de profundidad
dc.subject [es-ES]: segmentación multi-modal
dc.title [en-US]: Multi-modal RGB-D Image Segmentation from Appearance and Geometric Depth Maps
dc.title [es-ES]: Segmentación multi-modal de imágenes RGB-D a partir de mapas de apariencia y de profundidad geométrica
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion
dc.type [en-US]: Research Papers
dc.type [es-ES]: Artículos de investigación


Files in this item


No files are associated with this item.

This item appears in the following collection(s)
