Especialización en Estadística Aplicada

Permanent URI for this collection

http://hdl.handle.net/11371/6011



Evaluación de modelos de DBO5 para el afluente y efluente de un STARD piloto mediante técnicas de machine learning.
(2023-12-02) Pascal Suárez, Angel Camilo; González Martínez, Edwin Fernando; Salamanca Bernal, Julián Andrés
Context. To establish the best predictive model for the DBO5 parameter (five-day biochemical oxygen demand) using machine learning techniques, based on the Chemical Oxygen Demand, Total Suspended Solids, Total Nitrogen, and Total Phosphorus of a pilot treatment plant located at the Facultad del Medio Ambiente of the Universidad Distrital Francisco José de Caldas. Purpose. To determine, by applying machine learning techniques, a predictive model that supports decision making based on the metrics evaluated for the models applied. Methodology. The data were cleaned and missing values were imputed; because the data were heterogeneous, the dataset was transformed to make it more homogeneous. Two groups were identified through Principal Component Analysis, and multiple linear regression and Random Forest models were then applied. The metrics used to determine the best model were RMSE, MAPE, R2, and COR. Results. The Random Forest model fitted with the variables selected by the Akaike criterion showed the best metrics for the influent and effluent of the STARD; among these metrics, the RMSE was 0.285 and 0.34, respectively. Conclusions. According to the results, although the linear regression model obtained good metrics, it does not meet the normality assumption; therefore, the best predictive model was Random Forest, with better metrics and the variables selected by the Akaike criterion.
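The four metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not define COR, so it is assumed here to be the Pearson correlation between observed and predicted values; the function name `regression_metrics` and the sample numbers are hypothetical and are not taken from the thesis.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAPE (percent), R2 and Pearson correlation (assumed meaning of COR)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)

    # Root mean squared error.
    rmse = math.sqrt(ss_res / n)

    # Mean absolute percentage error; undefined if any observed value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)

    # Coefficient of determination.
    r2 = 1.0 - ss_res / ss_tot

    # Pearson correlation between observed and predicted values.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    cor = cov / math.sqrt(ss_tot * var_p)

    return {"rmse": rmse, "mape": mape, "r2": r2, "cor": cor}


# Hypothetical observed and predicted values, for illustration only.
metrics = regression_metrics([2.0, 4.0], [1.0, 5.0])
```

Note that a model can show a high correlation and still have a poor R2, as in the toy values above, which is one reason to report several metrics together.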
Zona gris, un experimento para entrenar un modelo de clasificación a partir de valores extremos.
(Fundación Universitaria Los Libertadores. Sede Bogotá., ) Enriquez Sanchez, Dany Alexander; González Veloza, José John Fredy
In machine learning, supervised regression problems are often converted into dichotomous classification problems based on the definition of the target variable, which simplifies decision making. Our hypothesis in this work is that training a dichotomous classification model using only the extreme values of the target variable and discarding the rest (the gray zone) produces better results than using all the data from the development population in the training phase. This could benefit researchers and practitioners in terms of time, savings in computational resources, and possibly better performance in the training phase. Furthermore, this research can serve as a first step toward better understanding the influence of extreme values on the training of classification models and can open a new field of study. To evaluate this hypothesis, we use a database of the 2019 Saber Pro test results from the "Open Data" of the Ministry of Information and Communications Technologies. We perform two model training tests: a symmetric scheme that balances the classification values 0 and 1, and an asymmetric scheme that imbalances these values. The best results were obtained when training the model in the range from 0% to 30% of the gray zone using an asymmetric scheme. However, no significant results were observed that supported the hypothesis.
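One way to picture the gray-zone idea is the sketch below. The abstract does not say how the gray zone is delimited, so this sketch assumes it is the central fraction of the observations ranked by the target variable, and it shows only a symmetric split; the function name `split_extremes` and the sample values are hypothetical.

```python
def split_extremes(values, gray_fraction):
    """Label the lowest values 0 and the highest values 1, dropping the central gray zone.

    Returns a list of (original_index, label) pairs for the kept observations.
    Assumption: the gray zone is the middle `gray_fraction` of the ranked values.
    """
    if not 0.0 <= gray_fraction < 1.0:
        raise ValueError("gray_fraction must be in [0, 1)")

    n = len(values)
    # Indices of the observations, ordered by target value.
    order = sorted(range(n), key=lambda i: values[i])

    n_gray = int(round(n * gray_fraction))  # observations discarded
    n_low = (n - n_gray) // 2               # observations labeled 0
    n_high = n - n_gray - n_low             # observations labeled 1

    low = order[:n_low]
    high = order[n - n_high:]

    kept = [(i, 0) for i in low] + [(i, 1) for i in high]
    return sorted(kept)


# Hypothetical target scores, for illustration only.
scores = [10, 50, 20, 40, 30, 60, 70, 80, 90, 100]
training_rows = split_extremes(scores, gray_fraction=0.4)
```

With `gray_fraction=0.0` every observation is kept and the split reduces to an ordinary median cut, which gives a baseline for comparison against the all-data model.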
Modelo de pronóstico para la estimación de costos semanales de importación marítima de bases para la producción de lubricantes en Colombia desde las Américas mediante un modelo SARIMA
(Fundación Universitaria Los Libertadores. Sede Bogotá., ) Osorio Castañeda, Cristhian Camilo; Niño Gutiérrez, Sindy Carolina
The main objective of this study is to analyze and forecast the weekly CIF (Cost, Insurance and Freight) import prices of bases for the production of lubricants brought into Colombia from the Americas. It evaluates historical import price trends and uses time series models for forecasting. The SEMMA (Sample, Explore, Modify, Model, Assess) methodology was used for data analysis. The data were obtained from the Treid platform, which provides information on imports. The data were explored and transformed, and certain characteristics were identified, such as repeated dates and missing records on some days. The data were then modified by filtering the information and calculating the weekly average of CIF values, which yielded a time series with a positive trend. Based on these analyses, a SARIMA predictive model, which removes seasonal and non-stationary behavior from a seasonal series with period s, was used to predict weekly CIF import prices. The results provide a view of the behavior of import prices of bases for lubricants in Colombia, which is of great importance for decision making in the industry. It is concluded that this innovative methodological approach can contribute to a better understanding and management of CIF import prices of bases for lubricants, making it possible to adjust strategies and increase participation in the national lubricants market.
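The weekly averaging step described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: records are grouped by ISO calendar week, repeated dates simply contribute several values to the same week, and days without records are skipped. The function name `weekly_mean` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_mean(records):
    """Average (date, value) records by ISO week.

    Returns a list of ((iso_year, iso_week), mean_value) sorted chronologically.
    """
    buckets = defaultdict(list)
    for day, value in records:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        buckets[(iso[0], iso[1])].append(value)
    return [(week, sum(vals) / len(vals)) for week, vals in sorted(buckets.items())]


# Hypothetical CIF records: one repeated date and several days with no record.
records = [
    (date(2023, 1, 2), 100.0),
    (date(2023, 1, 2), 200.0),
    (date(2023, 1, 4), 300.0),
    (date(2023, 1, 9), 50.0),
]
series = weekly_mean(records)
```

The resulting evenly spaced weekly series is the kind of input a seasonal time series model expects; a week with no records at all would still be absent and would need separate handling.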
Diagnóstico de la Población Recicladora Independiente del Municipio de Pasto, a partir de Técnicas de Aprendizaje Supervisado
(Fundación Universitaria Los Libertadores. Sede Bogotá., ) Carlosama Ruales, Yana Stefhania; González Veloza, José John Fredy
Historically, in Colombia the recycling population has carried out the recovery of usable waste under precarious working conditions and systematic restrictions and prohibitions by the State. This has generated a constant struggle by the Recycling Guild, whose members have seen associated work as the only way to defend their rights. It is therefore necessary to understand why almost half of the recycling population of the municipality of Pasto is not associated with a recycling organization, given the advantages of association, such as being providers of the public cleaning service and thus receiving the usage fee. The objective of this research is to identify the socioeconomic conditions of non-associated recyclers that can explain their lack of interest in organizing. For this, using the data obtained in the gender diagnosis of the recycling population of Pasto 2021, various models were trained with supervised learning techniques, and the LGBM (Light Gradient Boosting Machine) method was selected for presenting the best performance metrics in the task of predicting the conditions of the non-associated recycler population. Data processing consisted of data cleaning, transformation of some variables, and simple imputation of null data; finally, the training and test data were separated. According to the results produced by this model, work would have to start with the recyclers who have been in the trade the longest, because for them recycling is only a subsistence activity and not the basis of their economy.
Análisis con Machine Learning de Peticiones Externas (PQRS) del Servicio Nacional de Aprendizaje - SENA para mitigación de incumplimientos normativos
(Fundación Universitaria Los Libertadores. Sede Bogotá., ) Ayala Alfonso, Yésica Patricia; Durán Ramírez, Julio Mario; González Martínez, Edwin Fernando
The National Learning Service (SENA) receives Petitions, Complaints, Claims, Suggestions, Denunciations, Acknowledgments, Congratulations and Guardianship Actions (PQRS) that must be managed to guarantee a timely response to citizens who request a solution to their requirement. In addition, it must ensure follow-up of and compliance with the regulations that govern the management of PQRS in Colombia, and automate processes that are currently carried out manually. For this reason, the purpose of this project is to analyze the PQRS received by SENA with Machine Learning models that help mitigate the risk of regulatory breaches and resolve the PQRS in a timely manner for the public. For this, the SEMMA methodology is used, as it is the most appropriate for the analysis of large databases. Tools from the Python and R programming languages were used to carry out the analysis of the PQRS, obtaining conclusive and satisfactory results from the Machine Learning models chosen to predict the possible violation of citizens' rights. With these results, it is considered necessary to suggest to SENA the implementation of the models that were run.
