Data & Research Projects

Creative Projects

Other Projects

Skills:

Data Science, Data Analysis, Social Psychology Research, Behavioral Research, Python.

Tools:

Google Colab, Python Libraries: NumPy, Pandas, Seaborn, Matplotlib, SciPy, scikit-learn.

Background

To support pro-environmental efforts by the school board & local government, I researched declining rates of students biking to school.

Target

Understanding biking behavior and analyzing the factors that affect students' intention to bike to school.

What did this research find out?

  • Distance is a key factor in whether students, especially girls, bike to school. Intention can also play a role, especially for boys.
  • Perceived behavioral control is the most important factor influencing the intention to bike to school, for both boys and girls.
  • Attitudes towards biking to school are heavily influenced by two main beliefs. The first is that biking makes students healthier, together with how important, enjoyable, and beneficial they find this. The second is that biking consumes energy that could otherwise be used for activities like studying, together with how important, concerning, and disadvantageous they find this.
  • Subjective norms are strongly influenced by two key beliefs. The first is the extent to which students perceive that their friends would enjoy and approve of biking to school, together with how strongly they want to meet these expectations. The second is the extent to which they believe teachers would enjoy and expect them to bike to school, together with how strongly they want to comply with these expectations.
  • Perceived behavioral control is shaped mainly by two dominant beliefs. The first is whether students see distance as a facilitating or hindering factor, and how easy or difficult biking to school is given this factor. The second is their belief about the energy required to bike to school, and how easy or difficult biking is given this factor.
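The belief-based findings above pair each belief with how the student evaluates it. The scoring formula is not stated here, so the following is only a minimal sketch of one common way to combine such pairs (multiply belief strength by its evaluation, then sum). The item values, the 1-7 scales, and the names `belief_items` and `composite_score` are all hypothetical.

```python
# Hypothetical responses for one student. Each tuple is
# (belief_strength, evaluation), both on an assumed 1-7 scale.
belief_items = {
    "attitude": [
        (6, 7),  # "biking makes me healthier" x how beneficial that is
        (5, 2),  # "biking uses energy I need for studying" x how that is rated
    ],
    "subjective_norm": [
        (4, 5),  # friends approve x motivation to meet their expectations
        (3, 6),  # teachers expect it x motivation to comply
    ],
    "perceived_control": [
        (2, 6),  # distance helps or hinders x perceived ease given distance
        (5, 3),  # energy required x perceived ease given that energy
    ],
}


def composite_score(items):
    """Sum the belief-strength x evaluation products for one construct."""
    return sum(strength * evaluation for strength, evaluation in items)


# One composite score per construct (attitude, norm, control).
scores = {name: composite_score(items) for name, items in belief_items.items()}
print(scores)
```

Composite scores like these could then be compared across students or related to stated intention; which statistical model to use for that step is a separate choice not covered by this sketch.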

Insights & recommendations:

  • The government might consider programs that encourage prospective students to choose schools closer to their homes, since the research shows that distance plays an important role in whether students bike to school.
  • Programs implemented by schools, the local government, or other public institutions should focus on persuasive information that emphasizes the personal benefits of biking (e.g., health impacts) rather than on moral or social information (e.g., environmental impacts).
  • Schools, the local government, or other public institutions should run more programs that aim to instill the belief among students that biking to school is socially accepted and favored among teenagers. This can be achieved by considering current teenage culture and leveraging conformity and peer relationships among teenagers.

Skills:

Data Science, Machine Learning, Python.

Tools:

Google Colab, Python Libraries: NumPy, Pandas, Seaborn, Matplotlib, SciPy, scikit-learn.

Background

With the rise of online education, platforms like VideoLectures.Net and MOOCs on Coursera have made thousands of lectures accessible to millions. However, the abundance of content creates challenges in finding and matching videos with learners. This project explores how machine learning can help address this.

Purpose

The purpose of this project is to predict how engaging an educational video is likely to be for viewers, based on a set of features extracted from the video's transcript, audio track, hosting site, and other sources.

Summary

The preprocessing phase revealed high variability, skewness, and the presence of outliers in key features, though no strong linear relationships with engagement were found. Given these characteristics, tree-based models were chosen for their ability to handle non-linearity and outliers effectively. Yeo-Johnson transformation was applied to address skewness, ensuring better feature distribution for modeling.
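To illustrate the skewness step, here is a minimal sketch of a Yeo-Johnson transformation using `scipy.stats.yeojohnson`. The feature is synthetic (a log-normal sample standing in for a skewed video feature), so the numbers are not results from this project.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic right-skewed feature; a stand-in for a real video feature.
feature = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# With lmbda omitted, SciPy estimates lambda by maximum likelihood
# and returns both the transformed data and the fitted lambda.
transformed, fitted_lambda = stats.yeojohnson(feature)

skew_before = stats.skew(feature)
skew_after = stats.skew(transformed)

print(f"lambda={fitted_lambda:.3f}")
print(f"skewness before={skew_before:.3f}, after={skew_after:.3f}")
```

In a modeling pipeline, the lambda should be fitted on the training split only and then reused for validation and test data, to avoid leaking information.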

Both XGBoost and Random Forest were implemented, with Random Forest achieving a slightly higher optimized AUC-ROC score (0.8959) than XGBoost (0.8948). These results suggest that tree-based models are well-suited for this dataset, providing robust predictive performance while handling the data's complexities.

Because Random Forest performs about as well as XGBoost, the simpler Random Forest model may be sufficient and the safer choice.
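The comparison can be sketched as follows. This is not the project's code: the data is synthetic, hyperparameters are defaults rather than tuned values, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for XGBoost so the example needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary "engaging / not engaging" data (assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost: another gradient-boosted tree ensemble.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC-ROC is computed from the predicted probability of the positive class.
    positive_proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, positive_proba)

for name, auc in auc_scores.items():
    print(f"{name}: AUC-ROC = {auc:.4f}")
```

When two models score this close, a difference in the third or fourth decimal place is usually within the variation between data splits, so cross-validated scores give a fairer comparison than a single split.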

Skills:

Data Visualization, Data Analysis

Tools:

Tableau, Looker Studio

Background

Open Data Jabar is a regional government platform that serves as a primary source of datasets related to regional data and information. I explored the platform to find relevant datasets for my first visualization project on mental health issues.

Purpose

The goal of this project is to use visualization tools to identify potential trends and patterns in patient visits for mental health-related issues at primary healthcare facilities in West Java.

What did this visualization find out?

Five cities/regencies are identified as having the highest percentage of patient visits for mental health-related issues from 2021 to 2024:

  • Cirebon City
  • Cirebon Regency
  • Cianjur Regency
  • Bogor City
  • Karawang Regency

Five cities/regencies are identified as having the highest increase in the percentage of patient visits for mental health-related issues from 2021 to 2024:

  • Sukabumi City
  • Cimahi City
  • Kuningan Regency
  • Ciamis Regency
  • Tasikmalaya City
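The project itself used Tableau and Looker Studio, but the same ranking logic can be sketched in pandas. The region names, column names, and values below are hypothetical placeholders, not figures from Open Data Jabar.

```python
import pandas as pd

# Hypothetical visit percentages per region (placeholder values).
visits = pd.DataFrame(
    {
        "region": ["Region A", "Region B", "Region C", "Region D"],
        "visit_pct_2021": [1.0, 2.5, 0.5, 3.0],
        "visit_pct_2024": [2.0, 2.75, 2.5, 2.5],
    }
)

# Change in percentage points between the first and last year.
visits["change_pct_points"] = visits["visit_pct_2024"] - visits["visit_pct_2021"]

# Regions with the largest increase, highest first.
top_increase = visits.nlargest(2, "change_pct_points")["region"].tolist()
print(top_increase)
```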

Five cities/regencies have a higher-than-average percentage of patient visits despite having a below-average number of primary healthcare facilities:

  • Depok City
  • Majalengka Regency
  • Bogor City
  • Pangandaran Regency
  • Cirebon City
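The filter behind this last list (above-average visit percentage combined with a below-average facility count) can be expressed as two boolean conditions. This is a pandas sketch with hypothetical regions, column names, and values; the original analysis was done in Tableau and Looker Studio.

```python
import pandas as pd

# Hypothetical per-region figures (placeholder values).
regions = pd.DataFrame(
    {
        "region": ["Region A", "Region B", "Region C", "Region D"],
        "visit_pct": [4.0, 1.0, 3.0, 2.0],
        "facility_count": [10, 40, 50, 20],
    }
)

# Condition 1: visit percentage above the average across regions.
high_demand = regions["visit_pct"] > regions["visit_pct"].mean()

# Condition 2: number of facilities below the average across regions.
few_facilities = regions["facility_count"] < regions["facility_count"].mean()

# Regions that satisfy both conditions.
underserved = regions.loc[high_demand & few_facilities, "region"].tolist()
print(underserved)
```

Comparing against the mean is one reasonable threshold; the median is an alternative when a few very large regions pull the average up.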

Insights:

  • High Mental Health Demand in Certain Areas: cities/regencies like Cirebon City and Cirebon Regency consistently have a significant percentage of patient visits for mental health-related issues. This suggests an ongoing demand for mental health services in these areas.
  • Rising Trends in Mental Health Cases in Certain Areas: cities/regencies that show an increasing trend may have growing mental health issues that need to be researched and investigated further.
  • Healthcare Facility Shortages in High-Demand Areas: the shortage of facilities in high-demand areas could lead to service overload, longer wait times, and inadequate patient care.

Recommendations:

  • Expand Mental Health Services: high-demand areas like Depok City, Majalengka Regency, and Cirebon City need more primary healthcare facilities with mental health services. Establishing new centers, upgrading existing ones, and deploying mobile clinics can help address service shortages.
  • Leverage Telemedicine and Digital Services: expanding telehealth platforms, online counseling, and mental health hotlines can help people in areas with limited facilities access timely support, reducing the strain on physical healthcare centers.
  • Conduct Further Research in High-Risk Areas: a deeper investigation into regions with consistently high or rising mental health cases is essential to understand the underlying causes. Research should explore a range of factors such as socioeconomic conditions, cultural perceptions of mental health, accessibility of care, and local stressors. Community surveys, focus group discussions, and collaboration with local healthcare providers can provide valuable insights for targeted interventions.