TN Letter Grades: A Machine Learning Approach

This blog continues the series on TN School Letter Grade data. You can find the first post here. The same dataset applies.

Having looked at some of the preliminary results of statewide school letter-grade data, I wanted to take a machine-learning approach to see the importance of the different features in the data. Typically, this type of approach is used for predictive modeling of a given outcome, but it also yields a lot of insight into the data.

What is Machine Learning?

Machine Learning is an AI approach that allows a computer to learn from data and become more accurate at predicting outcomes without being explicitly programmed to do so. It spots patterns in data, and the more data it’s exposed to, the better it performs. This is why I chose this approach: this type of analysis works best on a large dataset, but when applied to an individual school or school system, it can help with goal setting and interventions. If certain challenges are identified as barriers to higher performance, you can design targeted interventions to address them.

The beauty of machine learning in this context is its ability to handle complex, multifaceted data and reveal insights that might not be immediately apparent through traditional analysis. This can lead to more informed decision-making and, ultimately, better educational outcomes for the students in your school system.

Exploring Machine-Learning Methods

In Machine Learning, the process typically involves considering various algorithms and testing them to identify the best approach for the data. In this analysis, I began with a logistic regression model, which initially showed promising results with ROC AUC scores for individual letter grades: A = 0.97, B = 0.82, C = 0.84, D = 0.95, F = 0.99 (see the graph below). However, it's important to note that logistic regression is fundamentally a binary classifier, so it fit a separate set of coefficients for each letter grade individually. While these insights were valuable, I sought a more comprehensive model capable of predicting all letter grades collectively, rather than focusing on each one individually.
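
For anyone who wants to reproduce this step, here is a minimal sketch of how per-grade ROC AUC scores can be computed with scikit-learn for a one-vs-rest logistic regression. The file name and the letter-grade column name are placeholders standing in for the actual dataset from the first post, not the names used in my code.

    # Sketch: one-vs-rest logistic regression with a ROC AUC score per letter grade.
    # "tn_letter_grades.csv" and the "letter_grade" column are hypothetical names.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import StandardScaler, label_binarize
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("tn_letter_grades.csv")
    X = df.drop(columns=["letter_grade"])      # numeric feature columns only
    y = df["letter_grade"]                     # values "A" through "F"

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y)

    # Scale the features so the coefficients are comparable in magnitude.
    scaler = StandardScaler().fit(X_train)
    model = OneVsRestClassifier(LogisticRegression(max_iter=5000))
    model.fit(scaler.transform(X_train), y_train)

    # One ROC AUC per letter grade (each grade versus all the others).
    probs = model.predict_proba(scaler.transform(X_test))
    y_bin = label_binarize(y_test, classes=model.classes_)
    for i, grade in enumerate(model.classes_):
        print(grade, round(roc_auc_score(y_bin[:, i], probs[:, i]), 2))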

ROC AUC Scores for Logistic Regression

Even though I ultimately settled on a different algorithm to look comprehensively at the data, given that the letter grades A and F had such high ROC AUC scores, I thought it would be interesting to look at the coefficient scores for each of those letter grades.

Keep in mind that a positive coefficient for a feature means that as the value of the feature increases, the likelihood or probability of the predicted outcome also increases. The opposite is true of negative coefficients: as the value of the feature increases, the likelihood or probability of the predicted outcome decreases. Also, the magnitude of the coefficients matters. Larger coefficients, whether positive or negative, imply a stronger influence of the corresponding feature on the outcome.

Class 0 (Letter grade of A):
overall_success_rate_all_students: 4.3372
growth_numeracy_score: 2.2797
growth_literacy_score: 1.6852
growth_social_studies_score: 1.6174
growth_science_score: 1.5987
economically_disadvantaged_pct: -1.3345
limited_english_proficient_pct: -1.1512
overall_success_rate_ed: 1.0342
growth_ela_math_score_bhn: 0.8023
growth_ela_math_score_ed: 0.7037
growth_ela_math_score_swd: 0.4228
black_hispanic_native_american_pct: -0.4007
homeless_pct: -0.4001
overall_success_rate_el: 0.3409
military_pct: -0.3314
overall_success_rate_swd: -0.2608
african_american_pct: -0.2035
asian_pct: -0.1596
multirace_pct: 0.0916
native_american_pct: 0.0898
growth_ela_math_score_el: -0.0756
white_pct: -0.0674
students_with_disabilities_pct: -0.0227
male_pct: 0.0063
migrant_pct: -0.0008

Class 4 (Letter grade of F):
overall_success_rate_all_students: -4.3218
overall_success_rate_ed: -2.1216
growth_science_score: -1.7028
growth_literacy_score: -1.6491
growth_numeracy_score: -1.6130
growth_ela_math_score_bhn: -1.2259
growth_social_studies_score: -1.1786
economically_disadvantaged_pct: 1.1473
growth_ela_math_score_ed: -0.7226
limited_english_proficient_pct: 0.4933
students_with_disabilities_pct: -0.4691
asian_pct: 0.3496
homeless_pct: 0.3360
growth_ela_math_score_swd: -0.3165
overall_success_rate_swd: 0.3134
black_hispanic_native_american_pct: 0.2852
male_pct: -0.1321
white_pct: -0.1272
military_pct: -0.0936
multirace_pct: 0.0654
growth_ela_math_score_el: 0.0602
overall_success_rate_el: 0.0475
african_american_pct: 0.0042
native_american_pct: 0.0030
migrant_pct: -0.0029
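
Lists like the ones above can be pulled straight out of the fitted model. As a rough sketch (reusing the hypothetical file and column names from the earlier snippet), sorting each class's coefficients by absolute value reproduces this kind of ranking:

    # Sketch: rank features by coefficient magnitude for the A and F models.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("tn_letter_grades.csv")   # hypothetical file name
    X, y = df.drop(columns=["letter_grade"]), df["letter_grade"]

    model = OneVsRestClassifier(LogisticRegression(max_iter=5000))
    model.fit(StandardScaler().fit_transform(X), y)

    # Class 0 is "A" and class 4 is "F" when the grades sort alphabetically.
    for idx in (0, 4):
        coefs = pd.Series(model.estimators_[idx].coef_[0], index=X.columns)
        ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
        print(f"Class {idx} (Letter grade of {model.classes_[idx]}):")
        print(ranked.round(4))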

As you can see, the best predictors for a letter grade of A are overall_success_rate_all_students (4.3372), growth_numeracy_score (2.2797), growth_literacy_score (1.6852), growth_social_studies_score (1.6174), and growth_science_score (1.5987). Of course, the success rate for all students makes up 50% of the letter-grade score and the overall growth score makes up 40%, so it isn’t surprising to see these in the top five.

Looking at negative scores can be telling as well. The negative scores with the greatest magnitude for the letter grade of A were economically_disadvantaged_pct (-1.3345) and limited_english_proficient_pct (-1.1512). This suggests that having a lower percentage of economically disadvantaged students and students with limited English proficiency was important to earning a letter grade of A.

For a letter grade of F, we examined the coefficients to identify the most influential factors. The results shed light on the key determinants of a low letter grade. Just as with the letter grade A analysis, we discovered both positive and negative contributors.

The strongest predictor for a letter grade of F was overall_success_rate_all_students, with a coefficient of -4.3218. This indicates that a low overall success rate for all students strongly correlates with a letter grade of F. Additionally, several growth scores had negative coefficients, including growth_science_score (-1.7028), growth_literacy_score (-1.6491), and growth_numeracy_score (-1.6130). These findings imply that poor performance in these growth areas increases the likelihood of an F.

On the other hand, certain factors had positive coefficients, meaning that higher values push a school toward an F. For instance, economically_disadvantaged_pct had a positive coefficient of 1.1473, suggesting that a higher percentage of economically disadvantaged students was associated with a greater likelihood of an F. Likewise, limited_english_proficient_pct had a positive coefficient of 0.4933, indicating that a higher percentage of students with limited English proficiency also increased that likelihood.

I was also curious about false negatives and false positives with logistic regression. The confusion matrix for the logistic regression showed that the model predicted schools that scored a D far more accurately (97.5%) than schools with any other letter grade. Here's the percentage of correct predictions for each class:

  • Class A: Approximately 81% correct

  • Class B: Roughly 66% correct

  • Class C: About 80% correct

  • Class D: Approximately 97.5% correct, which indicates a high accuracy for this class

  • Class F: Around 61.5% correct

While these are promising results for predicting an A, a C, and a D, the model does not predict the other grades accurately, which undermines its reliability as a way of looking at this data. This is why other machine-learning approaches needed to be explored.

Logistic Regression Confusion Matrix
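
Those per-class percentages are just the diagonal of the row-normalized confusion matrix. Here is a small, self-contained sketch of that calculation; the helper name is mine, not from the original code.

    # Sketch: fraction of schools predicted correctly for each letter grade.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    def per_class_accuracy(y_true, y_pred, labels=("A", "B", "C", "D", "F")):
        cm = confusion_matrix(y_true, y_pred, labels=list(labels))
        # Divide each row by its total so the diagonal becomes the share correct.
        rates = cm.diagonal() / cm.sum(axis=1)
        return dict(zip(labels, np.round(rates, 3)))

    # Example use with the earlier sketch's model:
    # per_class_accuracy(y_test, model.predict(scaler.transform(X_test)))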

Finding a Better Algorithm

To find an algorithm that would predict the letter grade as a whole rather than look at each grade individually, I decided to look at the following algorithms: Decision Tree, Random Forest, Gradient Boosting, and Support Vector Machine. I used the same 70/30 training/testing split that was used for Logistic Regression, and I computed the Accuracy, ROC AUC, Precision, Recall, and F1 Score for each model.
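
Here is a rough sketch of that comparison loop. The averaging choice for Precision, Recall, and F1 ("weighted") and the default model settings are my assumptions, not necessarily what produced the numbers reported below.

    # Sketch: compare four classifiers on the same 70/30 split.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.svm import SVC
    from sklearn.metrics import (accuracy_score, roc_auc_score, precision_score,
                                 recall_score, f1_score)

    df = pd.read_csv("tn_letter_grades.csv")   # hypothetical file name
    X, y = df.drop(columns=["letter_grade"]), df["letter_grade"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y)

    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=42),
        "Random Forest": RandomForestClassifier(random_state=42),
        "Gradient Boosting": GradientBoostingClassifier(random_state=42),
        "Support Vector Machine": SVC(probability=True, random_state=42),
    }

    for name, clf in models.items():
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        proba = clf.predict_proba(X_test)
        print(name,
              "Accuracy", round(accuracy_score(y_test, pred), 3),
              "ROC AUC", round(roc_auc_score(y_test, proba, multi_class="ovr"), 3),
              "Precision", round(precision_score(y_test, pred, average="weighted"), 3),
              "Recall", round(recall_score(y_test, pred, average="weighted"), 3),
              "F1", round(f1_score(y_test, pred, average="weighted"), 3))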

Here are the results:

Decision Tree showed an accuracy of 66.5%. The model had a Precision of 66.6%, closely matching its accuracy. Recall and F1 Score were both approximately 66.5%, indicating a balanced performance in terms of precision and recall. The ROC AUC scores were strong across the classes, with the highest being for class 4 (98.9%).

Random Forest performed better in terms of accuracy with 76.6%. Precision was notably higher at 77.3%, with a Recall of 76.6% and an F1 Score close behind at 76.3%. The ROC AUC values mirrored those of the Decision Tree, which suggests consistent performance across different thresholds.

Gradient Boosting edged out the other models with an accuracy of 77.4%, the highest among those tested. It also had the highest Precision at 77.5% and F1 Score at 77.3%. Recall was in line with accuracy at 77.4%. ROC AUC scores were consistent with the other models.

Support Vector Machine had an accuracy of 74.4%. It achieved a Precision of 75.1% and an F1 Score of 74.1%, with a Recall of 74.4%. ROC AUC scores for this model were also similar to the others.

Overall, Gradient Boosting stood out as the most accurate model for this task. Despite the similarity in ROC AUC scores across the models, I considered the balance between all metrics. Gradient Boosting showed the best balance, with the highest scores in Precision, Recall, and F1 Score, indicating its strength in both classifying correctly and maintaining a balance between false positives and false negatives. This balance is crucial for models where both types of errors carry significant weight, such as predicting school grades.

  • Accuracy: Approximately 77.4%, which signifies the proportion of total correct predictions made out of all predictions.

  • ROC AUC: About 93.9%, reflecting the model's ability to distinguish between the classes across different thresholds.

  • Precision: Roughly 77.5%, indicating the model's accuracy when predicting a positive class.

  • Recall: Also about 77.4%, showing the model's capability to identify all actual positives.

  • F1 Score: Approximately 77.3%, which is a harmonic mean of Precision and Recall, providing a single score to measure the model's accuracy.
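
For reference, the F1 Score for a single class is the harmonic mean of Precision and Recall: F1 = 2 × (Precision × Recall) / (Precision + Recall). Plugging in the aggregate numbers above, 2 × (0.775 × 0.774) / (0.775 + 0.774) ≈ 0.774; the reported 77.3% differs slightly, most likely because for a multi-class problem the score is computed per letter grade and then averaged rather than derived from the aggregate Precision and Recall.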

Gradient Boosting Results

Using the Gradient Boosting algorithm, I charted the top 10 features by importance.

Gradient Boosting top 10 features
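
As a sketch, a chart like the one above can be generated from the fitted model's feature_importances_ attribute; the file and column names are again the hypothetical ones used in the earlier snippets.

    # Sketch: horizontal bar chart of the ten largest gradient-boosting importances.
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("tn_letter_grades.csv")   # hypothetical file name
    X, y = df.drop(columns=["letter_grade"]), df["letter_grade"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y)

    gb = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

    importances = pd.Series(gb.feature_importances_, index=X_train.columns)
    top10 = importances.sort_values(ascending=False).head(10)

    top10.sort_values().plot(kind="barh")      # largest importance ends up on top
    plt.xlabel("Feature importance")
    plt.title("Gradient Boosting: Top 10 Features")
    plt.tight_layout()
    plt.show()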

Top 10 Features

  1. Overall Success Rate for All Students (Feature importance: 42.91%): This is the most significant predictor, indicating that the overall success rate is strongly associated with the school's letter grade.

  2. Growth in Numeracy Score (14.49%): The second most important feature, which suggests that improvements in numeracy significantly influence the grade.

  3. Growth in ELA and Math Score for ED (6.98%): The progress in English Language Arts and Mathematics for economically disadvantaged students is also a key indicator.

  4. Growth in Science Score (6.40%): Science score growth is another substantial factor.

  5. Growth in Literacy Score (5.68%): Literacy improvements are crucial, although less so than numeracy.

  6. Growth in ELA and Math Score for BHN (5.43%): The growth in ELA and Math for Black, Hispanic, and Native American students is also a notable predictor.

  7. Growth in Social Studies Score (4.39%): This shows a moderate influence on the grade.

  8. Percentage of Economically Disadvantaged Students (2.76%): While this has a smaller weight, it's still a relevant feature.

  9. Percentage of Students with Disabilities (1.93%): This has a lesser impact but is part of the top 10 features.

  10. Overall Success Rate for ED (1.52%): The overall success rate for economically disadvantaged students rounds out the top 10 features.

Conclusions

It turned out to be a happy accident that I ran the logistic regression first, because it allowed me to look at the coefficients for individual grades before looking at the features that predict any grade. Doing this shows that, beyond the common-sense finding that the overall success rate largely determines the letter grade, how ED students score on their ELA and math growth measure and how BHN students score on theirs also influence the grade. Additionally, how students scored in science and social studies, two subjects no longer included in federal accountability, was also important to the overall letter-grade score. It will be interesting to see how these coefficients differ when looking at federal accountability, which takes improvement and subgroup performance into account when assigning a school score.

Disclaimer

Keep in mind that these scores come from a single test for each subject, and both the achievement and growth scores were derived from that one test. This is not the most accurate measure of a student’s knowledge, even if it is what is used for accountability, a choice made for political expediency rather than actual data reliability.

Finally, this or any other analysis isn’t a substitute for doing the right things for students. Building relationships with students, teaching the things that matter in every subject, and helping students develop a love and desire for learning will always produce the best results, no matter what scoring apparatus is used. As the fox says in The Little Prince: “On ne voit bien qu’avec le coeur; l’essentiel est invisible pour les yeux.” (“One sees clearly only with the heart; what is essential is invisible to the eye.”)

I had ChatGPT check this for errors and make some writing suggestions.