Notes from Learning at Scale - Day 2
Scaling expert feedback
Question: Why is a college endorsement valuable?
Gradescope: a fast, flexible, and fair system for scalable assessment of handwritten work
Grading parties for CS 50 at Berkeley: too long.
Their system Gradescope has been used by:
- 250 schools
- 3000 courses
- 18M pages
They scanned 1400 exams in 2 hours xD
- Does it make grading more fair? — Strongly agree.
- Does it make grading more enjoyable? — Agree
Awesome grading tool @Gradescope:
— Jill-Jênn Vie (@jjvie) April 21, 2017
<Me> I told you. @UCBerkeley is the future.
<Advisor> No. It's the present. We are prehistory. #las17ed pic.twitter.com/C1DrbFlTvs
Writing Reusable Code Feedback at Scale with Mixed-Initiative Program Synthesis
The goal is to correct many programming assignments at the same time, through test cases and teacher hints.
Program synthesis relies on learning code transformations from pairs of incorrect and correct submissions. (Kind of a regexp between syntax trees?)
- They diff between correct and incorrect submissions
- Cluster the patches
- Ask the teacher to rate them
- Propagate to other submissions
Students can receive patches to their solution labelled with hints.
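The steps above can be roughly sketched as follows. This is only an illustration: line-level diffs from Python's `difflib` stand in for the syntax-tree transformations the talk described, and the names `extract_patch`, `cluster_patches` and `propagate_hints`, as well as the sample submissions, are hypothetical.

```python
import difflib
from collections import defaultdict


def extract_patch(incorrect: str, correct: str) -> tuple:
    """Return the line-level edits that turn `incorrect` into `correct`.

    Assumption: a plain line diff replaces the learned tree transformations.
    Lines are stripped so that indentation differences do not split clusters.
    """
    a, b = incorrect.splitlines(), correct.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed = tuple(line.strip() for line in a[i1:i2])
            added = tuple(line.strip() for line in b[j1:j2])
            edits.append((removed, added))
    return tuple(edits)


def cluster_patches(pairs):
    """Group student ids whose fixes are the same patch."""
    clusters = defaultdict(list)
    for student_id, incorrect, correct in pairs:
        clusters[extract_patch(incorrect, correct)].append(student_id)
    return dict(clusters)


def propagate_hints(clusters, teacher_hints):
    """Give every student in a cluster the hint the teacher wrote for it."""
    result = {}
    for patch, student_ids in clusters.items():
        if patch in teacher_hints:
            for student_id in student_ids:
                result[student_id] = teacher_hints[patch]
    return result


# Hypothetical (student, incorrect, correct) submissions for "square a number".
pairs = [
    ("alice", "def f(n):\n    return n * n + 1", "def f(n):\n    return n * n"),
    ("bob", "def f(n):\n  return n * n + 1", "def f(n):\n  return n * n"),
    ("carol", "def f(n):\n    return n + n", "def f(n):\n    return n * n"),
]
clusters = cluster_patches(pairs)
off_by_one = extract_patch(pairs[0][1], pairs[0][2])
hints = propagate_hints(clusters, {off_by_one: "Check the extra + 1."})
print(hints)
```

Here the teacher writes one hint for the cluster shared by two students, and both receive it; the third student's cluster has no hint yet.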
(Not tested yet on large-scale classes?)
Preventing Keystroke Based Identification in Open Data Sets
University of Helsinki.
They record the timestamps of keystrokes, and would like to release the data anonymously as open data. Unfortunately, this is enough to recover the identity of the Top 10. (Creepy.)
Idea: bucketing or roughly rounding the timestamps according to some time window.
Surprise: a short time window anonymizes; a medium time window does NOT anonymize; a large time window anonymizes again.
Hypothesis: the peak of the distribution of delays between keystrokes may explain this behavior. A mid-sized time window kills the blur.
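A minimal sketch of the bucketing idea, assuming each timestamp is rounded down to the start of its time window (the notes do not say which rounding scheme the authors used). The function names and the sample keystrokes are hypothetical.

```python
def bucket_timestamps(timestamps_ms, window_ms):
    """Round each timestamp down to the start of its time window."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return [(t // window_ms) * window_ms for t in timestamps_ms]


def inter_key_delays(timestamps_ms):
    """Delays between consecutive keystrokes, the signal used to identify typists."""
    return [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


# Hypothetical keystroke timestamps in milliseconds.
keystrokes = [0, 95, 210, 340, 1200]

raw_delays = inter_key_delays(keystrokes)
bucketed = bucket_timestamps(keystrokes, window_ms=100)
blurred_delays = inter_key_delays(bucketed)

print(raw_delays)      # [95, 115, 130, 860]
print(bucketed)        # [0, 0, 200, 300, 1200]
print(blurred_delays)  # [0, 200, 100, 900]
```

The delays after bucketing are coarser than the raw ones, which is the blur the anonymization relies on.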
Do performance trends suggest wide-spread collaborative cheating on asynchronous exams?
The University of Illinois has a Computer-Based Testing Facility where students can take exams asynchronously (over 4 days).
Problems are randomly parametrized in the exercises, but it might not be enough to prevent cheating.
Collaborative cheating may still be possible.
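To make "randomly parametrized" concrete, here is a small sketch of one way such a problem could be generated. It is an assumption for illustration, not the facility's actual system: the template, the seeding scheme and the name `make_problem` are all hypothetical.

```python
import random


def make_problem(student_id: str, exam_id: str) -> dict:
    """Build one multiplication problem whose numbers depend on the student.

    Seeding with a string makes the variant reproducible for a given
    student and exam (hypothetical scheme).
    """
    rng = random.Random(f"{exam_id}:{student_id}")
    a = rng.randint(2, 20)
    b = rng.randint(2, 20)
    return {"question": f"Compute {a} * {b}.", "answer": a * b}


print(make_problem("alice", "exam1")["question"])
print(make_problem("bob", "exam1")["question"])
```

Two students will usually see different numbers, but the template is the same, so a student who took the exam earlier can still pass on the method. That is why parametrization alone may not be enough.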
Dataset
- 93 exams
- 29492 exam records
- Most students choose the last day to take the exam
- But their performance is lower on average
Creative learning
- Importance of providing fab labs and so on for teaching CS & robotics at scale.
- Importance of using technology to reach people who have fewer opportunities to learn programming
- Suggesting that people go to a library near them
Challenges
- Being open to criticism
- Who is personalizing? An AI?
Towards equal opportunities in MOOCs: Reducing gender & social-class achievement gaps in China with a value relevance affirmation
SIT (social identity threat) matters in China and in language learning contexts.
Goal: Trying to reduce achievement gaps in an English language learning MOOC in China, offered by Tsinghua University.
- 5M registered users on XuetangX, built on Open edX
Investigation
- Gender identity threat
- Social-class identity threat
Lower-class men < High-class men < Lower-class women < High-class women
- Men experience more gender identity threat
- No difference in regional identity threat between upper-class
They managed to cut the gender gap in half, using only a low-cost psychological intervention (delivered as a form).
#las17ed Are gender and social classes impacting on learner completion
— Ella Hamonic (@Ella_Hmc) April 21, 2017
Lower-class men benefit from psychological interventions in courses. pic.twitter.com/K05r7cTrJY
Learning about Learning at Scale: Methodological Challenges and Recommendations
- Of course, a machine learning model will be more predictive if you add as many features as possible
- Selection bias
- Homogeneity bias
- Lindley’s paradox: big data increases the probability of finding a p-value < 0.01
=> Importance of preregistered studies.
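The point about p-values can be illustrated with a small computation. This sketch only shows how a fixed, tiny effect becomes "significant" as the sample grows; it is not a full treatment of Lindley's paradox. It assumes a two-sided z-test with known standard deviation, and the effect size of 0.01 standard deviations is a made-up number.

```python
import math


def two_sided_p_value(effect: float, sigma: float, n: int) -> float:
    """Two-sided z-test p-value for an observed mean difference `effect`.

    Assumes a known standard deviation `sigma` and `n` independent samples.
    """
    z = abs(effect) / (sigma / math.sqrt(n))
    # 2 * (1 - Phi(z)) written with the complementary error function.
    return math.erfc(z / math.sqrt(2))


# Hypothetical effect: 0.01 standard deviations, observed at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p_value(effect=0.01, sigma=1.0, n=n))
```

With these assumptions the p-value is far above 0.01 for a hundred learners and far below it for a million, even though the effect itself has not changed. Deciding on the analysis beforehand, as preregistration requires, guards against reading too much into such results.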