Ep 23. Red flags in education research with Ben Solomon

 

This transcript was created with speech-to-text software.  It was reviewed before posting but may contain errors. Credit to Jazmin Boisclair.

​

You can listen to the episode here: Chalk & Talk Podcast.

 

Ep 23. Red flags in education research with Ben Solomon

 

[00:00:00] Anna Stokke: Welcome to Chalk and Talk, a podcast about education and math. I'm Anna Stokke, a math professor and your host. You are listening to episode 23 of Chalk and Talk. I'll be taking a short break from the podcast, so this is a season finale. Follow me on X or LinkedIn for updates on new episodes. I'm finishing off the first season with an extremely important episode. My guest is Dr. Ben Solomon from the University at Albany.

 

He knows a lot about research methodology. Something that keeps coming up, and I've heard this from teachers, parents, and reporters, is that it seems like there's research in education to support almost any claim.

 

And for someone who's not a researcher, it can be difficult to determine when a reference for an educational claim is research-based and when it's not. I planned this episode with the goal of giving teachers, parents, and the public tools for evaluating claims in education. If you teach pre-service teachers, I hope you can use this episode as a resource for discussing research methodology.

 

If you're a teacher, I hope it will help you to sort out what's evidence-based and what's not. Perhaps it can even be used in your next professional development day. And if you're a parent, I hope it helps you to advocate for evidence-based math instruction for your child. I asked Ben to discuss two specific education papers in this episode.

 

One is on standard algorithms, and we discuss flaws with the methodology in that paper. The other is an example of a rigorous educational study. Along the way, we have a passionate discussion about the importance of teaching standard algorithms. Ben shares five red flags to be on the lookout for in education papers.

​

Those are posted on the resource page for this episode. We discuss the science of learning and the need for education to evolve to use evidence-based practices and data to inform decisions so as to achieve best outcomes for students. I had a lot of fun working with Ben to put this episode together, and we hope you find it helpful.

 

Now, without further ado, let's get started.

 

I am really excited to introduce Dr. Ben Solomon today, and he is joining me from Albany in New York. He has a Ph.D. in school psychology. He is an associate professor in the Department of Educational and Counseling Psychology at the University at Albany. He is the director of New York's Technical Assistance Partnership for Academics there.

 

He works closely with the New York State Education Department to promote best practices in academic assessment, instruction, and intervention for students with disabilities. And he does research in math assessment and intervention. He is well-versed in research methodology. He teaches a research methods class, and I am hoping he can teach us a few things today.

 

And I do feel the need to mention that you are a dual citizen. Your dad grew up in the wonderful city of Montreal, so a big shout out to your dad and your family here in Canada. Welcome, Ben. Welcome to my podcast.

 

[00:03:36] Ben Solomon: Thank you so much. My dad will be elated to hear that.

 

[00:03:39] Anna Stokke: So let's start with talking a bit about your work with the New York State Education Department. Can you tell us what you do in your director role with that?

 

[00:03:50] Ben Solomon: I direct a team of associates here at the university, and they have extensive experience in educational policy, instruction, measurement, teaming, systems capacity, and we work with New York State's education department, specifically their office of special education, and we work with 12 regional teams situated across the state to support students with disabilities by promoting best practice in literacy and math instruction. 

 

We do a lot of coaching. We do a lot of consultation. And we also do a lot of material development, developing professional development resources, slide shows and handouts and webinars, that these regional centers can then go use with schools and other agencies to promote high-quality, evidence-based practice in the state.

 

[00:04:42] Anna Stokke: I want to talk about research today and education research, and I want to start by talking about algorithms, and you'll see where we're going here. So, first off, standard algorithms. So for people listening, if you're not sure, the standard algorithms are the traditional algorithms for arithmetic.

 

So, for example, in the standard algorithm for addition, you would line up the digits vertically according to place value. You would add from right to left, with a carry if necessary, and the standard algorithm for subtraction, same idea. We line up vertically and borrow if necessary.
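
To make that concrete, here is a small worked example, added to this page purely as an illustration; it is not an example used in the episode. For 47 plus 85, the standard algorithm lines the digits up by place value and adds from right to left: in the ones column, 7 plus 5 is 12, so we write 2 and carry 1; in the tens column, 1 plus 4 plus 8 is 13, so we write 3 and carry 1 into the hundreds place.

```latex
% 47 + 85 arranged vertically by place value, solved with the standard algorithm.
% Ones: 7 + 5 = 12, write 2, carry 1.  Tens: 1 + 4 + 8 = 13, write 3, carry 1,
% which becomes the hundreds digit.  Answer: 132.
\[
\begin{array}{r}
   47\\
 +\,85\\ \hline
  132
\end{array}
\]
```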

 

And then the one for division is usually called long division. So there is an idea in education that standard algorithms are harmful, and it's sometimes claimed that they work against conceptual understanding. And you hear some extreme views. Some say that they shouldn't even be taught. Others say that they should be delayed, that other methods should be taught first.

 

And then sometimes it's insisted that if they must be taught, they should be taught as one of several strategies that children can choose from. So the history here in my province: at one point, standard algorithms were removed from the curriculum. I think something similar happened at one point in the United States.

 

So this is around 2006. Students were instead supposed to invent their own strategies or use other strategies. So you might use, say, an area model to multiply or partial products or some version of the distributive law but not the standard algorithm. Now, as the size of the numbers increases, these methods become very inefficient and difficult to work with.

 

And the situation now in my province is that students can learn the standard algorithm, it is part of the curriculum now, but often, they actually have to demonstrate several strategies for each operation. So this is essentially the situation across most of Canada.

 

And people have also told me that it's the same situation in New Zealand, for instance, and some other countries. This is for each operation: multiple strategies, or personal strategies, or invent-your-own strategies. So I'm wondering, first off, what is the situation in the United States with standard algorithms and multiple strategies?

 

Are standard algorithms taught? Are teachers also required to teach multiple strategies? What's the situation there?

 

[00:07:10] Ben Solomon: This is my more informal perspective, but it's decidedly a mixed bag right now in the States. The way our system works is that there's a lot of decision-making that can occur within an individual district or within an individual school. Locally, you could have ten different perspectives represented.

 

I will say that, you know, based on my perspectives in the field, there has been a strong challenge to the standard algorithm over, I'd say, the last 20 to 30 years. Some states have made the news in this regard, for example, California. However, the standard algorithm hasn't been outlawed, if you will.

 

It is still taught in a variety of schools across the nation, but not in any kind of standardized, consistent way. Locally, in New York, it's still all over the map. For example, at my daughter's school, they're now shifting back and forth between deciding whether to teach the standard algorithm or teach multiple strategies.

 

I work with a school a little bit south of here where the decision actually is made at the classroom level. So individual teachers are trying different curriculums and there's no real consistency even within grade. Our state standards in New York suggest that children should learn multiple strategies.

 

I'd say curriculums are all over the map too. You know, within the States, as I'm sure is the case in Canada, you have major vendors that produce these curriculums, and I think people are shocked by the wild spectrum of content that's represented in these curriculums.

 

Some of the curriculums are very linear in teaching the standard algorithm and maybe then extending to sort of additional multiple strategies that can be chosen, but starting at that foundational level. Others emphasize multiple strategies right from the beginning and then others, you know, as you sort of alluded to push children to invent their own strategies.

 

And so because you have this diversification, both at the curriculum level and then the local level and then the state level, it's a difficult pattern to ascertain. There definitely have been some headwinds, as I said before, to move away from the standard algorithm, which I find unfortunate. But it is a motley situation in terms of trying to determine consistency across the states.

 

[00:09:30] Anna Stokke: And we'll come back to this later. We'll have a discussion about what you think about teaching the standard algorithm, but I'd like to examine where this claim that standard algorithms are harmful actually came from. And I'm quite familiar with this because, as I said, this happened here in my province and the justification given was a paper or a series of papers written by Constance Kamii.

 

And surprisingly, her work was also cited in the recently approved California Math Framework to justify the statement that students who use invented strategies before learning standard algorithms understand base ten concepts more fully, etc.

 

I would like to examine that paper with you. So the paper is called “To Teach or Not to Teach Algorithms.” The authors are Constance Kamii and Ann Dominick. It's in the Journal of Mathematical Behavior, 1997. I'm going to put a link to the paper on the resource page. And I'm going to quickly try to summarize what's said in the abstract. They looked at second, third, and fourth graders among 12 classes. The students were individually interviewed to investigate the effects of teaching algorithms like carrying.

 

Some children had not been taught any algorithms previously. We'll call those the “no algorithms group.” Others had been taught standard algorithms. We'll call those the “algorithms group.” They were asked to solve multi-digit addition and multiplication problems and explain how they got their answers. So for example, they were given something like 7 plus 52 plus 286 that was written horizontally, and the children were asked to give the answer to that without using pencil and paper.
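
For concreteness, here is a minimal sketch, added for this page rather than taken from the paper, of what the standard written algorithm does with that item once the numbers are lined up by place value; the function name is just illustrative.

```python
# A sketch of the standard written algorithm applied to the item described above:
# line 7, 52, and 286 up by place value, then add each column right to left, carrying.

def column_addition(*numbers: int) -> int:
    width = max(len(str(n)) for n in numbers)
    columns = [str(n).rjust(width, "0") for n in numbers]  # line up by place value
    result_digits = []
    carry = 0
    for place in range(width - 1, -1, -1):                 # ones, then tens, then hundreds
        column_sum = carry + sum(int(c[place]) for c in columns)
        result_digits.append(str(column_sum % 10))         # digit written in this column
        carry = column_sum // 10                           # carried to the next column
    if carry:
        result_digits.append(str(carry))
    return int("".join(reversed(result_digits)))

print(column_addition(7, 52, 286))   # 345
```

The children in the study, by contrast, were shown the problem horizontally and had to answer mentally, which matters for the later discussion of how the interview was set up.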

 

They say that they found that the no algorithms group produced significantly more correct answers and that incorrect answers from the non-algorithm group were more reasonable than from the algorithm group, and they make the conclusion that algorithms unteach place value and hinder children's development of number sense. 

 

Later in the introduction, we see even stronger statements like algorithms are harmful to children's development of mathematical thinking. And we're given a list of educators who urge that we stop teaching algorithms. So I'm wondering if you can take us through this paper briefly and identify perhaps some of the issues with this study.

 

[00:12:12] Ben Solomon: I don't know if you've ever seen the movie Catch Me If You Can. There's this scene towards the end where Leonardo DiCaprio's character is given a fake check and the FBI is trying to determine how quickly he can figure it out, because they want to see how useful he might be to them. And he picks up the check and within a half second, he can tell it's fake.

 

And I thought of that scene when you shared this paper with me, because within a half second, you can determine that the validity of the claims is very questionable and that we shouldn't be making deep inferences from this paper. And minimally, we shouldn't be using it to guide policy for millions of students.

 

So the reason I say you can tell very quickly is because if you skim the paper, there are some features of it that I think quickly show that we might want to at least reflect on the inferences we can take from the paper. One is that the paper is very short. It's only nine pages. The description of the methods is a couple of paragraphs.

 

If you look at contemporary studies these days, they're really hard to write. And good journals these days are demanding a lot of information in terms of how you collect data, your sample, how you implemented an intervention, how you did your analysis, how you drew your conclusions. There's no way it could ever be summarized in that small of a space. 

 

So right away, when I saw the paper was that short, it led me to wonder what maybe wasn't said. Some other issues you can quickly ascertain from the paper: as you described, it's not an experimental study, it's a correlational study. They didn't control how students learned; that all just sort of occurred naturally, and then they tested them on the back end and made conclusions about what might have happened on the front end.

 

And so that introduces the classic, you know, correlation-is-not-causation argument that we usually learn very early in undergraduate psychology: two things might be connected, but that doesn't necessarily mean one caused the other. The other thing I noticed immediately was the size of the sample. It was 56 students. Studies range in their sample size, and a lot goes into thinking about what a good sample size should be. It depends on what you're trying to do, and it depends on sort of the rarity of the problem you're addressing.

 

Sometimes there are reasons to have small sample sizes, but especially for a study like this, one that influenced major policy and was correlational, that is, non-experimental, I would expect the sample size to be much larger than it was, crossing multiple schools, possibly crossing multiple states, so that we can truly ensure that this was a representative sample.
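
As a rough, hypothetical illustration of the sample-size point, here is a simulation made up for this page, not an analysis of the Kamii data: when two groups are drawn from the very same population, so there is no real difference between them, small samples still produce sizeable apparent group differences fairly often just by chance, while larger samples rarely do.

```python
# Hypothetical simulation (not the Kamii data): both "groups" come from the same
# population, so any observed difference in means is pure sampling noise.
import random
import statistics

random.seed(1)

def spurious_difference_rate(n_per_group: int, trials: int = 5000,
                             threshold: float = 0.5) -> float:
    """Fraction of trials in which the two group means differ by more than
    `threshold` standard deviations, even though there is no true difference."""
    count = 0
    for _ in range(trials):
        group_a = [random.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [random.gauss(0, 1) for _ in range(n_per_group)]
        if abs(statistics.mean(group_a) - statistics.mean(group_b)) > threshold:
            count += 1
    return count / trials

print(spurious_difference_rate(28))    # ~28 per group, like a 56-student study: several percent
print(spurious_difference_rate(300))   # a few hundred per group: essentially never
```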

 

So those are things I noticed right off the cuff with this study. And I want to be clear, I'm not trying to knock the authors. You know, there's no place for ad hominem attacks in science. You know, we can assume that, in their perspective, this was an authentic study. And if it was a small piece of a sequence of studies, generally increasing in rigour, then it might have a place in the science of learning, but that's not how it was used.

 

Right. It was used in isolation to influence policy and change the narrative on education, and that's wholly inappropriate just given those three small things I just mentioned. Then we can dig deeper into the study, and there are other things that, you know, I think would come out of the woodwork. If you look at contemporary papers right now, they'll talk exactly about sort of the demographics of the students; the students will often be measured on a number of pre-test measures, and those are described in terms of how the students look and their equivalence across groups.

 

The instructional strategies that were being tested were also thinly described. There was no real effort in the study to ensure that the standard algorithm group and the invented algorithm group were fairly homogenous, that they were consistent, and that it was the teachers' true intent to teach one or the other. It was actually unclear sort of what these students' learning history was, and that's really important to know if you're going to do a study like this.

 

It was sort of assumed based on what looked like brief interviews with teachers. The outcome measures themselves I found a bit questionable. As you pointed out, they interviewed students and gave them, I think it was a single problem, and then made generalizations from that to what their trajectory in math would be across the educational span.

 

And you contrast that to contemporary studies, and they're usually putting in these very rigorous, field tested, comprehensive measures of achievement to ensure that what they're implementing, what independent variable they're introducing truly has meaningful effects on student outcomes.

 

They implement measures that have been found to be highly predictive of students' later success because that's what matters. In this situation, interviewing a student on an individual problem, again, might be interesting from a hypothesis generation perspective. I'm not saying that doesn't have a place, but to make strong conclusions based on that very narrow slice of behaviour I find problematic.

 

And for me, that would substantially change how I communicate my findings. I would have limitations all over the place. Even the interview itself, it didn't appear to be standardized in any kind of way. Usually in these situations, if you do see interviews conducted and that somehow is informing the conclusions of a study, there is a script. Because you want your interviewers to do that all in very similar ways so there's much less space for bias.

 

Oftentimes they'll be recorded and then other observers will watch that recording to ensure they were done in the same way. I didn't see that here. Again, it might have been the shortest methods section I've ever read. So we don't know how these interviews were going, whether there were potentially leading questions, maybe ones that the authors were or weren't aware of.

 

Finally, another thing I noticed was that they were comparing students in terms of relative performance but not absolute performance. So the students who supposedly weren't taught the standard algorithm got answers that were closer to the true answer than the students who could do the standard algorithm or had been trained in the standard algorithm. We don't know if anyone in the study actually was highly proficient in math. Did any student actually achieve a competent benchmark? It was a relative comparison.

 

And on that note, you shared with me a website from Quirk where he discusses this study. He also talked about that set of circumstances with the interview, where it really set up a situation in which the students who weren't taught the algorithm would be favoured because it better represented their educational experience, but they didn't try the reverse to see what would happen.

 

So, overall, if you look at all these sorts of limitations of this study compared to what we tend to see in very high-quality journals, it just raises an enormous amount of questions.

 

[00:19:48] Anna Stokke: That's a great analysis. The point you made about this paper being used to influence policy for millions of kids, that is quite alarming. Okay, so I'm going to say, you know, in Canada, when that curriculum I was talking about was adopted, and I'm not kidding you, for a while teachers were told that children could not use standard algorithms in class. They were not to teach them, and kids were not to use them, even if they'd been taught at home.

 

And this all seems to be based on this paper, which seems to have deep flaws. I want to just mention that you talked about Bill Quirk, and he was a mathematician. And so he had a Ph.D. in math and he wrote a critical review on this paper, as you mentioned. And his main point was that the no algorithms group was actually prepped for the test, right?

 

Whereas the students who learned algorithms, they didn't have that kind of experience answering those kinds of questions. So of course it came out in favour of the no algorithms group. This is a deeply flawed 1997 paper, and it's still used today as evidence that children should instead use invented strategies.

 

[00:21:09] Ben Solomon: What you describe there is something we often see, which is sort of the extensive generalization of the findings. These authors conducted this very small correlational study that has its place, but the generalization of the inferences coming off of it has to be equally as constrained as the study itself.

 

So what you're saying is findings from this one study broadly influence policy. Think about that relative to other fields that rely on the scientific framework. Think about medicine. Think about if you were researching cancer treatments because you were looking for the best therapy for you or a close family member and the doctor said, “Well, we have this therapy, you know, it's been introduced recently. It was tested on 20 patients, and it wasn't even a real study, it was a correlational study. They did it, but they didn't have a control group, and they interviewed them, and they said they felt better, and we didn't look at their long-term outcomes.”

 

You know, you would be floored, right? You would never agree to that cancer therapy for you or a close family member, right? But that's what we do in education sometimes. Even though the outcomes are so vital, students' math instruction really determines for many students what their outcome is going to be in, you know, in terms of their preparation for STEM fields. Literacy is important too, but students get a lot of exposure to literacy often outside the classroom, especially for privileged students.

 

Books are all over the place. One little jingle we tend to say in the literacy world is 50 percent of students can learn, regardless of curriculum. The other 50 percent need high-quality teaching. I don't know what the exact ratio is in math, but it's most certainly the other way where virtually all students need high-quality instruction because they don't get math incidentally outside the classroom. 

 

So the consequences of good or poor instruction in the classroom in math are incredible. And so to think we layer that, all those outcomes, on this tiny little study that, as you pointed out, was done, you know, nearly 30 years ago. That's unfortunate. That's not the way the science of learning and the science of education should progress, just like it wouldn't in any other field.

 

[00:23:29] Anna Stokke: Yeah, I agree. And what happens when we are not doing a good job of teaching students mathematics is that the parents who can afford to do it will self-insure, so they will find ways to make sure that their children get the math instruction they need, so they'll either give it to the children themselves if they're able to, or they'll sign them up for tutoring and various outside programs, and so this creates deep inequities in education.

 

And so it's actually quite unethical, in my opinion, to be using papers that are quite flawed, that are not based on rigorous studies to inform education policy. What is your opinion on standard algorithms and multiple strategies? Do you think standard algorithms unteach place value?

 

[00:24:18] Ben Solomon: I think standard algorithms are very important. One thing we discussed amongst my peers is that the standard algorithm didn't just pop out of nowhere. The standard algorithm was very hard-earned knowledge that took centuries to develop. So maybe, reflecting that cumulative experience, we should be using them because the reason they're called standard is because they've really been deemed the most efficient means to solve a problem, regardless of what operation you're discussing.

 

And again, that knowledge actually evolved over centuries. And so to dismiss that outright maybe isn't the wisest thing to do. My sort of belief is that we always should be teaching students the most efficient way to solve a problem first. That helps them develop the confidence that they can calculate that answer, and then make sure they can also do it fluently, almost effortlessly.

 

Once you reach that point, then if you want to teach additional strategies to demonstrate sort of, you know, how the phenomena of that operation can work in different perspectives, that's fine, go ahead. But always use the standard algorithm as your foundation because it has been shown to be the most efficient means to solve a problem.

 

[00:25:36] Anna Stokke: And as you pointed out, we're standing on the shoulders of giants as we always say, they are the most efficient algorithms. They work for numbers of any size. We can do complicated arithmetic without much thinking. And I would say they reinforce place value.

 

That's why we line up the digits in the way we do, right? We line up our ones, we line up our tens, et cetera. This idea that they unteach place value, I can't even believe that someone came up with this theory.

 

[00:26:02] Ben Solomon: So one thing that might be happening there, and I'm just hypothesizing, is that there might be a conflation between the standard algorithm and poor instruction. So if you're teaching the standard algorithm, but you're teaching it in a very poor way, that might then look to an outsider like the standard algorithm isn't working.

 

The students aren't learning it anyway. Or they're making mistakes in how they do it. Or they dislike math. Or they find it very boring. That isn't a result of the standard algorithm. That's a result of poor instruction. And so if you conflate those two things, again, correlation and causation, which is sort of a theme here, right, then it's going to look like the standard algorithm is causing all the problems that we've been observing nationally in terms of just, you know, woefully inadequate math achievement.

 

[00:26:56] Anna Stokke: And that's an argument we'll hear a lot, that people don't like the standard algorithms because kids make mistakes. And this is common when you teach math. You know that people will make mistakes and they'll always make the same kind of mistakes. How do you make sure they don't make those mistakes?

 

Well, A: you teach well, but then the other thing is you need a lot of practice. And I think a lot of this goes back to kids not getting enough practice with the standard algorithms. And also think about the fact that teachers are being asked to teach multiple strategies. So what's going to happen when you have to teach multiple strategies for every single arithmetic operation?

 

You're taking up a lot of time, a lot of class time to do this. And then kids don't get enough practice with any of them. And in particular, they don't get enough practice with the standard algorithm.

 

[00:27:52] Ben Solomon: So that's a common phenomenon across reading and math. In reading, one of several problems we've been finding in terms of sort of gaps in instruction is fluency. Students may be taught basics, but then teachers move through those basics too quickly before students have demonstrated reading fluency. And students need a lot of practice; they need practice on many things, but one critical area is simply, you know, reading, sometimes under timed conditions, to build that reading fluency so they develop automaticity.

 

What people have been slower, I think, to come to awareness of is that the same applies to mathematics. You can build mathematical fluency. We do it in a lot of the interventions that I'm researching, such as explicit timing, where we have students, you know, solve problems very quickly under timed conditions so they can do it automatically, including applying standard algorithms. They need that practice and it needs to be structured in a specific kind of way.

 

It doesn't need to actually take that long. What we have found is that in certain situations, a couple of minutes a day of fluency practice is night and day. It makes all the difference in the world. It does need to be in there somewhere because you're right, usually there's a lot of teaching, but not a lot of practice.

 

Because math is hierarchical, you know, unless you really build to that effortless stage, you're on the wrong track in terms of what your math trajectory is going to be.

​

[00:29:22] Anna Stokke: And I think one of the biggest problems we see in math education is people disparaging practice. If you want students to be good at math, if you want them not to get behind, they really do need a lot of practice. And it's really unfair to disparage practice in my view.

 

[00:29:40] Ben Solomon: Practice can be structured in such a way that it doesn't take that much time. It doesn't consume the teacher's planning and preparation time. So there are great ways to infuse practice into that learning that really does benefit students.

 

I can't think of any other real practice you can recommend in the classroom that's more time efficient than those quick minute-math kind of routines where students are getting good feedback and practicing with their new skills over and over and over again.

 

[00:30:08] Anna Stokke: And I just want to mention that there might be some people thinking that timing causes math anxiety, and I've addressed that on several episodes. So I would suggest going back to episode 17 if you are worried about timed tests causing math anxiety. All right. So we have already addressed that issue. And in fact, timing is a very important part of becoming fluent at mathematics.

 

Now that we've discussed standard algorithms, one of my favorite topics, let's move on and let's talk about what a rigorous educational study would look like. And so we've got another one here. I've got one in front of me. This is just an example; we're not trying to push any particular method here or anything like that, but I've got an example in front of me that you sent to me of a rigorous educational study. But I actually want to ask you first: a randomized controlled trial is generally considered the gold standard for an education research study.

 

So do you mind just briefly explaining what is a randomized controlled trial?

​

 

[00:31:15] Ben Solomon: Sure. A randomized control trial, which I'll call an RCT so I don't trip over my own words, is a discipline-independent design. It's used across different disciplines, and people are probably familiar with those words from other areas. It's a design in which we ensure groups are equivalent at the beginning of a study.

 

And we do this by identifying our sample first and then randomly assigning them to conditions so we ensure they're equivalent at the beginning of the study. And what that does is it eliminates things we call threats. When we talk about threats, specifically threats to the validity of a study, we're saying there are things in the study that might be occurring, aside from the intervention or, more generally, the independent variable, that explain the results.

 

If we don't randomize at the beginning of a study, maybe one group is substantially different than another. This can happen often in education because, let's say, you're going in to introduce a pilot intervention, and all the teachers get really excited, so they say, “Oh, can you take student A, can you take student B, can you take student C, they really need help,” and so all these students end up in your intervention group.

 

But you're trying to compare that intervention group to some other group, usually a business-as-usual group. And now one group is loaded, right, with the students who are all at risk. The other group isn't. When you then apply the intervention to that group, but not the other group, the results can look really wonky and funny.

 

Because in this situation, what we have is called a selection threat. One group at the outset looks very different than another. One way, probably the most efficient way in all of science, that you can eliminate these threats is by randomly assigning participants at the outset of a study. That doesn't eliminate all the threats; there are other things that can interfere and explain results that aren't due to the intervention.
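
Here is a small, hypothetical sketch of that selection threat, my own illustration with made-up pretest scores rather than data from any study discussed here: if teachers steer their most at-risk students into the intervention group, the groups start out unequal, whereas randomly assigning the same students tends to balance them.

```python
# Hypothetical sketch: selection threat versus random assignment, with made-up pretest scores.
import random
import statistics

random.seed(7)

# Simulated pretest scores: at-risk students score lower on average than typical students.
at_risk = [random.gauss(70, 10) for _ in range(150)]
typical = [random.gauss(85, 10) for _ in range(150)]
all_students = at_risk + typical

# Selection threat: teachers nominate mostly at-risk students for the intervention.
selected_intervention = at_risk[:100] + typical[:50]
selected_control = at_risk[100:] + typical[50:]

# Random assignment: shuffle the whole sample and split it in half.
random.shuffle(all_students)
random_intervention = all_students[:150]
random_control = all_students[150:]

print("With selection, pretest means:",
      round(statistics.mean(selected_intervention), 1),
      round(statistics.mean(selected_control), 1))    # groups start out clearly unequal
print("With random assignment, pretest means:",
      round(statistics.mean(random_intervention), 1),
      round(statistics.mean(random_control), 1))      # groups start out roughly equal
```

Any post-test difference in the first comparison mixes the intervention's effect with that initial gap, which is exactly the threat random assignment is designed to remove.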

 

So if you say you're doing an RCT, that's not the final step. There are other things you need to show, but that eliminates a lot of those threats. And when we see a randomized control trial, that gives me as a reader a lot more confidence, not perfect confidence, but a lot more confidence that the conclusions the authors draw are sound and valid.

 

Now, there are some studies that are pretty good that don't use randomized control trials. There's a field of research methodology called quasi-experimental design. Those designs are far more complicated to do right. They tend to require much larger sample sizes. They tend to be rarer. That's a whole other thing.

 

And to do a high-quality quasi-experimental study is very, very effortful and, again, very complicated in terms of setting up the study and controlling for threats. It usually means engaging in more complicated statistics as well to wrap all that up. So I don't want to go there. You know, in education, we have numerous examples of really high-quality randomized control trials, again, just like they have in the medical field, and that leads to conclusions that are more believable and more amenable to being considered within, like, a policy realm.

 

[00:34:30] Anna Stokke: The study I have in front of me is called “Testing the Immediate and Long-Term Efficacy of a Tier 2 Kindergarten Mathematics Intervention.” It's by Ben Clarke et al. I won't list the other authors. It's in the Journal of Research on Educational Effectiveness, 2016. 

 

And the study examined the efficacy of a kindergarten math intervention program called Roots. It's focused on developing whole number understanding in the areas of counting and cardinality and operations and algebraic thinking for students at risk in mathematics. So we're using this as an example of a rigorous, good education study because we're trying to learn about this today. 

 

So, can you take me through some of the elements that make this a good study?

 

[00:35:19] Ben Solomon: Sure. And I also think it's important to mention I've actually never met Ben Clarke. You know, I respect his research, but it's a completely independent review. In reading this study, I think there are a couple of sort of features that identify it as a high-quality study, one that, again, in isolation wouldn't be used to make policy changes, but as part of a sequence of studies that show a pattern might be used to inform policy for thousands and millions of students.

 

And so a couple things. One, if you look in the introduction, they set up the problem really well. It's a well-defined problem. They look at the 2013 NAEP scores. They show that students are severely underachieving. They briefly discuss means by which that might be resolved. And this all leads to reasonable research questions and then hypotheses. That setup in and of itself usually lends confidence because it shows you that the researchers sort of understand the problem and there's a logical flow to how they develop their intervention.

 

Another thing to notice is it's a randomized control trial, which we just discussed, you know, that's again considered a gold standard and, you know, dollar for dollar is the best way to, you know, have a high-quality study. And you might also notice that it's funded by the Institute of Education Sciences, or what's often known as IES.

 

IES funds a lot of the more sort of large-scale studies nationally in the US. That doesn't guarantee that those studies are good, but they only fund studies that have made really good proposals, in which they plan to attend to a lot of features about their studies in terms of the reliability and validity and the sample and student dropout and the kind of statistics they'll use and where they will do the study.

 

If a study is funded by IES, that means they've at least planned for that study to be very rigorous. So when I see IES-funded research, again, that doesn't guarantee it's a good study, but that's a point in its favour, if you will. Another thing I appreciate about the study is it tests a realistic intervention sort of, you know, framework.

 

It was 20 minutes long. It was done daily. It was done in ratios of 5 to 1 or 2 to 1. And so immediately, I think, oh, this is something a teacher could do. It's not super intensive. Sometimes you see, for example, studies that test like these hour-long interventions that are done one-on-one, and it's wholly unrealistic in a school.

 

So another feature I appreciated was that it tested an intervention that you could see they had an eye towards implementation at scale, and that's really important. They identify a business-as-usual condition. And they have a large sample. I think they had 37 classrooms and 290 students in the study.

 

That's a pretty healthy number. That's on the larger side. As far as these things go, some get much bigger, but 290 is a, is a good sample. You know, we just talked about the Kamii study, they had you know, I believe it was 57 students. You know, that's on the smaller end. So the larger the sample, the more confidence we can have that the conclusions would replicate again. They painfully discuss their procedures, and I mean that in the best way. 

 

They talk about what the intervention is going to focus on. They talk about how they focus on whole number knowledge. They talk about conceptual and procedural knowledge. They talk about counting and cardinality and operations. So they really flesh out what this intervention is about.

 

And they do that so we as a reader or as a reviewer can evaluate that study and really think about how that content jibes with other good studies that are sort of competitors to this one. And it's done in a way that, you know, the intervention is scripted and the study could be replicated.

 

And that's not easy to do. It's a lot of effort to write out a study in this way with this level of detail. They discuss what's called statistical power which is basically, it's related to sample size, what's their chance that they'll find an effect if there is an effect in the study, and they talk about dropout of students too. They talk about their measurement system and they talk about what are known as proximal and distal measures. 

 

Proximal measures are ones that are sort of really closely aligned with the study and should change drastically because they totally reflect the content of the study. And they talk about distal measures, measures that are sort of unrelated to the study, that they didn't create, that should reflect sort of broader achievement, and it can be harder to find effects on those distal measures. You know, they're sort of challenging themselves. They still put those measures in even though those measures may not favour them as much as the proximal measures.

 

And that's really important because those are the big ticket items. They discuss fidelity of implementation of the study. They made sure that they implemented it the same way every time. And they had a lot of checks and balances for that. There are statistics I'm not going to get into, but I'll just generally say they were very rigorous.

 

And they talk a lot about what analysis they were going to use. And then they try other analyses afterwards to show that, you know, the first patterns were correct. So the fact they put all that effort into that was, you know, reassuring. They discuss how their results fit within a pattern of results found across studies, and last, and this is really important, they have a huge limitation section.

 

You know, the limitation section of a paper is where the researcher talks about all the things that could have gone wrong or reasons maybe we should doubt their conclusions. And you may think, “Well, why is that good that they had such a long section?” Well, because they were honest. They really thought out, “Here's all the ways that what we just concluded could be limited. Here are some of the ways it could be wrong, even though it was still a very rigorous study.”

 

So when you put all that together, you know, for a researcher to consider all those features of dropout, of statistics, of design, it's not easy. It requires enormous effort and years-long planning. So when you see all that mentioned in that kind of way, that gives me a lot of reassurance that the conclusions from that study were good and sound.

 

And again, probably the most important feature of the study was that it was a randomized control trial. That's the bedrock, but there's so much else in there that boosts the confidence a reader can have in the conclusions the authors drew.
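
As a rough illustration of the statistical power idea mentioned above, here is a toy simulation with made-up numbers, not the Clarke et al. analysis: power is the chance that a study of a given size detects an effect that really is there, and it climbs quickly as the sample grows.

```python
# Toy power simulation with made-up numbers (effect size, sample sizes), for illustration only.
import math
import random
import statistics

random.seed(11)

def estimated_power(n_per_group: int, effect_size: float = 0.4, trials: int = 2000) -> float:
    """Proportion of simulated two-group studies that detect a true effect at roughly p < .05."""
    detections = 0
    for _ in range(trials):
        control = [random.gauss(0, 1) for _ in range(n_per_group)]
        treated = [random.gauss(effect_size, 1) for _ in range(n_per_group)]
        # Two-sample test statistic using the observed standard error of the mean difference.
        se = math.sqrt(statistics.variance(control) / n_per_group +
                       statistics.variance(treated) / n_per_group)
        z = (statistics.mean(treated) - statistics.mean(control)) / se
        if abs(z) > 1.96:              # roughly a two-tailed p < .05 criterion
            detections += 1
    return detections / trials

print(estimated_power(30))    # a small study misses this real effect much of the time
print(estimated_power(145))   # around 290 students total, it is detected most of the time
```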

 

[00:41:43] Anna Stokke: And I'm going to post a link to that paper on the resource page and maybe it'll have your notes too. What I'm sort of hearing is this paper is quite scientific. They've been very careful in making sure that they're following maybe a scientific method.

 

So that brings me to my next question. Is education science? Like is learning a science, is there a science of learning?

 

[00:42:07] Ben Solomon: Yes. It's interesting that we're still having this debate because in very few other fields are people having this debate, right? No one says, well, is physics a science, or is medicine a science? Those fields have evolved well, well past that point, where now the scientific method guides their entire epistemology, their entire framework for knowing.

 

In learning, unfortunately, we're still having that debate. We're still having a discussion as to whether learning should be construed as a science. My strong opinion is yes, it should. The scientific framework accelerated the growth of those other fields. Think of the discoveries that were made in biology and physics and astronomy and medicine once they adopted the scientific framework. 

 

Civilization as a whole, went off onto the highway. That's where we need to be with education. That framework worked for all other fields that adopted it. We should be using it in education to promote the best outcomes for our students. And so far, you know, when it has been applied, it's been shown to work. When we look at the data on schools that have adopted interventions and curricula that have gone through rigorous testing using that framework, those students have been shown to do much better. And there's no reason at this point that can't be elevated at a national level.

 

That doesn't mean that all we should be looking at is hard data. I really respect that in education, there are stories to tell, that there are perspectives that need to be shared that sometimes as scientists, we need what's called qualitative data to really get a good sense of how certain practices are affecting people.

 

That's important. I don't want to dismiss that. But at the core, using evidence-based practices and using data to inform our decision-making has been shown to result in the best outcomes for our students. So, yes, there is a science of learning. We just need to more broadly adopt it and, you know, use those findings at a much larger level to accelerate our students' learning.

 

It can happen. It has been shown to work at local levels. We can do this.

 

[00:44:35] Anna Stokke: But one issue is the following. We talked about the Kamii paper and then we talked about this Clarke paper, and they're both published in journals. They're both, you know, sold as research. One is really rigorous research; the other isn't. How can a teacher or parent tell, though? Someone just looking at this hears, “Well, there's this research that says that algorithms are harmful,” and they look and, you know, it's published in a journal, and the people who published it, one of them was an education researcher, so why should we think that there are problems with it?

 

This is a big issue. So here's the thing, if you're a teacher or a parent and you're given a research paper as evidence for a claim, what do you do? Like, what are some red flags that we can look for that might indicate that there are problems with the methodology?

 

[00:45:30] Ben Solomon: That's a really good question. And it's really hard; I've been in situations where I go into schools, and an administrator will put their hands up and say, “Well, peer-reviewed research can say all kinds of things, so I don't believe any of it. Anyone can say anything in peer review.”

 

And so they dismiss that entire framework outright. And it is true that lots of things are said in peer-reviewed research and different journals emerged out of different interest groups and sometimes they reflect good high-quality science, but other times those journals sort of elevate just out of sort of shared interest amongst the people within it.

 

So, unfortunately, just saying a paper is peer-reviewed is not enough. Peer review increases the chances that a paper's claims are true, but it by no means ensures it. Any person who is handed a paper as you just described, they need to apply what I call an analytical tool belt. They need to have some sort of rules of thumb, some perspectives sort of ready at hand, so they can critically evaluate the paper. I thought about the best way to make a shortlist, and I shared it with you. And I think there are five big ones that anyone can use to sort of critically evaluate a paper.

 

The first I would say is exaggerated claims and hyperbole. If you look in the introduction of a paper, what kind of claims are they making? Are they claiming they can change night into day?  Water into wine? Are they claiming they can change a student's literacy or math proficiency overnight with little effort?

 

If they sort of set up those kinds of propositions within the introduction, or they describe the study they did in those kinds of terms in the discussion, you know, immediately we should be highly skeptical. It takes a lot of work to teach a child and there are no magic bullets in education, that has been shown.

 

When you look at high-quality research like that Clarke et al. piece, they're skeptical of their own findings. You know, you can see that they're really giving it a good critical evaluation, or they're responding to peer reviewers who have given it a good critical evaluation. So if interventionists or authors are overly optimistic in their claims, that for me is one of the first red flags, and you often do see it.

 

For example, I've seen it in the brain training research. Sometimes you see researchers who claim that with brief video-game-like activities, they can permanently change an adult's or child's working memory. That's a huge claim. If I see a claim like that, I'm not going to be closed-minded, I'm not going to be cynical, but I'm going to get very skeptical.

 

I'm going to be looking for really high levels of evidence. So those big claims sometimes are the first, what we'll say, red flag. The second thing that I think anyone can look for: are meaningful and measurable criteria used to index the effect of the study? So if you look in that Clarke et al. paper, they had five or six measures, I believe, and they explained the reliability, they explained why they're relevant for students. They explained how they predict bigger outcomes.

 

You know, they really spent some time defending their measures and showing how they're socially meaningful. You compare that to the Kamii paper, right, where they did a brief interview with students based on a single problem. And again, that may be the start of a good line of inquiry, but that in and of itself is not sufficient. And I think anyone would be able to see that.

 

You know, getting interviewed on a single problem is not predictive of how you might do math five years down the line. So we want to look for meaningful, relevant measures embedded into these studies to really objectively evaluate, you know, what that intervention or practice did for that student. Authors will often go to great lengths to discuss their measures and prove to you that they're meaningful.

 

Another red flag I would point out is what we might call unusual experimental conditions. Sometimes you see in these studies that they engaged in an intervention and maybe it did work, but the conditions under which they did it were sort of funny. Maybe it's, again, as I said before, one-on-one in a lab setting. Or maybe they did it for one or two hours at a time.

 

You know, there's just not that much time to engage in intervention in a typical school schedule. So if the conditions don't look like how your child is educated in the school, if it doesn't look like something that could be implemented within a school environment, you know, that immediately leads me to be skeptical because they're almost stacking the deck.

 

Another red flag is the research design, which we just discussed. We want to look for really high-quality research designs, and looking for, you know, a randomized control trial is a great way to efficiently root out a lot of the more questionable research. Again, not all great research is randomized control trials, but a lot of it is. And when we see other words to describe a study's design: correlational, case study, interviews, descriptive study, again, it doesn't shut us down, it doesn't immediately mean we stop reading and the study isn't good, but it does raise some red flags in terms of really trusting the conclusions from that study. So we do want to look for high-quality designs.

 

And the last red flag I would mention is questionable sort of citations referenced within a study. And I see this a lot. I do a lot of peer-review work. I'm on the editorial board for three major journals. And so I am peer-reviewing manuscripts left, right, and center. And oftentimes, when I'm peer-reviewing these manuscripts, I can see a study sort of heading off the rails, if you will.

 

They start citing a lot of conceptual papers. They start citing books, posters given at a conference, newspaper articles. And you can tell they're avoiding citing peer-reviewed research because there may not be a lot of it in the area that they're describing, or they know a lot of that peer-reviewed research says something completely different.

 

And so when I review a study, I want to see that study cite contemporary, high-quality, peer-reviewed work. That doesn't mean they can't cite, you know, a book chapter here and there, or someone's argument from a, you know, from a newspaper. But the preponderance of citations of research shown to set up the study should be high-quality, peer-reviewed work.

 

And oftentimes when you're in schools or you're having these debates with different people, you know, whether it be online or in person, I find in some of the weaker arguments, they immediately throw up someone's blog. And again, that blog may be good or bad. I don't know. But again, we always want to go back to the peer-reviewed research, and then even within that peer-reviewed research, we want to look for certain indicators of a high-quality study.

 

So, you know, again, the five I think are good, efficient tools, you know, to keep in your toolbox when you're reviewing a study, even if you haven't done a thousand manuscript reviews: look for exaggerated claims; look for meaningful and measurable criteria, you know, to measure the outcomes of the study; look for unusual experimental conditions, you know, look to see whether that study tested its intervention in a situation that makes sense; look to the research design; and then look to the kinds of authors and other research that study cites to set up its own study.

 

[00:52:57] Anna Stokke: These are great and this is so helpful, and I'm going to put these red flags up on the resource page, and we can all carry those around in our pockets for the next time we hear an exaggerated claim. One thing I wanted to mention: in post-secondary teaching there's also always sort of this debate about active learning and traditional instruction.

 

And my colleague mentioned to me that he'd looked at quite a few studies comparing these different methods of teaching, and often, in fact, almost always, the traditional instruction takes the following form: Instructor lectures at the front of the room, and there's absolutely no participation in the class. No questions, the instructor doesn't ask the class any questions, basically just lectures the entire time. But realistically, that is a bad way of teaching. 

 

Like, I consider myself to be on the traditional side when it comes to teaching, but there's a lot of engagement in my class, lots of participation and lots of questions. So I think one thing that happens in these studies is that the method of teaching that people don't want the study to favour gets mischaracterized so that it won't come out to be the best method of teaching in the study. Have you observed that?

 

[00:54:22] Ben Solomon: Oh, absolutely. So it's really hard to do what we'll call comparative research where you test one thing against another because inevitably the author probably has a bias towards one condition or the other, and there's always the opportunity to mischaracterize, you know, the other condition to set yourself up for success.

 

So we see that often. And it's really important for the authors to go to great pains to show that whatever they're comparing their intervention to, it is done correctly and it is done sort of in the spirit of how those authors, you know, developed it. That can be really hard. And again, that should be something that's extensively discussed within the study because you really bear the burden of proof there.

 

And to give another example, we've been discussing the Kapur article, “Designing for Productive Failure.” And in that situation, they described direct instruction (DI) as sort of their comparative condition, their business as usual, but it's really unclear what that DI condition was, how it was set up and whether it was being implemented in a true DI spirit. And again, if you don't describe it very well, that opens the room for bias, again, not necessarily intentional bias, but nevertheless, you know, everyone sort of implicitly is going to try to work to favour their products, you know, and their inventions.

 

So I see it a lot. And the authors have the burden of proof to show that they weren't biased in implementing sort of their counter condition.

 

[00:55:46] Anna Stokke: Yeah. And you mentioned the Kapur article and that is an article on productive failure. And we've sort of been discussing that in an email chain about some issues with that paper that we'll leave for another day or another place, because we had to do the Kamii paper on standard algorithms today.

 

Another thing I wanted to mention, I think people often assume that something's true when they're told it's true by someone who is considered an authority on the matter. So an example in reading would be Lucy Calkins. And certainly we have examples of math education celebrities that will be cited, their claims cited as evidence for sketchy things sometimes. But we should know now that this is not necessarily a good indication that something is evidence-based, and we should know this because of what happened with reading.

 

And again, if people haven't listened, I do recommend that they listen to my episode with Matt Burns. As an example, we heard about Matt Burns’ research on the Fountas and Pinnell Benchmark Assessment System; he basically found it was ineffective. Yet Fountas and Pinnell haven't acknowledged that there are problems with the program, despite the research evidence.

 

So, how can this happen? Shouldn't researchers or educators actually change their mind to be in line with evidence?

 

[00:57:14] Ben Solomon: You know, so this is a problem that's sort of endemic to a lot of different places. It's not just education, right? When Matt did that wonderful research, looking at the Fountas and Pinnell system, that vendor had a financial interest to ignore that research.

 

So in that situation, what finally created some level of concern was the grassroots movement, when finally parents and teachers absorbed that research and then elevated it from researcher to vendor. That rarely is the situation; sometimes, but rarely, it's a situation where you see a lot of change.

 

It's complicated. At the university level, in schools of education, you have researchers doing all kinds of things, and they tend to be isolated. We tend not to have great collaborations amongst each other. And so people can get into these tracks where they sort of reinforce their own beliefs and develop studies that favour their ideologies or philosophical opinions.

 

And then that all finds its way out of the woodwork into schools, which I think was one of the major problems of Lucy Calkins. You know, within Columbia, you know, Lucy Calkins developed this bubble. She was very profitable in what she was doing both for herself and the university.

 

Although there are fantastic methodologists at Columbia, and there are actually fantastic direct instruction and precision instruction people there, she operated in her bubble. That, unfortunately, happens in universities, and it's something, to be frank, we really need to work on, because in the eyes of the consumer it looks like every one of these researchers has an equal opinion, equal credibility, or equal weight, when that's not the case.

 

From a more consumer side, there's an older researcher named Raven who developed what are called power bases, which are basically the sources of power a given coach, or specifically a consultant, has to change people's minds.

 

It's sort of a cute heuristic, and I don't know how accurate the framework is overall, but they talk about expert power. When you, as a consumer, see someone who quote-unquote has “expert power,” you believe them because they have these credentials. Who are we to disagree with someone who identifies themselves as a learning scientist? But then, as I just discussed, at the top we have these people sort of entrenched in their own beliefs. 

 

So it can create a very confusing system. The other thing I think is worth mentioning is, quite frankly, sunk cost: a social phenomenon where, when people invest a lot of time or energy into a certain initiative or belief, it's going to be harder to nudge them out of it because they've invested so much time into it. They don't want to look like a fool. They don't want to look like they've been going down the wrong path for 10, 15, or 20 years, whether it be a teacher in a classroom, a principal, or a researcher at a university.

 

For example, with the science of reading movement, as it really evolves into everyday conversation: some researchers have been producing papers on the debated methods for 30 years, and nudging them out of that is going to be very difficult. It's naive to think that everyone's going to evolve on their own. 

 

I don't know if I have a great solution to that, but it is definitely a concern, both at the school level and at the academic level. And again, I think that's why it's really important that, whether you're a teacher, a parent, or a principal, you be a skeptical consumer of research.

 

Be open-minded, listen to people's arguments, but ask for evidence. And when they produce evidence, evaluate it for yourself. Because when it comes to evolving and adjusting to the current research, it's not just a problem in schools and in practice. It's a problem at the university level too.

 

[01:00:56] Anna Stokke: And you mentioned sunk costs, and that's a really good point. The people who have invested years in these things, and they're philosophies, really, a lot of these things are philosophies or ideologies. Even though you do see them getting published in journals, they're still philosophies and ideologies. 

 

I don't think they're likely to take a step back. However, I do think that public funds shouldn't be spent on programs that are likely to be ineffective. Something needs to happen there, and I'm not sure what. I mean, Matt Burns mentioned something like an FDA for education. What do you think about that? 

 

[01:01:41] Ben Solomon: I completely see where he's coming from with that, and it makes sense, right? You want some level of quality control at the federal level. We have it in other areas, and at this point we certainly know enough about instruction and learning that we can make some confident recommendations about what should and shouldn't be part of that.

 

What I would say, however, is that, unfortunately, that would be very difficult to implement, at least in the States, just because of the way our educational systems are structured. By design, there's a lot of power at the local level: individual schools, individual districts, and individual states get enormous decision-making power over what occurs at the student level. 

 

And so if we were to set up some kind of FDA-like system at the federal level, and I'm sure Matt understands this, that's probably more idealistic than practical. But the point absolutely still stands: setting something up at the federal level would be very difficult, and there'd probably be a lot of pushback, because, at least in the States, there's such a belief that the local community should have a heavy say and choice in what occurs in the school.

 

So with that said, I think the more efficient solution for the time being would be to inform parents and teachers to the best of our ability and have them be the people demanding high-quality practices for their students. We saw it in the science of reading; that was a grassroots movement where, finally, journalists and teachers and educators got together and organized, and they pulled the science together. 

 

So they were in a very defensible position. They didn't do it on their own; they brought in researchers and professors and academics, and they got together and demanded better practice for their students. And that resulted in an enormous change that is still unfolding in exciting ways.

 

As we think about math now, which is unfortunately a few steps behind, I think that's the better route: promote high-quality education, high-quality science and research literacy, and really try to bring this to the school boards, bring this to your local politicians, and say, this is what I want, this is what the research states, and this is what my child deserves.

 

[01:03:50] Anna Stokke: Yeah, absolutely. We have to do this for math. Reading is important, but math is also really important, and when children do not get a good math education, they're shut out of a lot of careers. I would also argue that, for our economy, we need to make sure we have students who are able to perform well in mathematics. We need them for those quantitative careers.

 

[01:04:14] Ben Solomon: Absolutely. You know, I think there's this assumption that technology is at such a state that it will just carry us through. We're having discussions at the university about how we integrate ChatGPT, and whether we still teach foundational skills now that ChatGPT can do anything. The answer is yes. Yes, you still need to teach the foundational skills, because you still need people to understand the underpinnings of what's going on. And there are everyday math skills that are still really important. So we need to, I think, double down on the foundations at the elementary, secondary, and post-secondary levels. You need high-quality math skills in everyday life regardless of the field you're in. You open yourself up to naivete and a certain level of ignorance if you can't do basic computations. 

 

There's a huge risk in being illiterate, and that has been well documented. It closes all kinds of doors. The same is absolutely true in math, and that's the case even if you're not aspiring to the highest level of STEM education.

 

[01:05:10] Anna Stokke: So, is there anything else you want to add today?

 

[01:05:12] Ben Solomon: I think what I'd like to conclude with is a reminder that, interestingly, science never proceeds linearly. It's always in punctuated steps; it's not incremental improvement. We tend to have these fierce debates, they come to a head, hopefully some level of truth is revealed, and then the field accelerates based on that discussion. Maybe it's a little static afterwards, and then it comes to a head again later on.

 

I'm hoping we're at a point right now where we're having one of those fierce debates, where all of a sudden things are coming together based on this great research that has been done over 30 or 40 years. It's finally really coming to a head, people are discussing it, and we're talking about it on a podcast.

 

And I'm hoping that out of this comes that acceleration, where it reaches a policy level nationally, good practices come out of it, and they're implemented with far greater distribution than they currently are. I think we can get there. You can see all the ingredients in place for people to really absorb this stuff and carry it forward. 

 

So I'm very optimistic. I think these kinds of discussions are really valuable, and there's a lot greater visibility for all the research now. So let's accelerate out of this to really improve outcomes for students.

 

[01:06:25] Anna Stokke: Well, thank you for ending on that positive note. And I want to thank you so much for coming on my podcast today. I really appreciate all the work you put into helping prepare for this and coming up with the red flags document, which I'm going to post. Thank you so much for sharing your expertise with me and my listeners today.

 

[01:06:44] Ben Solomon: Oh, it's been an absolute pleasure. Thank you so much.

 

[01:06:47] Anna Stokke: More in just a moment. If you are interested in the science of learning and would like to learn more, consider attending researchED Canada, which will take place in Toronto, May 3rd through 4th of 2024. I will give a presentation there, as will many other guests I've had on the podcast. I will include a link to the website in the show notes. 

 

As always, we've included a resource page for this episode that has links to the two articles discussed in the episode, as well as the list of red flags we talked about. And I'll include a link to that in the show notes as well. Follow me on X for updates on new episode dates.

 

If you enjoy this podcast, please consider showing your support by leaving a five-star review on Spotify or Apple Podcasts. Chalk and Talk is produced by me, Anna Stokke, transcript and resource page by Jazmin Boisclair, social media images by Nicole Maylem Gutierrez.

 

Subscribe on your favourite podcast app to get new episodes delivered as they become available. You can follow me on X for notifications or check out my website, annastokke.com, for more information. This podcast received funding through a University of Winnipeg Knowledge Mobilization and Community Impact grant funded through the Anthony Swaity Knowledge Impact Fund.
