Yoenis Cespedes and the statistically inevitable injury

Now that we have a 10-day disabled list, some teams have been very proactive about using it. A player needs at least three days to recover? Take 10! After all, it’s early in the season. Better for players to take the time now instead of trying to play through an injury and making it worse. When Trea Turner pulled his hamstring, the Nationals took one day to see if it cleared up, then immediately put him on the DL. When Yoenis Cespedes walked to the on-deck circle last Sunday, and even when he started playing last week, I was screaming “no, no, NO.”

It was around this time a year ago that Cespedes aggravated his thigh bruise, missed three games, pinch hit, had an off day, missed another game, then got his first start a week later. He didn’t go on the DL then, but he missed so much time that he should have. The Mets have a history of asking players to wait and see, then tough out injuries instead of proactively placing them on the disabled list.

It’s easy for writers and fans to look at injuries retrospectively. I can look back at Cespedes’ injury last week and start furiously typing away that of course the Mets should have just put him on the DL as soon as he tweaked his hamstring! When Cespedes hurt himself, the Mets probably thought he’d recover faster than he did. Players want to prove their durability and don’t want to let teammates down.

The disabled list has always forced teams to guess about injury severity and recovery times. Since you can’t bring players back early if they recover faster, teams may lean towards rest for a few days instead. Baseball’s switch to a 10-day disabled list was modeled after the success with a special 7-day DL for concussions, which have notoriously unpredictable recovery times. A 10-day disabled list for any injury could reduce the penalties for guessing wrong and encourage teams to be more proactive about putting players on the DL instead of playing with 24 healthy bodies.

Smart franchises had already developed a few ways to get around some of the drawbacks of the 15-day DL. Teams can hold pitchers back for a start and use a spot starter instead of putting that pitcher on the DL. Triple-A affiliates can be used to shuttle fresh relief arms to the big league club, as long as the affiliate is relatively close. (This is one of the many drawbacks of the Mets having their Triple-A affiliate on the other side of the country.) However, there weren’t any real workarounds for situations like Cespedes’ thigh injury in 2016. The 10-day disabled list gives more flexibility, which the Mets decided not to use. Is Cespedes aggravating his hamstring injury bad luck, or a predictable risk the Mets should have avoided?

Background: How Long Are DL Stays?

One reason baseball may have taken so long to address these situations is that most players who go on the disabled list aren’t ready to hop back on the field after 15 days. I combined Baseball Prospectus’ transaction tracker with Retrosheet’s play-by-play data from 2010-2016. My database has 1,088 hitters who went on the DL during the season and came back during that season. Only 20.22 percent of them returned within 15 days. This includes players who were on the 7-day concussion DL and were medically cleared before 15 days. The “15-day” disabled list was more likely to be 31-45 days off than 15 days and back to baseball.

Days on DL	Frequency	Percent
7	12	1.10
8-14	30	2.76
15	178	16.36
16-20	202	18.57
21-30	235	21.60
31-45	225	20.68
46-60	105	9.65
61-90	77	7.07
91+	24	2.21
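For anyone who wants to check the table, here is a minimal Python sketch that rebuilds the percentages from the counts. The counts are copied from the table; the variable names are mine and purely illustrative.

```python
# Bin counts copied from the table above (hitters who went on the DL
# and returned in the same season, 2010-16).
dl_stay_counts = {
    "7": 12,
    "8-14": 30,
    "15": 178,
    "16-20": 202,
    "21-30": 235,
    "31-45": 225,
    "46-60": 105,
    "61-90": 77,
    "91+": 24,
}

total_hitters = sum(dl_stay_counts.values())

# Share of each bin as a percentage of all DL stays in the sample.
dl_stay_percent = {
    label: 100 * count / total_hitters
    for label, count in dl_stay_counts.items()
}

# "Returned within 15 days" covers the first three bins.
within_15_days = sum(dl_stay_counts[label] for label in ("7", "8-14", "15"))
share_within_15 = 100 * within_15_days / total_hitters

print(f"Hitters in sample: {total_hitters}")
print(f"Back within 15 days: {share_within_15:.2f} percent")
for label, pct in dl_stay_percent.items():
    print(f"{label:>6} days: {pct:.2f} percent")
```

The three shortest bins add up to the 20.22 percent quoted above, and the 31-45 day bin is larger than the exactly-15-day bin.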

If teams follow the same philosophies for when to put hitters on the disabled list, then changing from a 15-day DL to a 10-day DL isn’t going to have much of an impact for them. However, the 10-day DL reduces the cost of putting someone on the DL. Repetitive stress injuries like Cespedes’ quad and Asdrubal Cabrera’s strained left patella tendon from last year could be treated differently. In the future, should they immediately be sent to the new disabled list?

Before the season started, I ran some statistical models predicting the likelihood of each position player getting injured at some point in the season, along with the range of games they would likely miss if injured. Statistical models are only going to do so much in predicting injuries. Acute injuries like getting hit in the hand by a 95 mile-per-hour fastball are relatively random events. There’s no way I can get an “Ojeda hedge clippers” variable. Teams may have better information about the health of their players, but they will keep this proprietary. As much as this data may help my statistical model here, it’s better to protect players’ privacy. That’s why I am relying on trips to the disabled list as my main measure of injuries.

Now that the season has started, I’m going to go through every player from the last seven years on a game-by-game basis. I want to specifically look at situations like Cespedes’ injuries, and Cabrera’s injuries, and Neil Walker playing until he collapsed last year. I used every position player appearance from 2010-16, giving me 291,777 observations of batters with at least one plate appearance in a particular game. I will explain the basic findings by using some Mets as examples, then give all the technical details.

Short Term Rest and Hope or DL Now?

Let’s start with Cespedes in 2016, since we know how his season played out. After being inactive for nearly a full week, he returned to the lineup in late April and made 31 consecutive starts. In August, the Mets used a similar rest but no DL strategy with Neil Walker. He got four games off on a West Coast trip, started two games in San Francisco, then took five more days off. Walker only played two more games the rest of the season. Resting him and playing with a short bench only delayed the inevitable.

Getting a low number of plate appearances per game over the last 10 days was one of the two biggest risk factors in my model. Players who rested as much as Walker did and then came back to play were 35 percent more likely to go on the DL than players who got the average amount of playing time. It’s easy to misinterpret this: I’m not saying giving players more days off causes new injuries, so let’s play them until they drop. (The Mets played Cespedes and Asdrubal Cabrera every day in June and July last year, until they dropped.) As I wrote in the offseason, players who spend most of the season on the bench have less opportunity to get hurt.

The risk factor is when players suddenly need a couple days off. Injuries may take more than a couple days to heal to the point where giving maximum effort will not aggravate the injury and make it worse. In Walker’s case, it’s possible that going on the DL immediately could have prevented season-ending surgery. It’s also possible that the injury was already severe enough that surgery was inevitable. We’d need a time machine to know for sure. What statistics can tell us is that starters who get held out 3-4 days, let alone a week, are still more susceptible to injury when they come back. Just sending those players to the disabled list to let them focus fully on recovery may be a good idea.

What About Track Records?

We know that players have different injury histories. If David Wright starts feeling neck or back pain, we’d treat it differently than Walker having sudden and unexpected lower body pain. To my surprise, there wasn’t a clear, straight-line relationship between a batter’s track record for durability and their likelihood of going on the DL after a particular game. After a lot of trial and error (which I describe in more detail in the Gory Math section), I found it makes more sense to put players in three different groups:

Everyday players: 60 or more games in the 80 days before a hypothetical 10-day DL window. Remember that off days count in this “80 days” measure. For the Mets, think of players getting as much playing time as Cespedes (before his recent injuries) or Curtis Granderson.
Recovering players: 9 or fewer games in this 80-day window. To screen out minor league callups, players need to have been in the big leagues for at least 90 days. Think of Wright or Lucas Duda coming off his back injury at the end of last year.
Everyone else: These are hitters who appear in 16 to 79 percent of the team’s games. For the Mets, think of Wilmer Flores, catchers, and Duda most of his career. (The 2016 Mets had surprisingly few players who fit into this category.)
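The three groups can be sketched as a small Python function. The thresholds come from the definitions above; the function name, argument names, and the choice to return None for players with fewer than 90 days in the majors are my own assumptions for illustration, not the code actually used for the model.

```python
from typing import Optional

EVERYDAY_MIN_GAMES = 60    # 60 or more games in the 80-day window
RECOVERING_MAX_GAMES = 9   # 9 or fewer games in the 80-day window
MIN_DAYS_IN_MAJORS = 90    # screens out recent minor league callups


def playing_time_group(games_in_window: int, days_in_majors: int) -> Optional[str]:
    """Return 'everyday', 'recovering' or 'other'.

    Returns None for players who have not been in the majors long enough
    to be placed in any group (an assumption about how exclusions work).
    """
    if days_in_majors < MIN_DAYS_IN_MAJORS:
        return None
    if games_in_window >= EVERYDAY_MIN_GAMES:
        return "everyday"
    if games_in_window <= RECOVERING_MAX_GAMES:
        return "recovering"
    return "other"


# Hypothetical inputs, not real player logs.
print(playing_time_group(games_in_window=68, days_in_majors=900))  # everyday
print(playing_time_group(games_in_window=4, days_in_majors=900))   # recovering
print(playing_time_group(games_in_window=35, days_in_majors=900))  # other
print(playing_time_group(games_in_window=4, days_in_majors=20))    # None
```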

Everyday Players

A sudden decline in playing time is an even bigger red flag for regulars like Cespedes and Walker. Playing every day – or just about every day – doesn’t necessarily wear hitters down. It’s a little like Newton’s first Law of Motion. Hitters who are in the lineup at least 80 percent of the time over a three-month period are the most likely to stay in the lineup unless an injury knocks them out. These are players who rarely take games off, and they wouldn’t take multiple games off unless they were seriously limited. A regular who slows down to Walker’s 1.5 plate appearances per game stretch is twice as likely to go on the DL when they step back on the field. These durable players are trying to play through serious injuries that they may not be able to overcome.

The good news is everyday players can go back to being durable everyday players even after spending time on the DL. Prior injuries are always a risk factor for future injury, but the risk is much smaller once a hitter shows they can get back to playing every day. Think of someone like Carlos Beltran, who only played 145 games in 2009-10 combined but came back to play at least 142 games and make the All Star team the next three seasons. I’m not saying someone will play at the same level after an injury, or that everyone will have a full recovery. What I’m saying is that if Duda showed he could stay on the field for the first half of the season, I’d be more likely to trust in his health for the rest of the season.

Recovering Players

When James Loney ran out of pixie dust, the Mets didn’t rush Lucas Duda back from injury and took their time with his setbacks in rehab. Even once he returned, the Mets didn’t rush him to the everyday lineup, despite the protests of us here at BP Mets. My model suggests that when to bring a player back and how much to play him are two separate issues. Players who have had multiple trips to the DL and are coming off a major injury are the biggest risks for (re)injury. On the other hand, there isn’t a consistent pattern about how much playing time to give these players once they return. Some may suffer from being out of game condition and be at greater risk of reinjury, while others could come back like Kyle Schwarber.

For David Wright, coming off spinal stenosis and a herniated disk in his neck, there will always be cause for concern. Each additional trip to the DL multiplies the likelihood of subsequent injury by 1.1462. If a player has only been on the DL once, that’s just an additional 14.62 percent…not that bad. Since we are multiplying the risk, each successive injury has an even bigger effect than the injury before it. David Wright has been on the DL four times since 2010, which means his risk of going back on the DL after any game is 72.64 percent higher than a player who has never been on the DL. These risks will be even greater when a player makes their first couple of games back.
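To make the compounding concrete, here is a short Python sketch. It uses the rounded 1.1462 multiplier quoted above, so the result for four trips comes out at about 72.6 percent, within rounding of the 72.64 percent figure. The function name is illustrative.

```python
RISK_MULTIPLIER_PER_DL_TRIP = 1.1462  # rounded figure quoted in the text


def relative_injury_risk(prior_dl_trips: int) -> float:
    """Risk relative to a player with no prior DL trips.

    Each trip multiplies the risk, so the effect compounds
    instead of adding a fixed amount per trip.
    """
    return RISK_MULTIPLIER_PER_DL_TRIP ** prior_dl_trips


for trips in range(5):
    extra_percent = 100 * (relative_injury_risk(trips) - 1)
    print(f"{trips} prior DL trips: {extra_percent:.1f} percent higher risk")
```

Note that four trips add much more than four times 14.62 percent, which is the point of multiplying instead of adding.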

Everybody Else

I used players who play in 16 to 79 percent of a team’s games as the baseline for comparison. These players’ injury risk is kind of what we’d expect it to be. Like the very durable hitters, a sudden lack of playing time may be a red flag of an injury. Prior injuries are also a risk factor for future injuries, particularly if the player has suffered multiple injuries.

Using the 10-day DL

The 2016 Mets had several positions where they would use the same players every single day as long as they were healthy. Some of this shows the toughness and durability of players like Cespedes, Walker, Cabrera, and Curtis Granderson. However, it also seems to be an organizational philosophy. Terry Collins put Loney in the lineup 37 straight games once he arrived in Queens. Last year’s Mets played position players until they dropped, but most of them dropped at some point and had to go on the DL. In theory, giving players more days off and using the bench more could be a way to minimize these injuries. However, it doesn’t seem to fit the Mets’ philosophy, particularly for the middle infield.

Ironically, I was checking the computer code in the background when Cespedes had to be helped off the field on Thursday. In my initial draft, I wrote “Switching to a 10-day DL seems tailored for a team like the Mets that wants to play its best players as much as possible.” The logic is straightforward. Give players ten days of rest when they are hurting, particularly early in the season, to maximize their ability to contribute the rest of the season. After all, players don’t grind through the MLB schedule to play 140 or more games unless they can play through some pain. If even these durable players can’t play through some pain, teams should take it seriously. The drawbacks of having a recovered player on the DL have greatly diminished, so take advantage of the new rule.

We all know people who are resistant to change. Most of us have probably worked in organizations that are resistant to change. It can take a massive shock to get people or organizations to behave differently. Over the years, the Mets haven’t exactly been proactive in using the disabled list. Just giving Terry Collins, the front office and the medical staff fewer drawbacks to using the DL hasn’t been enough to get them to take advantage. Maybe Cespedes getting hurt will be the last straw to shock the Mets out of being behind the curve about the importance of rest in sports.

Gory Math:

I used a logistic (logit) regression model for whether a player would go on the disabled list after the game or not. Logit models are ideally suited for yes/no outcomes like this. Before I go into some of the specific variable choices I made and why I made them, it’s important to go over two issues of using this type of regression model:

1) Injuries are rare on a game-to-game basis. The Mets’ sudden rash of injuries is so shocking because it’s so rare to see multiple injuries in a week of baseball. Logistic regressions are multiplicative, not additive. This means that for an independent variable like the number of injuries, every injury multiplies the likelihood of an injury instead of adding a set number. This is fine for comparing one player’s injury risk to another player’s risk, but it obscures the fact that the constant implies a very low baseline rate of injury.

2) With a sample of nearly 300,000 observations, it is almost laughably easy to get a p-value of less than 0.05. Just to prove this point, I tried a regression model with each value for longer-term games played as a separate dummy variable. Players who played 13 games in this period have a statistically significant risk of increased injury (p = 0.013), but there is no logical reason for them to be so much less durable than someone who played 12 or 14 games in this period.

Variable Choices:

Dependent variable: Did this player go on the DL after the game? I tried a separate regression model looking specifically at players going on the DL but coming back that season (as opposed to season-ending injuries), and the results were largely consistent either way. There are more complicated statistical models that could incorporate both whether someone went on the DL and how many days they missed.

Short term playing time: Plate appearances per team game over the last 10 days. I tested 4, 7, 10, and 15 days as a definition of short-term activity. Ultimately the 4- and 7-day periods had too much random noise. 15 days of activity wasn’t an improvement over the 10-day period, so I used 10 days to mirror teams’ new decision of “should we have just put the player on the DL for these 10 days?” Plate appearances are the best indicator of a position player’s workload. Because off days have more influence over a short time period, and different players will have a different number of days off, I turned this into a measure of PA per team game instead of total PA in this 10-day period.
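Here is a minimal Python sketch of how a PA-per-team-game measure could be computed. The function name, the data layout (one entry per calendar date, so no doubleheaders), and the choice to count the 10-day window back from the current game inclusive are all my assumptions for illustration. The sample dates and plate appearances are made up.

```python
from datetime import date, timedelta
from typing import Dict, Set


def pa_per_team_game(
    player_pa_by_date: Dict[date, int],
    team_game_dates: Set[date],
    as_of: date,
    window_days: int = 10,
) -> float:
    """Plate appearances per team game over the window ending on `as_of`.

    Dividing by team games (not calendar days) keeps scheduled off days
    from looking like missed playing time.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    games_in_window = [d for d in team_game_dates if window_start <= d <= as_of]
    if not games_in_window:
        return 0.0
    total_pa = sum(player_pa_by_date.get(d, 0) for d in games_in_window)
    return total_pa / len(games_in_window)


# Made-up example: the team plays 8 games in 10 days (two off days),
# and the hitter bats in only three of them.
as_of = date(2016, 8, 20)
team_games = {date(2016, 8, day) for day in range(11, 21)} - {
    date(2016, 8, 15),
    date(2016, 8, 18),
}
hitter_pa = {date(2016, 8, 11): 4, date(2016, 8, 12): 4, date(2016, 8, 19): 4}

rate = pa_per_team_game(hitter_pa, team_games, as_of)
print(rate)
```

In this made-up case, 12 plate appearances over 8 team games gives 1.5 PA per team game.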

Longer term playing time: Games played in the last 11-90 days. There was a lot of trial and error here, so bear with me as I unpack it all:

 I intentionally used games played instead of plate appearances in both measures. Plate appearances over a long period of time have a very unusual distribution. Instead of a curve, there is a plateau in the middle as players who played every day and then suffered major injuries (think Wright or Giancarlo Stanton) collide with players who wouldn’t play every day even if healthy (think Wilmer Flores). Games played does a better job of sorting out these differences and has less correlation with other independent variables.
 The 11-90 day window took more trial and error, mostly over whether to stop at 90 days or look back further. In this model, I reinvent the calendar so there is no off-season and the end of one calendar year wraps directly into the beginning of the next season. It’s a measure of a player’s track record for how often they play. It starts at 11 days to avoid any overlap with the short-term measure.
 I originally expected games played to have a linear relationship with injury risk. Good thing I double-checked that assumption! I initially tried more than two categories, but it turns out that the groupings in the middle are relatively similar. I use these players as the omitted category.
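One way to picture the off-season-free calendar is a running day count that skips the winter. This is only a sketch of the idea in Python, not the code behind the model. The function names and the season boundaries in the example are illustrative assumptions, and the season list must be in chronological order.

```python
from datetime import date
from typing import Iterable, List, Tuple

Season = Tuple[date, date]  # (first day, last day) of a regular season


def baseball_day(d: date, seasons: List[Season]) -> int:
    """Map a calendar date to a running day count that skips every off-season."""
    elapsed = 0
    for first_day, last_day in seasons:
        if first_day <= d <= last_day:
            return elapsed + (d - first_day).days
        # Add this season's full length before moving to the next one.
        elapsed += (last_day - first_day).days + 1
    raise ValueError(f"{d} is not inside any listed season")


def games_in_long_window(
    game_dates: Iterable[date], as_of: date, seasons: List[Season]
) -> int:
    """Count a player's games from 11 to 90 baseball-days before `as_of`."""
    today = baseball_day(as_of, seasons)
    return sum(
        1 for g in game_dates if 11 <= today - baseball_day(g, seasons) <= 90
    )


# Illustrative season boundaries and a made-up three-game log.
seasons = [
    (date(2015, 4, 5), date(2015, 10, 4)),
    (date(2016, 4, 3), date(2016, 10, 2)),
]
games = [date(2015, 10, 4), date(2016, 4, 3), date(2016, 4, 15)]

print(games_in_long_window(games, date(2016, 4, 20), seasons))
```

With this mapping, the last day of one season and the first day of the next are one baseball-day apart, so a late-September game still counts toward a mid-April track record.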

Number of injuries vs. time on the DL: I only have data for major league injuries, and only since 2010. I tried both the raw number of DL trips and days on the DL. I fully expected days on the DL to be a better fit for the model, but it turned out that number of injuries fit slightly better (and it’s much easier to write about). The main problem with days on the DL is season-ending injuries. Some players would be able to play in October or November, while others can’t play at the start of next season.

Interaction Terms

My statistical model relies fairly heavily on interaction terms, a statistical technique to test whether the combined effect of two different variables could be different than the sum of the parts. For example, we’d expect a player who has suffered multiple injuries to be at greater risk for another injury. We’d also expect a player who is coming back after missing at least three months of games to be at greater risk. An interaction term lets us see whether a player who checks both boxes, like David Wright, is even more vulnerable. It’s easy to get lost, so I will give a few more examples after providing the model.

Variable	Coefficient	Std. Error	P>|z|
Recovering Players	-0.705	0.242	0.004
Everyday Players	0.376	0.200	0.060
PA/game (10 days)	-0.158	0.028	0.000
PA/game * Recovering	0.128	0.107	0.233
PA/game * Everyday	-0.125	0.056	0.026
Prior Injuries	0.137	0.022	0.000
Injuries * Recovering	0.190	0.070	0.006
Injuries * Everyday	-0.091	0.043	0.034
Constant	-5.131	0.081	0.000

N = 291,777 after excluding pitchers and anyone who hadn’t been in the majors for 90 days (so they would not fit into a recovery, everyday or other player bin).
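To show how the pieces of the table combine, here is a minimal Python sketch that turns the coefficients into a per-game DL probability. The coefficients are copied from the table; the dictionary keys and function names are my own labels. Strictly speaking, a logit coefficient multiplies the odds, but when the per-game probability is well under 1 percent, odds and probability are nearly identical, which is why I talk about multiplying the likelihood.

```python
import math

# Coefficients copied from the regression table above.
COEF = {
    "constant": -5.131,
    "recovering": -0.705,
    "everyday": 0.376,
    "pa_per_game": -0.158,
    "pa_x_recovering": 0.128,
    "pa_x_everyday": -0.125,
    "prior_injuries": 0.137,
    "inj_x_recovering": 0.190,
    "inj_x_everyday": -0.091,
}


def log_odds(group: str, pa_per_game: float, prior_injuries: int) -> float:
    """Linear predictor; group is 'other', 'recovering' or 'everyday'."""
    if group not in ("other", "recovering", "everyday"):
        raise ValueError(f"unknown group: {group}")
    z = COEF["constant"]
    z += COEF["pa_per_game"] * pa_per_game
    z += COEF["prior_injuries"] * prior_injuries
    if group != "other":  # 'other' is the omitted category
        z += COEF[group]
        z += COEF[f"pa_x_{group}"] * pa_per_game
        z += COEF[f"inj_x_{group}"] * prior_injuries
    return z


def dl_probability(group: str, pa_per_game: float, prior_injuries: int) -> float:
    """Per-game probability of going on the DL, via the logistic function."""
    return 1 / (1 + math.exp(-log_odds(group, pa_per_game, prior_injuries)))


# Baseline implied by the constant alone (omitted group, all variables at zero).
print(f"{dl_probability('other', 0, 0):.4f}")

# Effective multiplier per prior injury in each group (main effect + interaction).
for group in ("other", "recovering", "everyday"):
    step = log_odds(group, 0, 1) - log_odds(group, 0, 0)
    print(f"{group}: x{math.exp(step):.3f} per prior injury")
```

The interaction terms show up as different per-injury multipliers by group: each prior injury matters most for recovering players and least for everyday players.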

Walking through examples used earlier

1: Walker got 1.5 PA per game the Mets played, while the median player in my database got 3.4. If we want to compare their probability of getting injured, we take e to the power of the regression coefficient instead of adding or subtracting. The median player’s likelihood of being injured after a game, compared to Walker, is one multiplied by e ^ (-0.158*1.9) = 0.74, controlling for other variables. Alternatively, Walker’s injury risk relative to the median would be 1 / 0.74 = 1.351, or roughly 35 percent higher than average.
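The same arithmetic as a few lines of Python, using only the PA/game coefficient and the two playing-time figures from this example. The variable names are illustrative.

```python
import math

PA_PER_GAME_COEF = -0.158   # from the regression table
MEDIAN_PA_PER_GAME = 3.4    # median player in the database
WALKER_PA_PER_GAME = 1.5    # Walker during his August time off

gap = MEDIAN_PA_PER_GAME - WALKER_PA_PER_GAME

# Median player's risk relative to Walker, holding other variables fixed.
median_vs_walker = math.exp(PA_PER_GAME_COEF * gap)

# Flip the comparison to get Walker's risk relative to the median player.
walker_vs_median = 1 / median_vs_walker

print(f"Median vs. Walker: {median_vs_walker:.2f}")
print(f"Walker vs. median: {walker_vs_median:.2f}")
```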

2: Here’s where interaction terms get complex. If we want to compare two hitters in the average number of games played long term bin, we just need to look at the regression coefficients for PA per game in the short term and prior injuries. If we want to compare two everyday players, we need to do a lot more work. First off, there’s a baseline coefficient of 0.376. If we controlled for all other variables, it looks like everyday players get worn down. But that’s not a good interpretation of how interaction terms work. The 0.376 coefficient compares two players who had the last 10 days off and have never been on the DL; the only difference between them is how much they played over the last three months. It’s a statistical fiction that doesn’t occur in real life with real baseball players.

Let’s see how the numbers work with real world conditions. Walker got 1.5 plate appearances per team game during his August time off. Let’s say a regular who continues to play regularly gets an average of 4 plate appearances per team game. If we want to compare Walker to this regular, we need to put the everyday player baseline, the playing time variable, and the interaction between them into our equation. The injury likelihood for a regular who keeps playing, compared to a part time player taking 10 days off, is e ^ (0.376-0.158*4-0.125*4) = 0.470. If that regular only got 1.5 plate appearances per team game, like Walker did, their injury chance is e ^ (0.376-0.158*1.5-0.125*1.5) = 0.952. The ratio here is 2.03.
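Here is a short Python sketch of this comparison, so the exponents can be evaluated directly. It uses only the three coefficients named in the paragraph; the function name is illustrative. The result is a ratio of about 2.03 between the rested regular and the regular who keeps playing.

```python
import math

EVERYDAY_COEF = 0.376        # everyday player baseline
PA_COEF = -0.158             # PA/game (10 days)
PA_X_EVERYDAY_COEF = -0.125  # PA/game * Everyday interaction


def everyday_relative_risk(pa_per_game: float) -> float:
    """Risk of an everyday player relative to a part-time player at 0 PA/game.

    Prior injuries are held at zero, so the injury terms drop out.
    """
    exponent = EVERYDAY_COEF + (PA_COEF + PA_X_EVERYDAY_COEF) * pa_per_game
    return math.exp(exponent)


regular_playing = everyday_relative_risk(4.0)   # regular who keeps playing
regular_rested = everyday_relative_risk(1.5)    # regular at Walker's workload
ratio = regular_rested / regular_playing

print(f"Regular at 4.0 PA/game: {regular_playing:.3f}")
print(f"Regular at 1.5 PA/game: {regular_rested:.3f}")
print(f"Ratio: {ratio:.2f}")
```

The everyday baseline of 0.376 cancels out of the ratio, so the doubling comes entirely from the combined PA/game slope for everyday players (-0.158 plus -0.125) applied to the 2.5 PA gap.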

Photo credit: Noah K. Murray – USA Today Sports
