By the Numbers: An Analysis of the Reviews Deleted in the Goodreads Policy Change

Update: After posting this, I received another list from a user who was in the initial 21. There are 11 titles on this list. Five of them are by authors already in this data. I haven’t had time to rejigger this analysis, but her list doesn’t materially affect the results. I have updated the database with her information.

 

On September 20th, Goodreads Customer Care Director Kara posted a change in their policy in the Goodreads Feedback group. She reiterates their policy of not allowing threats or harassment and mentions some changes to the Goodreads Author dashboards. The item that gets everyone up in arms is this one:

**[Goodreads will] Delete content focused on author behavior.** We have had a policy of removing reviews that were created primarily to talk about author behavior from the community book page. Once removed, these reviews would remain on the member’s profile. Starting today, we will now delete these entirely from the site. We will also delete shelves and lists of books on Goodreads that are focused on author behavior. If you have questions about why a review was removed, send an email to support@goodreads.com. (And to answer the obvious question: of course, it’s appropriate to talk about an author within the context of a review as it relates to the book. If it’s an autobiography, then clearly you might end up talking about their lives. And often it’s relevant to understand an author’s background and how it influenced the story or the setting.)

Immediately responses start flooding in, decrying this shift and asking for clarification as to what constitutes “author behavior”. Kara clarifies in an edit:

The reviews that have been deleted – and that we don’t think have a place on Goodreads – are reviews like “the author is an a**hole and you shouldn’t read this book because of that”. In other words, they are reviews of the author’s behavior and not relevant to the book. We believe books should stand on their own merit, and it seems to us that’s the best thing for readers.

Several Goodreaders note that they received emails from Goodreads with lists of book reviews and shelf names that had been summarily deleted by Goodreads. A screencap of one such email can be found here, and there is a transcription available here. In another edit to the initial post, Kara adds:

Thank you for all the comments so far. One concern that has come up in this thread is that the content was deleted without those members first being told that our moderation policy had been revised.

In retrospect, we absolutely should have given users notice that our policies were changing before taking action on the items that were flagged. To the 21 members who were impacted: we’d like to sincerely apologize for jumping the gun on this. It was a mistake on our part, and it should not have happened.

When several users question what the deleted shelves “taa” and “icy-hex” even mean, and how that might have anything to do with author behavior, Kara responds:

We don’t comment publicly on individual cases, but in general, what we do is look at a shelf and see how it is used in context. In any case where we have decided to remove that shelf, we are confident that the shelf was being used in a way to review author behavior.

Previously, Goodreads had just hidden reviews that focused on author behavior. A hidden review is accessible to friends, but is not listed on the main book page. Goodreads did not just delete all hidden reviews; instead they divined the intent behind the shelf names and reviews of 21 people, and then deleted their reviews. Goodreads can’t publicly comment on the reviews they deleted, as I can see how that could be untoward, but the people affected can talk about the content of their reviews. These 21 people also received emails detailing the deletions, so we can know exactly what books are being flagged. I wanted to get those lists and collate the data: is there a pattern to the deletions? Are the same books and authors coming up again and again? And if I could find the 21 people who had their reviews and shelves deleted, I could ask them exactly what the content of their reviews was, and how exactly they were using their shelves.

So now I had to go about finding the 21 people whose reviews were deleted before Goodreads began sending take-down notices ahead of deletion. 21 users isn’t a lot of people, especially on a site of 20 million. (Although throwing around the 20 million users number is a little disingenuous, because the reality is that most of the activity on any given social medium is going to be concentrated into a much smaller number of people.) I already had two of the affected users in my friends list, and due to posts in the feedback thread and old-fashioned grape-vining, I was able to identify 6 more. At this point, I put out a status update on Goodreads, which read:

In the interests of science, I am trying to collect the lists of books deleted by Goodreads in the recent “policy change”. So far, I’ve tracked down 8 of the 21. Can you please alert me to: 1) who got emails from Goodreads? 2) a list of their books deleted and 3) shelf names.

It didn’t take me too long to realize I needed to get this update out to other social media platforms, as at least one of the people who had reviews deleted had deleted his Goodreads account, and others were staying out of Goodreads until they could download their information and then delete their accounts. I posted on Tumblr, Twitter, and Booklikes. Through a flurry of activity across several media platforms, including email, I managed to find 4 more users.

I was forwarded lists from these 12 people. 377 reviews were deleted in total, with the number of reviews deleted per user ranging from 1 to 129. All of their emails from Goodreads have the same wording, and the time stamps are within a short period. This would become important, as there was a second round of emails sent out to users, this time with a warning. I have excluded those lists from the data, as so far all of them have specified shelf titles only, not specific book reviews. (I have heard of one user who got a take-down notice listing specific reviews, but I have yet to hear back from her.) So, now I had lists of titles from 12 people, which seems a reasonable sample of the 21 users Kara mentions. It’s also entirely possible Kara is not completely accurate about the number of people targeted with deletion. For one, she keeps saying they can’t access deleted data, and for another, 12 is a transposition of 21. Given how small, in some ways, the very active Goodreader population is, I suspect that these 12 are all the users who were subject to review deletion.

Unfortunately, these lists were only of book titles, and did not include the author who wrote each book. In order for this database to be meaningful in any way, I was going to have to correlate books with authors. For example, let’s say that three different titles by the same author have reviews deleted off of three different users’ shelves. Without knowing the author, it doesn’t come out in the data that reviews of his or her books are being flagged in multiple places. Some of the titles are unique, so that eliminates guesswork. Some aren’t, but I could make informed guesses by observing which were Goodreads authors who had books published in the last couple of years, or had reviews still standing that talked about the author. I assigned as many authors as I could, and then submitted the lists back to the users for correction.

In cases of a multi-author book or an anthology, I listed the author indicated by the user as the reason the book was shelved as “do-not-read”. In cases where a writer works under several pen names, I listed their real name. (Or maybe more clearly, I listed the name that the writer uses publicly, even if it is a pseudonym too. My aim was to have all books written by the same person show up together, not determine what name is on the driver’s licence. That’s never important information.)
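The grouping described above can be sketched in a few lines of Python. This is only an illustration, not the actual spreadsheet: it assumes the lists have already been reduced to (user, title, credited author) rows, and every user, title, author, and alias below is invented.

```python
from collections import defaultdict

# Hypothetical rows: (user, title, author as credited on the book).
# None of these names come from the real delete lists.
records = [
    ("user_a", "Title One", "Pen Name X"),
    ("user_b", "Title Two", "Public Name"),
    ("user_c", "Title Three", "Public Name"),
    ("user_a", "Title Four", "Other Author"),
    ("user_a", "Title Five", "Other Author"),
]

# Hypothetical alias table: pen name -> the name the writer uses publicly.
aliases = {"Pen Name X": "Public Name"}


def lists_per_author(records, aliases):
    """Count how many distinct users' delete lists each author appears on."""
    users_by_author = defaultdict(set)
    for user, _title, credited in records:
        author = aliases.get(credited, credited)  # fold pen names together
        users_by_author[author].add(user)
    return {author: len(users) for author, users in users_by_author.items()}


counts = lists_per_author(records, aliases)

# Authors who turn up on two or more different users' lists.
multi_list = {author: n for author, n in counts.items() if n >= 2}

print(counts)
print(multi_list)
```

Counting distinct users rather than titles is the point of the exercise: an author shelved five times by one person still counts as one list, while an author shelved once each by three people counts as three.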

So this is my first large disclaimer: The list of titles comes directly from the Goodreads emails, but the list of authors assigned to those books is constructed data. In some cases, the user simply couldn’t recall which of the dozens of books entitled Inhale or Truth she had decided not to read. And the first disclaimer brings me to my second disclaimer: this list of authors should not be taken as a hit list. Despite Goodreads’s surety that they were only deleting reviews based on author behavior, this was not the case for many of the titles listed. Before I get into specifics, though, I should probably talk about what these reviews looked like.

There are a lot of things we can’t know for a fact, because obviously the reviews are gone, but I asked all the users if the reviews in question had ratings, or if the review field had any content. Almost all of the reviews in question had no ratings. All of these users adhere to a personal policy of not rating books they haven’t read, with the exception being books that they have read parts of. The only books that had ratings had been at least partially read. Here I would like to note that Goodreads does not have a policy against rating books that you have not read, as that would be both unenforceable and impossible to prove.

I have seen users bemoan that these reviews are somehow skewing the ratings for books, but I would like to point out two things. First, we are dealing with a few hundred reviews against the tens (and possibly hundreds) of millions of reviews on Goodreads. There is no way their removal is going to have a statistical effect. Second, there are thousands of users doing things like “rating on excitement” for unreleased books. Take something like Black Ice, Becca Fitzpatrick’s book which has a publication date more than a year from now. As far as I’m aware, there are no advance reader copies, and likely the only people who have read this are Fitzpatrick’s friends and family, if even a completed manuscript exists. Black Ice has an average rating of 4.23, which is completely unheard of. 67 users have given it a 5-star rating, versus four who have given it a one. If you want to talk about skewed ratings – and I would like to note right now that ALL ratings are subjective by their very nature and therefore meaningless as some kind of objective metric – then you should start with the overwhelmingly positive ones.

For example, a comment from Goodreads CEO, Otis Chandler, in a Goodreads Feedback thread about pre-ratings:

Interesting thread! I agree that it’s a shame some books have to suffer ratings that clearly are invalid. However I can’t think of a way to prevent it, and I didn’t see any ideas in the thread either (I did skim though). I hope you’ll appreciate that if we just start deleting ratings whenever we feel like it, that we’ve gone down a censorship road that doesn’t take us to a good place.

As for manuscripts or yet-to-be-published books, I have no problem with them being in the database. It’s kind of cool to have a record of in-progress books, and I don’t think it hurts anything. I do think we’d need to remove any that weren’t serious in their intent to be a finished book one day.

When there was content, the review content was generally terse, from quick dismissals to “not for me” to “see comments” to a link or screencap to whatever the controversy was surrounding the book. Many of these controversies, indeed, had to do with the broadly defined issue of author behavior. These controversies range from books being pulled from publication for plagiarism, racist or homophobic statements made by the author, the author’s conviction on the charges of owning child pornography, downvoting campaigns instigated by authors or agents, the doxing of reviewers by authors, down to just a bunch of dumb stuff authors occasionally say out loud. I have already written at length about how these “author behaviors” are not equal, but just to reiterate: noting a book has been pulled for plagiarism, for example, is about the book’s unoriginal content, not about the author’s behavior as a word thief. Noting that a children’s book author has been convicted of owning child pornography is the kind of author behavior that has a direct bearing on the content. Many, many people are currently boycotting Orson Scott Card for his political views, and deciding not to read the books by authors because of their beliefs is a political act Goodreads has no business getting in the middle of. The rest I’m going to shelve for the moment, and get onto the next point.

Additionally, some of the books were shelved “do-not-read” not because of the actions of the author, but because the book looked bad to the user. These are a vanishingly small number though. The other large minority of reviews deleted were shelved because the book was pulled-to-publish fan fiction, e.g. Fifty Shades of Grey. A pulled-to-publish fan fiction is one where a freely available fan fiction is pulled, the content lightly edited – often a search & replace with the names “Bella” and “Edward” swapped out for other names, not to be too snarky here – and then the book put up for sale. P2p books, as these are referred to, are a controversial topic, but I can’t really call the path to publication and the source of the plot lines “author behavior”, except in a way that nullifies most of literary criticism. (Also of note: no reviews of Fifty Shades were deleted, though I’m sure I could find you many that note its p2p status and not much else.) Whether you regard p2p novels as ethical or not, the information that a book is p2p is not about the author at all.

As for the content of the reviews, most users indicated that they had nothing in the review field for most of their reviews. Often the comments about the author behavior were occurring solely in the comment threads, as there was literally nothing – not a rating, nothing in the review field – about author behavior at all. From personal correspondence with rameau:

I kept the specifics in the comment field from the moment GR first announced they weren't allowing any non-book related information about authorial behavour in reviews.

Or from Miranda, whose reviews constitute 129 of the reviews deleted, a sizable minority:

“None of those books had an actual text review or a rating. Only shelved by me, but all had screenshots or links in the comments.”

If there was no content – no rating, no statement to the effect of “The author is such a dick. I’m not even going to read it!” – then what Goodreads has done here is delete forums on which Goodreaders have discussed their personal boycotts of selected authors, discussions which are going on all over the site right at this moment, and have likely increased exponentially since the vaguely worded new policy about author behavior. Though Goodreads is claiming this is about review content – such as the hypothetical review example from Kara “this book is by an a**hole and you shouldn’t read the book because of that” – many of these reviews literally had no content, and Goodreads has taken action against review threads. I am appalled by this, and you should be too. More than anything else about this debacle, this is the thing I would like you to come away with: Goodreads has deemed the comment threads of a user’s review space actionable to the point of deleting the entire comment thread.

The Database:

But let’s move away from the self-reported data into the actual data. A searchable database can be found here, and there are screencaps I’ll get up at some point to ensure that if there’s some kind of vandalizing of the data, a record of it in its original form is extant. (I don’t even mean to sound paranoid, but after the copious googling it took to compile these authors – not all, not even most, but a virulent few – I am actually feeling worried that someone might try to vandalize the data.)

So, some very basic numbers:

Number of delete lists: 12
How many reviews deleted, in total: 377
Average number of books deleted, per user: 31.4

The number of reviews deleted, by user:

Archer      6
Bitchie     3
Carla      76
Jane        9
JennyJen   72
Kara       17
Linda       1
Mirage     10
Miranda   129
rameau      5
Ridley     36
Steph      13

As you can see, the number of reviews deleted by user varies wildly. Three users, Carla, JennyJen, and Miranda, had 277 reviews deleted between them, which constitutes almost three quarters of the number of deleted reviews. This looks incredibly personal.
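For anyone who wants to check the arithmetic, here is a short Python sketch that recomputes the totals from the per-user counts listed above. The counts are copied from that list. The median is not a figure from the database; it is added here only to show how lopsided the distribution is.

```python
from statistics import mean, median

# Per-user counts copied from the list above.
deleted_by_user = {
    "Archer": 6, "Bitchie": 3, "Carla": 76, "Jane": 9,
    "JennyJen": 72, "Kara": 17, "Linda": 1, "Mirage": 10,
    "Miranda": 129, "rameau": 5, "Ridley": 36, "Steph": 13,
}

total = sum(deleted_by_user.values())
average = mean(deleted_by_user.values())
middle = median(deleted_by_user.values())

# The three users with the most deletions, largest first.
top_three = sorted(deleted_by_user.items(), key=lambda kv: kv[1], reverse=True)[:3]
top_three_total = sum(count for _, count in top_three)
top_three_share = top_three_total / total

print(f"total reviews deleted: {total}")
print(f"average per user: {average:.1f}")
print(f"median per user: {middle}")
print(f"top three: {top_three} = {top_three_total} ({top_three_share:.0%})")
```

An average of 31.4 against a median of 11.5 is another way of saying that a handful of users account for most of the deletions.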

Here is a graphic of the number of reviews deleted by user:

[Figure: Reviewers and reviews deleted. The chart repeats the per-user counts listed above.]

(And a quick note on user names: several of these users asked that I keep their Goodreads screen name out of this. I have assigned pseudonyms to three of them, and shortened one screen name in the interests of brevity.)

A Statistical Sampling of Authors

Overall, the 377 reviews on this list covered books by 174 authors, which gives us an average of about 2 deleted reviews for each author. The actual number deleted per author ranges from 1 to 14. It’s fairly easy to sort through the lists and find the author who has the most books deleted, but this isn’t statistically important information. Usually that is an indicator that the author has written a lot of books, and/or the author was shelved heavily by one user only. The more important data is this: which authors’ books are showing up on multiple delete lists.

 

Authors and reviews deleted

Again, I want to reiterate: this list of authors is not a hit list. It is simply the authors whose books turned up on multiple delete lists, for whatever reason. In doing my research, I had to unearth the controversies that surrounded each of these writers, and I felt some of the situations were silly or overblown, while plenty of them had merit. In other words, I used my own judgement about the information. To quote rameau again:

BBA [badly behaving author] note doesn't stop me from reading a book (see Jamie McGuire and Orson Scott Card), it's supposed to stop me from spending money without serious consideration.

The following list notes the name first, and then the number of users’ delete lists their books were on:

Cassandra Duffy, 5
Melissa Douthit, 5
Jaq D. Hawkins, 4
Kiera Cass, 4
L.B. Schulman, 4
Layce Gardner, 4
Rebecca Hamilton, 4
Carroll Bryant, 3
Donna White Glaser, 3
Emily Giffin, 3
Heather M. White, 3
Jordin Williams, 3
Lauren Pippa, 3
Marla Madison, 3
Ruthi Kight, 3
Shannon Mayer, 3
Amy Plum, 2
Ava Michaels, 2
Betty Jay, 2
Hugh Howey, 2
Jessica Park, 2
John Simpson, 2
Judyann McCole, 2
Julie Halpern, 2
K.P. Bath, 2
Kendall Grey, 2
Kenya Wright, 2
L. Kirstein, 2
Leigh Fallon, 2
M.R. Mathias, 2
Rick Carufel, 2
Robin Wyatt Dunn, 2
Sharon Desruisseaux, 2
Steph Campbell, 2
Sue Dent, 2
Trisha Telep, 2
William Terry Rutherford, 2

 

These 37 authors out of the 174 total are important because they showed up on multiple delete lists. Rather than go through all of the authors and try to find the controversy behind their do-not-read status, I have used this group as a statistically important sampling. Of the 377 reviews deleted, 240 were reviews of books by these 37 authors. 64% of the reviews deleted are covered by this list of 37 authors. All of the graphs going forward deal with these authors only. If anyone wants to do a more complete sample, the database is freely available.
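The coverage figure can be reproduced directly from the four numbers quoted above. This is just a back-of-the-envelope check in Python; the variable names are mine, and the author share is derived from those same numbers rather than taken from the database.

```python
# Figures quoted in the text above.
total_reviews = 377
total_authors = 174
multi_list_authors = 37
reviews_for_multi_list_authors = 240

# Share of all deleted reviews that involve the multi-list authors.
review_share = reviews_for_multi_list_authors / total_reviews

# Share of all authors that those multi-list authors represent.
author_share = multi_list_authors / total_authors

print(f"share of deleted reviews covered: {review_share:.0%}")
print(f"share of authors in the sample: {author_share:.0%}")
```

In other words, roughly a fifth of the authors account for close to two thirds of the deleted reviews, which is why this group is a reasonable place to concentrate.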

First off, here is a graphic of how many authors on multiple delete lists were indie, with small presses, or with Big Six publishing houses. Sometimes the exact affiliations are hard to parse, and decisions had to be made about whether Big Six distribution was the same as Big Six publishing, etc. You are welcome to parse this chart yourself. Either way, the chart shows the general trends. We’re dealing with largely self-published books here.

 

19% small press, 16% Big Six, 65% indie
Authors and reviews deleted

Although the reviewer/author conflicts have sometimes been characterized as occurring in the Young Adult readership more than others, when you look up the genre of the books affected, that doesn’t turn out to be true. It is a large minority, but plenty of other genres are represented. This is not a boutique issue. Some books are in multiple categories or genres, which is why these categories add up to more than 37.

1 non-fiction, 11 erotica/romance, 3 mystery, 1 historical fiction, 9 SF/F, 13 YA
Deleted Books by Author
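A quick Python check of the genre tallies, assuming (as the paragraph above implies) that the tallies are counted against the sample of 37 and that an entry can carry more than one genre label. The counts are copied from the chart text; the variable names are mine.

```python
# Genre tallies copied from the chart text above.
genre_counts = {
    "non-fiction": 1,
    "erotica/romance": 11,
    "mystery": 3,
    "historical fiction": 1,
    "SF/F": 9,
    "YA": 13,
}

# Assumption: the tallies are counted against the 37-author sample.
sample_size = 37

total_labels = sum(genre_counts.values())
extra_labels = total_labels - sample_size  # labels beyond one per entry
ya_share = genre_counts["YA"] / sample_size
non_ya_labels = total_labels - genre_counts["YA"]

print(f"genre labels in total: {total_labels}")
print(f"labels beyond one per entry: {extra_labels}")
print(f"YA share of the sample: {ya_share:.0%}")
print(f"labels outside YA: {non_ya_labels}")
```

Under that assumption YA is the single largest category, but still well short of half of the sample.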

Next up we have the nature of the controversy that landed the author in question on multiple users’ do-not-read lists. Admittedly, this involves some guesswork, but generally the controversies were easily googleable, and I relied on the reportage of the people involved. I’ve broken the kind of controversy into categories, based on my own sense of how they are different. The categories are:

  • Political: racist, sexist, & homophobic statements made by author, in addition to one instance of the author being convicted of owning child porn.
  • Marketing: use of sockpuppets for rating inflation, spamming bloggers, spamming in general.
  • Reviewer conflicts: personal attacks against readers/reviewers, downvoting campaigns instigated by either authors or proxies, impolitic statements.
  • p2p fiction or plagiarism: either the author has written pulled-to-publish fan fiction, or there are allegations of plagiarism either in the book, or in sockpuppeted reviews of the book.

7% political, 7% marketing, 10% p2p/plagiarism, 75% reviewer conflicts (Several authors showed up in multiple categories, just as a clarification.)

The elephant in the room here is affiliation with the website Stop the Goodreads Bullies. I urge you strongly not to give these people traffic, as they are doxers, cherrypickers, and generally people you don’t want to get involved with. The only good thing I can see coming out of this mess is showing the average Goodreader just how unhinged these people are. They lie, they insinuate, and they post out-of-context screencaps of conversations occurring on Goodreads (some on my own thread, and you can read the entire context here yourself. I apologize in advance for how much cussing I do, in general.)

 

27% yes, 73% no.
STGRB Affiliated

A sizable minority of the reviews deleted were authored by STGRB affiliated authors, and I’m struggling to understand why Goodreads is going after reviews of books by authors they have banned from their site, people like Melissa Douthit and Carroll Bryant. By the numbers, these are largely self-published authors. I don’t even mean to sound snarky, but who even cares about these writers in the larger literary context? Maybe it’s ridiculous to give these writers a platform by shelving their books do-not-read and linking to their myriad social media meltdowns, but it is so much more ridiculous to delete the discussion of these events. Goodreads is a social media platform, and this seemingly personal, yet also arbitrary, deletion of conversations should give the average Goodreader pause.

Whether you think these conflicts have any merit, whether you think doxing is legitimate, whether you think sockpuppets are a valid marketing strategy, it makes no sense to me that users cannot be allowed to exchange this information about the professional, personal, political, criminal, and sometimes, just sometimes, the literary merit of living authors. It is not just a marketplace of ideas, but an actual marketplace, and often the only power we have as consumers, as citizens, is in where we spend our hard earned dollars. Where we spend our hard earned dollars on a leisure activity. The only vote we have sometimes is the one with our dollars, and Goodreads coming in and stifling discussion of who users believe merit their time and cash is, and I’m sorry for the cussing, bullshit.

While I was writing this post, Goodreads “announced” on their Feedback thread that they were going to try to reinstate the reviews lost in the deletions, and offered some clarification of their policy. Frankly, I haven’t had time to read this, and I’ll leave its consideration for a later date. The reviewers who were subject to deletion also received the following email:

Hi [Goodreader],

We are contacting you to let you know we are working on retrieving the content that was deleted from your account on September 20. We’re very sorry about how that was handled. In retrospect, we should have notified you and provided you with a copy of your content when we deleted the reviews/shelves.

We also mistakenly deleted your shelf called “due-to-author”. We know we were not clear in our previous response about this. A “due-to-author” shelf fits within our guidelines and is allowed on the site.

We’ve discussed this in more detail with our engineers, and while the reviews have been completely deleted from the database, it turns out we can retrieve the content through back-up servers. We will email it to you for your personal records as soon as the import completes in a week or two. Feel free to re-import your “due-to-author” shelf, but please note that the content that violated our guidelines cannot be re-posted on Goodreads.

Sincerely,
The Goodreads Team

So, sorry we deleted your reviews, but they are still illegal according to a policy we absolutely refuse to clarify. If you look at the data, reviews are being arbitrarily and personally deleted, according to no standard I could discern. I leave it to you, fellow Goodreaders, to make sense of these numbers.

 

A quick note of thanks:

I have been using the word “I” throughout this essay, but that is inaccurate. This database would not have come to be without the help of dozens of people. Thanks to:

The 12 people who forwarded me their delete lists, anyone who passed notes, sent me links, and otherwise made this social media social; for technical help, a shout out to DMS who built the spreadsheet, and sj for making graphs, and Ziv for number crunching; general thanks to Steph and Wendy Darling for link-farming and karen for reader’s advisory, plus just dozens and dozens of people who found me and told their stories. As sickened as I am by this action by Goodreads, I am cheered by the overwhelming power of concerned people acting together. Single tear, guys.