This August we got a bug report from a customer: they had an issue with the published state of their migrated nodes. Some of their nodes were migrated with the latest revision visible to everyone, while the revision that should have been visible was an earlier node version.

We checked their installed Drupal 7 modules, and our suspicion was confirmed: the source site was using Workbench Moderation, configured to provide moderation workflows for some of their content types. At the time, we didn’t have a solution for correctly migrating moderated content to Drupal 9.

I guess you won’t be too surprised if I say that Acquia decided to fund the development of a migration path for moderated content types and their moderation states. The result is Workbench Moderation Migrate. In this post I explain how moderation worked in Drupal 7, how it works in Drupal 9, and what the greatest challenges were that I had to solve in the migration paths.

Moderated content

The main features of content moderation are the following:

  • Moderation states that a piece of content can be in (Draft, In Review, Published, Archived etc.), and
  • Transitions between those states, with rules about when a transition can be used on moderated content and which state the content ends up in once the transition is performed.
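To make the two concepts concrete, here is a minimal sketch of states and transitions in Python. The transition names and the exact set of allowed source states are assumptions made for illustration; they are not taken from Workbench Moderation or from Drupal core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """A named move from one of several source states to one target state."""
    name: str
    from_states: frozenset
    to_state: str


# Hypothetical workflow; names and rules are illustrative only.
TRANSITIONS = [
    Transition("send_to_review", frozenset({"draft"}), "in_review"),
    Transition("publish", frozenset({"in_review"}), "published"),
    Transition("create_new_draft", frozenset({"published", "in_review"}), "draft"),
    Transition("archive", frozenset({"published"}), "archived"),
]


def apply_transition(current_state, transition_name):
    """Return the new state, or raise if the transition is not allowed here."""
    for transition in TRANSITIONS:
        if transition.name == transition_name:
            if current_state not in transition.from_states:
                raise ValueError(
                    f"'{transition_name}' cannot be used on '{current_state}' content"
                )
            return transition.to_state
    raise KeyError(transition_name)
```

With this sketch, `apply_transition("draft", "send_to_review")` returns `"in_review"`, while trying to publish a draft directly raises a `ValueError`.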

This ecosystem allows you to have a published content version that is live and also a separate working copy that undergoes review before it gets published, replacing the previous live content version.

Moderated content in Drupal 7

Drupal 7 doesn’t have any support for “forward” (or “pending”) revisions: it always checks the revision with the highest revision ID. Users always see (or don’t see) this most recent node version.

Drupal 7 Workbench Moderation works around this behavior. Whenever a user creates or edits a non-public draft of an already published node, the module saves a new, published revision which is a clone of the most recent published version. Since this last saved published revision has a higher revision ID than the “pending” version, users will not notice any change. (But in fact they will see a new revision.)

Workbench Moderation tracks the state transitions in its history database table, which contains an incremental history ID, an entity and an entity revision ID, the previous and the new state of the version, and the time when the transition happened.

An example of how this works in Drupal 7:

# | Action performed         | Rev. ID | From      | To        | Public | Current
--|--------------------------|---------|-----------|-----------|--------|--------
1 | New node, sent to review | 1       |           | in_review |        |
2 | Reviewer publishes       | 2       | in_review | published |        |
3 | User creates a new draft | 3       | published | draft     |        |
4 | WBM clones rev. #2       | 4       | published | published |        |
5 | Draft sent to review     | 5       | draft     | in_review |        |
6 | WBM clones rev. #4       | 6       | published | published |        |
7 | Pending draft published  | 7       | in_review | published |        |
8 | New draft                | 8       | published | draft     |        |
9 | WBM clones rev. #7       | 9       | published | published |        |

The published node revision is #9, which is a clone of #7. But editors are editing #8 (this is what “current” means).
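The cloning behavior can be replayed with a few lines of Python. This is only a simplified model of what is described above, not Workbench Moderation's actual code; the function name and the dictionary keys are made up for the illustration.

```python
def replay_drupal7(actions):
    """Replay (label, to_state) actions and return the history rows they produce."""
    history = []
    revision_id = 0
    previous_state = None       # state of the latest real (non-clone) revision
    published_revision = None   # revision ID of the latest published revision
    for label, to_state in actions:
        revision_id += 1
        history.append({"action": label, "vid": revision_id,
                        "from": previous_state, "to": to_state})
        if to_state == "published":
            published_revision = revision_id
        elif published_revision is not None:
            # A non-public revision was saved on top of a published node:
            # re-save a clone of the published revision so that it gets the
            # highest revision ID again.
            revision_id += 1
            history.append({"action": f"WBM clones rev #{published_revision}",
                            "vid": revision_id,
                            "from": "published", "to": "published"})
            published_revision = revision_id
        previous_state = to_state
    return history


ACTIONS = [
    ("New node, sent to review", "in_review"),
    ("Reviewer publishes", "published"),
    ("User creates a new draft", "draft"),
    ("Draft sent to review", "in_review"),
    ("Pending draft published", "published"),
    ("New draft", "draft"),
]
HISTORY_D7 = replay_drupal7(ACTIONS)
```

Six editorial actions produce nine history rows here, because every non-public save after the first publication is followed by a clone of the published revision.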

Content Moderation module in Drupal 9

Workflows and Content Moderation added to Drupal 8.2.x have clearer foundations: moderated workflows track not only the moderation state of each content revision, but also a flag on the revision that records whether it is a default revision or not. Drupal’s Entity API is aware of this flag as well. If an entity-specific (and not entity-revision-specific!) read operation happens, Entity API loads the most recent revision flagged as default. That revision’s state determines whether the user can access the content, and if it is a public state, this is the version they will see.

Here is the equivalent history of the steps from the previous section, performed on Drupal 9:

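# | Action performed         | Rev. ID | From      | To        | Public | Default
--|--------------------------|---------|-----------|-----------|--------|--------
1 | New node, sent to review | 1       |           | in_review |        |
2 | Reviewer publishes       | 2       | in_review | published |        |
3 | User creates a new draft | 3       | published | draft     |        |
4 | Draft sent to review     | 4       | draft     | in_review |        |
5 | Pending draft published  | 5       | in_review | published |        |
6 | New pending draft        | 6       | published | in_review |        |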

Please note how this new “default” flag works: as long as there is no published version, all versions are “default”. It looks weird at first, but it totally makes sense: if a visitor tries to access the content early on, when only the draft revision #1 exists, Drupal should return a 403 response.
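The rule can be sketched as follows. This is a simplified model of the behavior described above, not Drupal's Entity API: the state table, the function names and the way the flag is computed are assumptions made for the illustration.

```python
# state -> (is a public state, always saved as default revision); illustrative only
STATES = {
    "draft": (False, False),
    "in_review": (False, False),
    "published": (True, True),
}


def replay_drupal9(target_states):
    """Create one revision per target state and compute its 'default' flag."""
    revisions = []
    default_is_published = False
    for vid, state in enumerate(target_states, start=1):
        published, default_state = STATES[state]
        # As long as no published default revision exists, every new revision
        # becomes the default one; afterwards only published ones do.
        is_default = vid == 1 or default_state or not default_is_published
        if is_default:
            default_is_published = published
        revisions.append({"vid": vid, "state": state, "default": is_default})
    return revisions


def default_revision(revisions):
    """Return the most recent revision flagged as default."""
    return max((r for r in revisions if r["default"]), key=lambda r: r["vid"])


def visitor_response(revisions):
    """HTTP status an anonymous visitor would get in this simplified model."""
    state = default_revision(revisions)["state"]
    return 200 if STATES[state][0] else 403


REVISIONS = replay_drupal9(
    ["in_review", "published", "draft", "in_review", "published", "in_review"]
)
```

In this model, revisions #1, #2 and #5 end up flagged as default, visitors are served revision #5, and a node that only has its first, unpublished revision yields a 403.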

How can we move this from Drupal 7 to Drupal 9?

You may already have a clue what the solution is: we have to identify which node versions are clones of a previous published revision, and skip their migration. And yes, this is the essence of it, besides other details like being able to identify the first published revision. Unfortunately, it wasn’t that easy to solve.

But I had the first rule I set up: Clones of a published revision shouldn’t be migrated.
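A rough sketch of this first rule, using the Drupal 7 history from the earlier example. How Workbench Moderation Migrate actually recognizes clones is not described in this post, so the heuristic below is purely an assumption: a revision is treated as a clone if its only history entry goes from published to published and the entry recorded right before it left a revision in a non-published state.

```python
from collections import Counter

# (history ID, revision ID, from state, to state), taken from the Drupal 7 example.
HISTORY = [
    (1, 1, None, "in_review"),
    (2, 2, "in_review", "published"),
    (3, 3, "published", "draft"),
    (4, 4, "published", "published"),
    (5, 5, "draft", "in_review"),
    (6, 6, "published", "published"),
    (7, 7, "in_review", "published"),
    (8, 8, "published", "draft"),
    (9, 9, "published", "published"),
]


def find_clone_revisions(history):
    """Return the revision IDs that look like clones of a published revision."""
    rows = sorted(history, key=lambda row: row[0])  # order by history ID
    entries_per_revision = Counter(vid for _, vid, _, _ in rows)
    clones = []
    previous_to = None
    for _, vid, from_state, to_state in rows:
        is_republish = from_state == to_state == "published"
        follows_pending = previous_to not in (None, "published")
        # Hypothetical heuristic, see the lead-in above.
        if is_republish and follows_pending and entries_per_revision[vid] == 1:
            clones.append(vid)
        previous_to = to_state
    return clones


CLONES = find_clone_revisions(HISTORY)
MIGRATED = [vid for _, vid, _, _ in HISTORY if vid not in CLONES]
```

For this history the sketch marks revisions #4, #6 and #9 as clones, which leaves six revisions to migrate: the same number of revisions the Drupal 9 example has.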

The biggest challenges

Drupal 7 Node + Workbench Moderation allows users to change the state of a revision and even to delete revisions on the revision UI [1]. It isn’t a big problem if a clone of a published version was deleted, but in every other case we will have “holes” in the revision history, and these need to be bridged somehow.

States changed on revision UI

Why is it a problem that the state was changed on the revision UI? Because then Drupal does not create a new revision for the pending draft, so we will have more than one history record for the same revision. This is how it looks (watch what happens between the sixth and eighth rows):

#  | Action performed                       | Rev. ID | From      | To
---|----------------------------------------|---------|-----------|----------
1  | New node, sent to review               | 1       |           | in_review
2  | Reviewer approves and publishes        | 2       | in_review | published
3  | Published version was edited           | 3       | published | published
4  | New draft                              | 4       | published | draft
5  | WBM clones rev. #3                     | 5       | published | published
6  | New draft sent to review               | 6       | draft     | in_review
7  | WBM clones rev. #5                     | 7       | published | published
8  | Reviewer sets back to draft on rev. UI | 6       | in_review | draft
9  | WBM clones rev. #7                     | 8       | published | published
10 | Draft sent back to review              | 9       | draft     | in_review
11 | WBM clones rev. #9                     | 10      | published | published
12 | New draft reviewed and published       | 11      | in_review | published
13 | First published rev. is restored       | 12      | published | published

This led me to the second rule, which I applied whenever I checked the “from” or the “to” state of a node revision: the from state should be fetched from the first history entry of the revision, and the to state should be fetched from its last history entry.
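The second rule can be sketched like this, using the history table above as input. The function name and the tuple layout are chosen for the illustration and do not mirror the module's code.

```python
# (history ID, revision ID, from state, to state), taken from the table above.
HISTORY = [
    (1, 1, None, "in_review"),
    (2, 2, "in_review", "published"),
    (3, 3, "published", "published"),
    (4, 4, "published", "draft"),
    (5, 5, "published", "published"),
    (6, 6, "draft", "in_review"),
    (7, 7, "published", "published"),
    (8, 6, "in_review", "draft"),
    (9, 8, "published", "published"),
    (10, 9, "draft", "in_review"),
    (11, 10, "published", "published"),
    (12, 11, "in_review", "published"),
    (13, 12, "published", "published"),
]


def collapse_history(history):
    """Map each revision ID to (from state of first entry, to state of last entry)."""
    states = {}
    for _, vid, from_state, to_state in sorted(history, key=lambda row: row[0]):
        if vid not in states:
            # First history entry of this revision: remember its from state.
            states[vid] = [from_state, to_state]
        else:
            # Later entry for the same revision: only the to state moves on.
            states[vid][1] = to_state
    return {vid: tuple(pair) for vid, pair in states.items()}


COLLAPSED = collapse_history(HISTORY)
```

Revision #6 has two history entries (draft to in_review, then in_review back to draft), so it collapses to `("draft", "draft")`; every other revision keeps the states of its single entry.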

Removed revisions

I hope it is obvious why it is a problem if revisions are deleted. Assume that only revisions #4, #6 and #12 are available in the source. These are the only revisions that d7_node_complete will migrate. Take a look at this history, from which I removed the entries whose revision was deleted:

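#  | Action performed                       | Rev. ID | From      | To
---|----------------------------------------|---------|-----------|----------
4  | New draft                              | 4       | published | draft
6  | New draft sent to review               | 6       | draft     | in_review
8  | Reviewer sets back to draft on rev. UI | 6       | in_review | draft
13 | First published rev. is restored       | 12      | published | published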

How can we migrate these to the destination site by retaining the highest level of data integrity?

We can see the connection between the two non-published revisions [2]. But how can we connect the to state of the last one with the state of the last, published revision? We cannot assume that there is a direct transition from draft to published!

In such cases, the solution is a bit sad: If there is no connection between the first drafts and the first published version, then we ask d7_node_complete to skip migrating the stale drafts. This was the third rule.
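One possible reading of this third rule, as a sketch: the drafts that precede the first surviving published revision are kept only if the to state of the last of them matches the from state of that published revision. Both the function name and this exact definition of a “connection” are assumptions; the module may decide this differently.

```python
def drop_stale_drafts(revisions):
    """Drop drafts that cannot be connected to the first published revision.

    `revisions` is a list of (revision ID, from state, to state) tuples, ordered
    by revision ID, with the states already collapsed per revision.
    """
    first_published = next(
        (i for i, (_, _, to_state) in enumerate(revisions) if to_state == "published"),
        None,
    )
    if first_published is None or first_published == 0:
        return list(revisions)  # nothing precedes a published revision
    last_draft_to = revisions[first_published - 1][2]
    published_from = revisions[first_published][1]
    if last_draft_to == published_from:
        return list(revisions)  # the history is continuous, keep everything
    # There is a hole: skip the stale drafts in front of the published revision.
    return list(revisions[first_published:])


# The surviving revisions of the example above, collapsed per revision.
SURVIVING = [
    (4, "published", "draft"),
    (6, "draft", "draft"),
    (12, "published", "published"),
]
TO_MIGRATE = drop_stale_drafts(SURVIVING)
```

For the example, revision #6 ends in draft while revision #12 starts from published, so the sketch keeps only revision #12.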

Unpublished nodes which were published before

I want to show another tricky example [3]:

# | Action performed           | Rev. ID | From      | To
--|----------------------------|---------|-----------|----------
1 | New draft sent to review   | 1       |           | in_review
2 | Reviewer publishes         | 2       | in_review | published
3 | New draft                  | 3       | published | draft
4 | WBM clones rev. #2         | 4       | published | published
5 | Unpublished on revision UI | 4       | published | draft

We can migrate rev #1 and #2, but what should we do with #3 and #4? I obviously cannot drop #4: although it was initially a clone, it became a real transition, because this was the published revision through which the whole content was unpublished. On the other hand, if I didn’t migrate it, the node would be published in Drupal 9, because revision #2 was public and #3 is only a new, pending revision.

Well, I checked what happens on Drupal 9 if I repeat the same steps there, and it was a big relief: every single piece of moderated content can be archived, even if it has a pending version.

So this transition log, migrated into Drupal 9, looks like this:

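# | Action performed         | Rev. ID | From      | To        | Public | Default
--|--------------------------|---------|-----------|-----------|--------|--------
1 | New draft sent to review | 1       |           | in_review |        |
2 | Reviewer publishes       | 2       | in_review | published |        |
3 | New draft                | 3       | published | draft     |        |
4 | Content unpublished      | 4       | published | archive   |        |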

Drupal core issues discovered:

  • #3200949-9: Non-default entity revisions are migrated as default revision because EntityContentComplete does not allow creating forward (and non-default) revisions
  • #3052115-32: Mark an entity as ‘syncing’ during a migration ‘update’ and possibly test syncing semantics (no changed item bump, no content moderation revisions)
  • #2329253-78: Allow the ChangedItem to skip updating when synchronizing (f.e. when migrating)

Footnotes:

  1. In these cases, the moderation history table will still contain the history of the node, but we won’t have the revision data available.

  2. Yes, there are three non-public rows in the table, but two of them were recorded for the same node revision ID.

  3. This is typical of the history entries of the nodes that the customer’s bug report was about.