Migrating moderated Drupal 7 content to Drupal 9? There is a module for that!
This August we got a bug report from a customer about the published state of their migrated nodes. Some of their nodes had been migrated with the latest revision visible to everyone, while the revision that should have been visible was an earlier one.
We checked their installed Drupal 7 modules, and our suspicion was confirmed: the source site was using Workbench Moderation, configured to provide moderation workflows for some of their content types. At the time, we didn’t have any solution for correctly migrating moderated content to Drupal 9.
I guess you won’t be too surprised if I say that Acquia decided to fund the development of a migration for moderated content types and their moderation states. The result is Workbench Moderation Migrate. In this post I explain how moderation worked in Drupal 7, how it works in Drupal 9, and what the greatest challenges were that I had to solve in the migration paths.
Moderated content
Content moderation provides two main things:
- Moderation states that a piece of content can be in (Draft, In Review, Published, Archived etc.), and
- Transitions between the states, with rules about when a transition can be used on moderated content and what state the content ends up in once the transition is performed.
This ecosystem allows you to have a published content version that is live, and also a separate working copy that undergoes review before it gets published and replaces the previous live content version.
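To make the idea of states and transitions concrete, here is a minimal sketch in Python. The transition labels, the allowed "from" states and the `apply_transition` helper are assumptions made for illustration; they are not taken from Workbench Moderation or from Drupal core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    label: str
    from_states: frozenset  # states this transition may start from
    to_state: str           # state the content ends up in


# Hypothetical workflow; real sites configure their own states and rules.
TRANSITIONS = [
    Transition("Send to review", frozenset({"draft"}), "in_review"),
    Transition("Publish", frozenset({"in_review", "published"}), "published"),
    Transition("Create new draft", frozenset({"in_review", "published"}), "draft"),
    Transition("Archive", frozenset({"published"}), "archived"),
]


def apply_transition(current_state: str, label: str) -> str:
    """Return the new state, or raise if the transition is not allowed."""
    for transition in TRANSITIONS:
        if transition.label == label:
            if current_state not in transition.from_states:
                raise ValueError(f"'{label}' is not allowed from '{current_state}'")
            return transition.to_state
    raise KeyError(label)


new_state = apply_transition("draft", "Send to review")  # "in_review"
```

Trying to apply "Archive" to a draft raises a `ValueError` in this sketch, which is the kind of rule a moderation workflow enforces.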
Moderated content in Drupal 7
Drupal 7 doesn’t have any support for “forward” (or “pending”) revisions: it always loads the revision with the highest revision ID, so users always see (or don’t see) this most recent node version.
Drupal 7 Workbench Moderation works around this behavior. Whenever a user creates or edits a non-public draft of an already published node, the module saves a new, published revision which is a clone of the most recent published version. Since this last saved published revision has a higher revision ID than the “pending” version, users will not notice any change. (But in fact they will see a new revision.)
Workbench Moderation tracks the state transitions in its history database table, which contains an incremental history ID, an entity and an entity revision ID, the previous and the new state of the version, and the time when the transition happened.
An example of how this works in Drupal 7:
# | Action performed | Rev. ID | From | To | Public | Current |
---|---|---|---|---|---|---|
1 | New node, sent to review | 1 | | in_review | ◯ | ◯ |
2 | Reviewer publishes | 2 | in_review | published | ◯ | ◯ |
3 | User creates a new draft | 3 | published | draft | ◯ | ◯ |
4 | WBM clones rev. #2 | 4 | published | published | ◯ | ◯ |
5 | Draft sent to review | 5 | draft | in_review | ◯ | ◯ |
6 | WBM clones rev #4 | 6 | published | published | ◯ | ◯ |
7 | Pending draft published | 7 | in_review | published | ◯ | ◯ |
8 | New draft | 8 | published | draft | ◯ | ◉ |
9 | WBM clones rev #7 | 9 | published | published | ◉ | ◯ |
The published node revision is #9, which is a clone of #7. But editors are working on #8 (this is what “current” means).
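The cloning behavior in the table above can be replayed with a small simulation. This is only a sketch of the behavior as described here: `D7Node`, `HistoryEntry` and `moderate` are hypothetical names, and the model ignores everything Workbench Moderation does besides numbering revisions and writing history rows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryEntry:
    hid: int                   # incremental history ID
    vid: int                   # node revision ID
    from_state: Optional[str]  # None for the very first entry
    to_state: str


class D7Node:
    """Toy model of one moderated Drupal 7 node (illustrative only)."""

    def __init__(self) -> None:
        self.history: List[HistoryEntry] = []
        self.next_vid = 1
        self.current_state: Optional[str] = None
        self.current_vid: Optional[int] = None    # what editors work on
        self.published_vid: Optional[int] = None  # what visitors see

    def _record(self, from_state: Optional[str], to_state: str) -> int:
        vid = self.next_vid
        self.next_vid += 1
        self.history.append(
            HistoryEntry(len(self.history) + 1, vid, from_state, to_state)
        )
        return vid

    def moderate(self, to_state: str) -> None:
        vid = self._record(self.current_state, to_state)
        self.current_vid, self.current_state = vid, to_state
        if to_state == "published":
            self.published_vid = vid
        elif self.published_vid is not None:
            # Clone the live revision so that it has the highest ID again.
            self.published_vid = self._record("published", "published")


node = D7Node()
for state in ["in_review", "published", "draft", "in_review", "published", "draft"]:
    node.moderate(state)

rows = [(e.vid, e.from_state, e.to_state) for e in node.history]
```

After these six editorial actions the history holds nine rows, the live revision is #9 and the current one is #8, matching the table.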
Content Moderation module in Drupal 9
Workflows and Content Moderation, added to Drupal 8.2.x, have clearer foundations: moderated workflows track not only the moderation state of each content revision, but also a flag on the revision that says whether it is a default revision or not. Drupal’s Entity API is aware of this flag. When an entity-specific (and not entity-revision-specific!) read operation happens, Entity API loads the most recent revision flagged as default. This revision’s state determines whether the user can access the content, and if it is a public state, this is the version they will see.
Here is the equivalent history for the same steps performed on Drupal 9:
# | Action performed | Rev. ID | From | To | Public | Default |
---|---|---|---|---|---|---|
1 | New node, sent to review | 1 | | in_review | ◯ | ◉ |
2 | Reviewer publishes | 2 | in_review | published | ◉ | ◉ |
3 | User creates a new draft | 3 | published | draft | ◯ | ◯ |
4 | Draft sent to review | 4 | draft | in_review | ◯ | ◯ |
5 | Pending draft published | 5 | in_review | published | ◉ | ◉ |
6 | New pending draft | 6 | published | in_review | ◯ | ◯ |
Please note how this new “default” flag works: as long as there is no published version, all versions are “default”. It looks weird at first, but it totally makes sense: if a visitor tries to access the content early on, when only the draft revision #1 exists, Drupal should return a 403 response.
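The way the “default” flag evolves can be modeled in a few lines. This is a simplified sketch of the rule described above, not Content Moderation’s actual code: the state IDs, the per-state flags in `STATES` and the `replay` helper are assumptions for illustration.

```python
# state ID: (is a published state, forces the revision to be the default one)
STATES = {
    "draft": (False, False),
    "in_review": (False, False),
    "published": (True, True),
    "archived": (False, True),
}


def replay(to_states):
    """Return (revision ID, state, public, default) for each saved revision."""
    rows = []
    default_is_published = False
    for vid, state in enumerate(to_states, start=1):
        published, default_state = STATES[state]
        # A revision becomes the default one if it is the first revision,
        # if its state demands it, or if nothing has been published so far.
        is_default = vid == 1 or default_state or not default_is_published
        if is_default:
            default_is_published = published
        rows.append((vid, state, published, is_default))
    return rows


rows = replay(["in_review", "published", "draft", "in_review", "published", "in_review"])
default_flags = [is_default for _, _, _, is_default in rows]
```

For the six revisions in the table above, `default_flags` comes out as default, default, not default, not default, default, not default.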
How can we move this from Drupal 7 to Drupal 9?
You may already have a clue what the solution is: we have to identify which node versions are clones of a previous published revision, and skip their migration. And yes, this is the essence of it, apart from a few other details, like being able to identify the first published revision. Unfortunately, it wasn’t that easy to solve.
But at least I had my first rule: clones of a published revision shouldn’t be migrated.
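The first rule can be sketched as a filter over the moderation history. How a clone is recognized here is an assumption made for this example, not the module’s documented logic: a revision counts as a clone when it has exactly one history entry, that entry goes from published to published, and the entry just before it in the log ended in a non-published state. `find_clone_revisions` is a hypothetical helper.

```python
from collections import Counter


def find_clone_revisions(history):
    """history: (revision ID, from state, to state) tuples ordered by history ID."""
    entries_per_vid = Counter(vid for vid, _, _ in history)
    clones = set()
    previous_to = None
    for vid, from_state, to_state in history:
        if (
            from_state == "published"
            and to_state == "published"
            and entries_per_vid[vid] == 1
            and previous_to not in (None, "published")
        ):
            clones.add(vid)
        previous_to = to_state
    return clones


# The Drupal 7 history from the first table of this post.
history = [
    (1, None, "in_review"),
    (2, "in_review", "published"),
    (3, "published", "draft"),
    (4, "published", "published"),
    (5, "draft", "in_review"),
    (6, "published", "published"),
    (7, "in_review", "published"),
    (8, "published", "draft"),
    (9, "published", "published"),
]
skipped = find_clone_revisions(history)  # {4, 6, 9}
```

With this heuristic, a published revision that was edited directly (published to published, right after another published entry) is not mistaken for a clone.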
The biggest challenges
Drupal 7 Node + Workbench Moderation allows users to change the state of a revision and even to delete revisions on the revision UI[1]. It isn’t a big problem if a clone of a published version was deleted, but in every other case we will have “holes” in the revision history, and these need to be bridged somehow.
States changed on revision UI
Why is it a problem that the state was changed on the revision UI? Because then Drupal does not create a new revision for the pending draft, so we end up with more than one history record for the same revision. This is how it looks (watch what happens between rows 6 and 8):
# | Action performed | Rev. ID | From | To |
---|---|---|---|---|
1 | New node, sent to review | 1 | | in_review |
2 | Reviewer approves and publishes | 2 | in_review | published |
3 | Published version was edited | 3 | published | published |
4 | New draft | 4 | published | draft |
5 | WBM clones rev #3 | 5 | published | published |
6 | New draft sent to review | 6 | draft | in_review |
7 | WBM clones rev #5 | 7 | published | published |
8 | Reviewer sets back to draft on rev. UI | 6 | in_review | draft |
9 | WBM clones rev #7 | 8 | published | published |
10 | Draft sent back to review | 9 | draft | in_review |
11 | WBM clones rev #9 | 10 | published | published |
12 | New draft reviewed and published | 11 | in_review | published |
13 | First published rev. is restored | 12 | published | published |
This led me to the second rule, which I applied whenever I checked the “from” or the “to” state of a node revision: the “from” state should be fetched from the first history entry of the revision, and the “to” state from its last history entry.
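The second rule is easy to express in code. The sketch below assumes the history rows are available as tuples ordered by history ID; `effective_states` is a hypothetical helper name, not something provided by the migration module.

```python
def effective_states(history):
    """Map each revision ID to (from state of its first entry, to state of its last entry).

    history: (revision ID, from state, to state) tuples ordered by history ID.
    """
    result = {}
    for vid, from_state, to_state in history:
        if vid in result:
            # Later entry for the same revision: keep the first "from".
            result[vid] = (result[vid][0], to_state)
        else:
            result[vid] = (from_state, to_state)
    return result


# Rows 6 to 8 of the table above: revision #6 has two history entries.
history = [
    (6, "draft", "in_review"),
    (7, "published", "published"),
    (6, "in_review", "draft"),
]
states = effective_states(history)
```

Here revision #6 ends up with `("draft", "draft")`: it started as a draft, and after the change on the revision UI it is a draft again.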
Removed revisions
I hope it is obvious why deleted revisions are a problem. Assume that only revisions #4, #6 and #12 are available in the source. These are the only revisions that d7_node_complete will migrate. Take a look at this history, from which I removed the entries whose revision was deleted:
# | Action performed | Rev. ID | From | To |
---|---|---|---|---|
4 | New draft | 4 | published | draft |
6 | New draft sent to review | 6 | draft | in_review |
8 | Reviewer sets back to draft on rev. UI | 6 | in_review | draft |
13 | First published rev. is restored | 12 | published | published |
How can we migrate these to the destination site while retaining the highest level of data integrity?
We can see the connection between the two non-published revisions[2]. But how can we connect the “to” state of the last draft with the state of the last, published revision? We cannot assume that there was a direct transition from draft to published!
In such cases, the solution is a bit sad: if there is no connection between the first drafts and the first published version, then we ask d7_node_complete to skip migrating the stale drafts. This was the third rule.
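One possible reading of the third rule, as a sketch: walk backwards from the first published revision and keep earlier revisions only while each one’s “to” state matches the next one’s “from” state. Both `stale_drafts` and this definition of “connection” are assumptions made for illustration, and the input is expected to have the second rule already applied.

```python
def stale_drafts(revisions):
    """Return the IDs of drafts that cannot be linked to the first published revision.

    revisions: (revision ID, from state, to state) tuples ordered by revision ID.
    """
    first_published = next(
        (i for i, (_, _, to_state) in enumerate(revisions) if to_state == "published"),
        None,
    )
    if first_published is None:
        return set()  # Nothing was ever published; nothing to skip.
    keep_from = first_published
    # Extend the kept range backwards while the chain of states is unbroken.
    while keep_from > 0 and revisions[keep_from - 1][2] == revisions[keep_from][1]:
        keep_from -= 1
    return {vid for vid, _, _ in revisions[:keep_from]}


# The three surviving revisions from the table above.
surviving = [
    (4, "published", "draft"),
    (6, "draft", "draft"),
    (12, "published", "published"),
]
skipped = stale_drafts(surviving)  # {4, 6}
```

Revision #6 ends as a draft while #12 starts from published, so the chain is broken and both earlier drafts are skipped.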
Unpublished nodes which were published before
I want to show another tricky example[3]:
# | Action performed | Rev. ID | From | To |
---|---|---|---|---|
1 | New draft sent to review | 1 | | in_review |
2 | Reviewer publishes | 2 | in_review | published |
3 | New draft | 3 | published | draft |
4 | WBM clones rev #2 | 4 | published | published |
5 | Unpublished on revision UI | 4 | published | draft |
We can migrate rev #1 and #2, but what should we do with #3 and #4? I obviously cannot drop #4: although it was initially a clone, it became a real transition, because this was the published revision whose state change archived the whole content. And if I didn’t migrate it, the node would be published in Drupal 9, because revision #2 was public and #3 is only a new, pending revision.
Well, I checked what happens on Drupal 9 if I repeat the same steps there, and it was a big relief: any moderated content can be archived, even if it has a pending version.
So this transition log, migrated into Drupal 9, looks like this:
# | Action performed | Rev. ID | From | To | Public | Default |
---|---|---|---|---|---|---|
1 | New draft sent to review | 1 | | in_review | ◯ | ◉ |
2 | Reviewer publishes | 2 | in_review | published | ◉ | ◉ |
3 | New draft | 3 | published | draft | ◯ | ◯ |
4 | Content unpublished | 4 | published | archive | ◯ | ◉ |
Drupal core issues discovered:
- #3200949-9: Non-default entity revisions are migrated as default revision because EntityContentComplete does not allow creating forward (and non-default) revisions
- #3052115-32: Mark an entity as ‘syncing’ during a migration ‘update’ and possibly test syncing semantics (no changed item bump, no content moderation revisions)
- #2329253-78: Allow the ChangedItem to skip updating when synchronizing (f.e. when migrating)
Footnotes:
1. In these cases, the moderation history table will still contain the history of the node, but we won’t have the revision data available.
2. Yes, there are three non-public rows in the table, but two of them were recorded for the same node revision ID.
3. This represents the typical history entries of the nodes that the customer’s bug report was about.