Database Query Trick for
SQL-based Migrate Source Plugins
In my previous post about strange empty destination property handling I mentioned Yash Rode’s brilliant YouTube field → media migration feature. But I missed taking a note about a crucial trick we used there: how we managed to use a column alias as migration source item identifier.
The problem
Let’s suppose there are multiple YouTube fields on the source Drupal 7 site! Each of them have different field names, so the field property column names are also different:
Field name | Input property column | Video ID column |
---|---|---|
field_yt_foo |
field_yt_foo_input |
field_yt_foo_video_id |
field_yt_bar |
field_yt_bar_input |
field_yt_bar_video_id |
field_yt_baz |
field_yt_baz_input |
field_yt_baz_video_id |
In Drupal’s Migrate API, SQL-based source plugins should return a database select query (a Drupal\Core\Database\Query\SelectInterface
). But how can we get all the data in a single database query? Well, we can’t. But we can use a separate query to get the list of the field names with the specified type:
1
2
3
4
5
6
7
8
9
10
protected function getYouTubeFieldNames(): array {
$youtube_fields = $this->select('field_config')
->fields('field_config', ['field_name'])
->condition('type', 'youtube')
->condition('module', 'youtube')
->execute()
->fetchAllKeyed(0, 0);
return array_values($youtube_fields);
}
…Then use the list to build a UNION
query (the static::addUnionQuery()
is explained in this1 footnote):
1
2
3
4
5
6
7
8
9
10
11
12
13
/**
* {@inheritdoc}
*/
public function query() {
$union_query = NULL;
foreach ($this->getYouTubeFieldNames() as $field_name) {
$subquery = $this
->select("field_data_$field_name", $field_name)
->fields("field_data_{$field_name}", ["{$field_name}_input"]);
static::addUnionQuery($union_query, $subquery);
}
return $union_query;
}
Query string:
1
2
3
4
5
6
7
8
9
10
11
SELECT
"field_data_field_yt_foo"."field_yt_foo_input" AS "field_yt_foo_input"
FROM {field_data_field_yt_foo} "field_yt_foo"
UNION
SELECT
"field_data_field_yt_bar"."field_yt_bar_input" AS "field_yt_bar_input"
FROM {field_data_field_yt_bar} "field_yt_bar"
UNION
SELECT
"field_data_field_yt_baz"."field_yt_baz_input" AS "field_yt_baz_input"
FROM {field_data_field_yt_baz} "field_yt_baz"
Unfortunately, we cannot really use this query as the iterator source of our migrate source: Migrate source plugins must specify which columns are identifying a single migration row. (These are called “source IDs”.) The identifier2 should be unique for every migrate source item. If they aren’t unique, every next item with the same ID will be ignored during the migration3.
So instead of using the name of the input
field property, let’s use an alias, and define the alias as source item identifier!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/**
* {@inheritdoc}
*/
public function query() {
$union_query = NULL;
foreach ($this->getYouTubeFieldNames() as $field_name) {
$subquery = $this->select("field_data_$field_name", $field_name);
$subquery->addField($field_name, "{$field_name}_input", 'input');
static::addUnionQuery($union_query, $subquery);
}
return $union_query;
}
/**
* {@inheritdoc}
*/
public function getIds() {
return ['input' => ['type' => 'string']];
}
Query string:
1
2
3
4
5
6
7
8
9
10
11
SELECT
"field_data_field_yt_foo"."field_yt_foo_input" AS "input"
FROM {field_data_field_yt_foo} "field_yt_foo"
UNION
SELECT
"field_data_field_yt_bar"."field_yt_bar_input" AS "input"
FROM {field_data_field_yt_bar} "field_yt_bar"
UNION
SELECT
"field_data_field_yt_baz"."field_yt_baz_input" AS "input"
FROM {field_data_field_yt_baz} "field_yt_baz"
But this is still wrong! Although the query works as expected, but as soon as we try to use joinable source and ID map databases, we will have an ugly message in our migrate kernel test:
Migration failed with source plugin exception: SQLSTATE[42703]: Undefined column: 7 ERROR: column "input" does not exist
LINE 4: ...ublic.60254974m_map_d7_youtube_field "map" ON input = ma...
^:
SELECT "field_yt_foo"."field_yt_foo_input" AS "input", "map"."sourceid1" AS "migrate_map_sourceid1", "map"."source_row_status" AS "migrate_map_source_row_status"
FROM
"602549740field_data_field_yt_foo" "field_yt_foo"
LEFT OUTER JOIN .public.60254974m_map_d7_youtube_field "map" ON input = map.sourceid1
WHERE ("map"."sourceid1" IS NULL) OR ("map"."source_row_status" = :db_condition_placeholder_0) UNION SELECT "field_yt_bar"."field_yt_bar_input" AS "input"
FROM
"602549740field_data_field_yt_bar" "field_yt_bar" UNION SELECT "field_yt_baz"."field_yt_baz_input" AS "input"
FROM
"602549740field_data_field_yt_baz" "field_yt_baz"; Array
(
[:db_condition_placeholder_0] =<; 1
)
in /Users/zoli/projects/media_migration/zdev/public_html/core/lib/Drupal/Core/Database/ExceptionHandler.php line 79
In essence this happens because the source plugin tries to exclude the source records from the database query which are already migrated. Although the outer joined query SqlBase has added is unaware of that we have a UNION query, the exception totally makes sense: We really don’t have an “input” column – since it’s just an alias of the field_yt_foo_input
, field_yt_bar_input
and field_yt_baz_input
columns.
The solution
Just check out the docblock of Drupal\Core\Database\Connection::select()
, focusing on the type of the first method parameter4! 🧐
@param
string
|\Drupal\Core\Database\Query\SelectInterface
$table
The base table name or subquery for this query, used in the FROM clause. If a string, the table specified will also be used as the “base” table for query_alter hook implementations.
Because of the name of the variable, we all5 think that $table
must be a string – the name of the table – but it can be either a select statement! And this is the essence of our solution as well: We wrap the “big” union query into a new query, which then allows us to use any kind of alias as source item identifier – be it a column alias or alias of a complex SQL expression!
It’s that simple:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
* {@inheritdoc}
*/
public function query() {
$union_query = NULL;
foreach ($this->getYoutubeFieldNames() as $field_name) {
$subquery = $this->select("field_data_$field_name", $field_name);
$subquery->addField($field_name, "{$field_name}_input", 'input');
static::addUnionQuery($union_query, $subquery);
}
$wrapper_query = $this->select($union_query, 'all_yt')->fields('all_yt');
$wrapper_query->orderBy('all_yt.input');
return $wrapper_query;
}
This is the query string of the Drupal select above:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT
"all_yt".*
FROM (
SELECT
"field_data_field_yt_foo"."field_yt_foo_input" AS "input"
FROM {field_data_field_yt_foo} "field_yt_foo"
UNION
SELECT
"field_data_field_yt_bar"."field_yt_bar_input" AS "input"
FROM {field_data_field_yt_bar} "field_yt_bar"
UNION
SELECT
"field_data_field_yt_baz"."field_yt_baz_input" AS "input"
FROM {field_data_field_yt_baz} "field_yt_baz"
) "all_yt"
ORDER BY "all_yt"."input" ASC NULLS FIRST
Have nice database queries! 🙂
Footnotes:
-
This
::addUnionQuery()
is a very simple helper function:1 2 3 4 5 6 7
protected static function addUnionQuery(&$union_destination, SelectInterface $query): void { if ($union_destination instanceof SelectInterface) { $union_destination->union($query); return; } $union_destination = clone $query; }
If the destination query is empty, then it just sets
$destination
to the clone of the source query. But if it is already aSelectInterface
, then it ‘unions’ the two queries. ↩ -
We can define multiple identifiers too. ↩
-
This is a very handy feature! It is actively used in Drupal core migrate source plugins too: This is why
d7_field
andd7_field_instance
can use the same database query to get their source data. ↩ -
This option is available in
::join()
,::isNull()
etc too. As well as in::union()
! 😉 ↩ -
Or at least I’ve only known about this feature for two years (since 2020, when I started working on migrations). ↩