Nowadays, I am working on the migration of a Drupal 7 site with millions of nodes and 10 millions of comments. We also have some flaggings to import. But after we managed to migrate all the Flagging entities we needed, we found that our views on the main page got extremely slow 😳. As it turned out, we had an extremely slow Views query which used flagging relationship.

The problem

The extremely slow query is caused by the different data type used in the flagging table. Nodes, comments (basically most the content entities I’m aware of) are using unsigned integers for their ID column, while flagging uses varchar. And in my opinion, Flag module is right! There is no constraint which forces entities to have integer IDs: it is just the default type in case of content entities.

Anyway, the query we need to execute joins flagging and node_field_data tables by comparing a VARCHAR column to an INT column. This has two disadvantages:

  1. Comparison falls back to floating-point comparison which isn’t precize at all and this is also much slower than comparing a VARCHAR to VARCHAR or INT to INT. Size and other configurations also must match – in an ideal situation, an INT(10) UNSIGNED column should be compared to another INT(10) UNSIGNED).
  2. MySQL/MariaDB cannot use index, so the query will be much slower:

    For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly.

    (By the way, there is no such index on the flagging table anyway.)

The simple solution would be to change the data type of the entity_id in the flagging table and add an index, but I think that would be a very bad idea, because if we did, it would break Flag module’s current capabilities and would possibly cause issues even if newer Flag releases try to change or update the column.

The solution we applied (for Views)

We decided to add a new integer column to the flagging table, then in Flag module’s Views relationship plugin, we could use this column if the base table’s column’s type is also integer. So the tasks were:

  • Add a new column to the base table of flagging entities (so to the flagging table: entity_id_int INT(10) UNSIGNED defaulting to NULL).
  • Ask the Flag module to fill this column with the value of flagging’s entity_id property if the property value is numeric.
  • Change the related query added by the flag_relationship plugin, so if flagging records are joined and the other table’s appropriate column has integer type, we change flagging.entity_id column to flagging.entity_id_int
  • Add the appropriate index with an enhanced flagging storage schema handler.
  • In an update hook, install the new column and the new storage schema handler.
  • In a post update function, fill the new column with the values fetched from the preexisting record’s entity_id column.

New base field

The easiest (and the Drupal-) way to add a new column to the base table of an entity is adding a new base field. Base fields are available for every entity bundle, which makes them the best fit for our situation too.

To add a new base field to an entity type provided by another module, we can implement hook_entity_base_field_info() (see the documentation at ContentEntityBase::baseFieldDefinitions):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/**
 * Implements hook_entity_base_field_info().
 */
function better_flag_entity_base_field_info(EntityTypeInterface $entity_type): array {
  if ($entity_type->id() !== 'flagging') {
    return [];
  }

  return [
    'entity_id_int' => BaseFieldDefinition::create('integer')
      ->setLabel(t('Entity ID as integer'))
      ->setDescription(t('The ID of the flagged entity if it is integer.'))
      ->setSetting('unsigned', TRUE)
      // Provider must be set to 'flag'.
      // - If it's set to 'better_flag' then on uninstall, ModuleInstaller tries
      //   to remove it even if we delete this field in our uninstall hook. Why?
      //   Because this hook gets invoked!
      // - But if we don't do anything on uninstall, the entity schema won't be
      //   updated and the 'flagging' entity's schema will be outdated - meaning
      //   that the extra column in DB won't be removed.
      ->setProvider('flag')
      ->setInitialValue(NULL),
  ];
}

Of course, if the entity was already available before our module was installed, we have to provide an update hook to have its column installed to the database table. This is one of the tasks that EntityDefinitionUpdateManager was designed for!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/**
 * Installs 'entity_id_int' column to Flagging entities' DB table.
 */
function better_flag_update_9001(): void {
  \Drupal::entityDefinitionUpdateManager()->installFieldStorageDefinition(
    'entity_id_int',
    'flagging',
    'flag',
    BaseFieldDefinition::create('integer')
      ->setLabel(t('Entity ID as integer'))
      ->setDescription(t('The ID of the flagged entity if it is integer.'))
      ->setSetting('unsigned', TRUE)
      ->setInitialValue(NULL)
  );
}

After our new base field was installed in the update hook above, we weren’t done yet:

  1. We still had to update every pre-existing record, so they will have the right value in the new base field.
  2. We also had to ensure that new flaggings would have the appropriate value there.

Let’s fill the new column with the right value for the preexisting flagging entities! Ideally, this should be done in a post update function. We only updated those records where the value of the original entity_id column only contained integer characters:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
/**
 * Fill 'entity_id_int' column values.
 */
function better_flag_post_update_fill_entity_id_integer_column(): void {
  try {
    $flagging_definition = \Drupal::entityTypeManager()->getDefinition('flagging');
    $connection = \Drupal::database();
    $regexp_operator = $connection->databaseType() === 'pgsql' ? '~' : 'REGEXP';
    
    $connection->update($flagging_definition->getBaseTable())
      ->expression('entity_id_int', 'entity_id')
      ->condition('entity_id', '^[0-9]+$', $regexp_operator)
      ->execute();
  }
  catch (\Exception $e) {
    throw new UpdateException($e->getMessage());
  }
}

After this, in a hook_ENTITY_TYPE_presave() implementation, we ensured that new flaggings have the right value in the new entity_id_int field.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/**
 * Implements hook_ENTITY_TYPE_presave() for flagging.
 */
function better_flag_flagging_presave(EntityInterface $flagging) {
  assert($flagging instanceof FlaggingInterface);
  assert($flagging instanceof ContentEntityBase);
  if (!preg_match('/^\d{1,10}$/', $flagged_entity_id = $flagging->getFlaggableId())) {
    return;
  }

  // We know that the entity ID is numeric (and it isn't longer than 10 digits),
  // so we can store the ID value in our extra integer field too.
  $flagging->set('entity_id_int', $flagged_entity_id);
  $flagging->updateOriginalValues();
}

Views relationship

The next task was to make the flag_relationship Views plugin to use the new integer column for joining its table whenever the base table’s corresponding field also has an integer type.

Luckily, Views utilizes Plugin API for most of the components it is using. It is very easy to alter for example the flag_relationship plugin. We don’t have too many options, but all what we need to do is to replace the plugin’s class in the plugin definition with our enhanced, “smarter” class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/**
 * Replacement class for FlagViewsRelationship.
 */
class BetterFlagViewsRelationship extends FlagViewsRelationship {

  /**
   * {@inheritdoc}
   */
  public function query() {
    if (
      // We have a real DB column to join on.
      isset($this->realField) &&
      // The current display has an entity type.
      !empty($entity_type = $this->getEntityType()) &&
      // The column is one of the base fields of the entity type.
      ($base_field_definition = \Drupal::service('entity_field.manager')->getBaseFieldDefinitions($entity_type)[$this->realField] ?? NULL) &&
      // The type of the column is integer.
      ($base_field_definition instanceof FieldDefinitionInterface && $base_field_definition->getType() === 'integer')
    ) {
      $this->definition['base field'] = 'entity_id_int';
    }

    parent::query();
  }

}

After this we could go ahead and change the plugin definition by using a hook_views_plugins_relationship_alter() hook:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
 * Implements hook_views_plugins_relationship_alter().
 */
function better_flag_views_plugins_relationship_alter(array &$plugins): void {
  // Replaces flag_relationship views plugin with our own, improved version
  // which then speeds up queries using integer type of entity ID column in case
  // of joins.
  if (
    isset($plugins['flag_relationship']) &&
    $plugins['flag_relationship']['class'] === FlagViewsRelationship::class
  ) {
    $plugins['flag_relationship']['class'] = BetterFlagViewsRelationship::class;
  }
}

Extend the entity table with new indexes

Finally, we extended the storage schema of flagging entities. Why? Because without having the matching index in place, the query still took about 1200—1500ms to finish with about 1 million flaggings.

This is a bit of a fragile solution: we cannot decorate these handlers, we can only change them. If another module has to do the same to flaggings, our replacement storage schema handler might be replaced, and we can’t take advantage of our dedicated index. All we can do is add a PHPUnit test that checks the schema handler of flaggings with all the modules in our project installed.

This is our replacement storage schema handler which adds the new index with flag_id and entity_id_int:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/**
 * Replacement class for Flagging entity storage schema handler.
 */
class BetterFlaggingStorageSchema extends FlaggingStorageSchema {

  /**
   * {@inheritdoc}
   */
  protected function getEntitySchema(ContentEntityTypeInterface $entity_type, $reset = FALSE) {
    $schema = parent::getEntitySchema($entity_type, $reset);
    $schema[$this->entityType->getBaseTable()]['indexes'] += [
      'better__flag_id__entity_id_int' => [
        'flag_id',
        'entity_id_int',
      ],
    ];

    return $schema;
  }

}

Of course, we also had to replace the original handler with our own. This can be done with a hook_entity_type_alter() implementation:

1
2
3
4
5
6
7
8
/**
 * Implements hook_entity_type_alter().
 */
function better_flag_entity_type_alter(array $entity_types): void {
  // Replace storage schema class of flagging entities.
  assert($entity_types['flagging'] instanceof EntityTypeInterface);
  $entity_types['flagging']->setHandlerClass('storage_schema', BetterFlaggingStorageSchema::class);
}

As the last step, we needed yet another update hook where we could install the new storage schema. In our case, this means that – under the hood – Drupal will drop every preexisting index on the corresponding table(s) and then recreate the new ones.

1
2
3
4
5
6
7
8
9
/**
 * Install the new storage schema handler of flagging entities.
 */
function better_flag_update_9002(): void {
  $entity_definition_update_manager = \Drupal::entityDefinitionUpdateManager();
  $entity_type = $entity_definition_update_manager->getEntityType('flagging');
  $entity_type->setHandlerClass('storage_schema', BetterFlaggingStorageSchema::class);
  $entity_definition_update_manager->updateEntityType($entity_type);
}

That was all!

Diagnostic

If you’ve reached this line, I’m pretty sure you’re wondering what happened to the Views database query. “Unfortunately”, I was not patient enough to wait for the unoptimized database query to finish 😶.

Original

This was the original query (no result after ~30 minutes):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SELECT 
    node_field_data.langcode AS node_field_data_langcode, 
    flagging_node_field_data.uid AS flagging_node_field_data_uid, 
    node_field_data.changed AS node_field_data_changed, 
    node_field_data.nid AS nid, 
    flagging_node_field_data.id AS flagging_node_field_data_id 
FROM node_field_data node_field_data 
LEFT JOIN flagging flagging_node_field_data ON 
    node_field_data.nid = flagging_node_field_data.entity_id AND 
    flagging_node_field_data.flag_id = 'custom_flag_id' 
LEFT JOIN comment_entity_statistics comment_entity_statistics ON 
    node_field_data.nid = comment_entity_statistics.entity_id AND 
    comment_entity_statistics.entity_type = 'node' 
WHERE 
    (node_field_data.status = '1') AND 
    (node_field_data.type IN ('topic')) AND 
    (comment_entity_statistics.comment_count > '0')
ORDER BY node_field_data_changed DESC 
LIMIT 10 OFFSET 0;

Original EXPLAIN result:

1
2
3
4
5
6
7
+----+-------------+---------------------------+------------+-------+--------------------------------------------------------------------------------------------+--------------------------------------+---------+--------------------------------------------+---------+----------+-----------------------------------------------------------+
| id | select_type | table                     | partitions | type  | possible_keys                                                                              | key                                  | key_len | ref                                        | rows    | filtered | Extra                                                     |
+----+-------------+---------------------------+------------+-------+--------------------------------------------------------------------------------------------+--------------------------------------+---------+--------------------------------------------+---------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | comment_entity_statistics | NULL       | range | PRIMARY,comment_count,testing__entity_id__entity_type__comment_count                       | comment_count                        | 4       | NULL                                       | 1874806 |    10.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | node_field_data           | NULL       | ref   | PRIMARY,node__id__default_langcode__langcode,node_field__type__target_id,node__status_type | node__id__default_langcode__langcode | 4       | drupal.comment_entity_statistics.entity_id |       1 |     5.00 | Using where                                               |
|  1 | SIMPLE      | flagging_node_field_data  | NULL       | ref   | flagging_field__flag_id__target_id,entity_id__uid                                          | flagging_field__flag_id__target_id   | 34      | const                                      |    3091 |   100.00 | Using where                                               |
+----+-------------+---------------------------+------------+-------+--------------------------------------------------------------------------------------------+--------------------------------------+---------+--------------------------------------------+---------+----------+-----------------------------------------------------------+

Optimized query

This is the optimized query – it finishes in 1ms-3ms!:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SELECT
    node_field_data.langcode AS node_field_data_langcode, 
    flagging_node_field_data.uid AS flagging_node_field_data_uid, 
    node_field_data.changed AS node_field_data_changed, 
    node_field_data.nid AS nid, 
    flagging_node_field_data.id AS flagging_node_field_data_id 
FROM node_field_data node_field_data 
LEFT JOIN flagging flagging_node_field_data ON 
    node_field_data.nid = flagging_node_field_data.entity_id_int AND 
    flagging_node_field_data.flag_id = 'custom_flag_id'
LEFT JOIN comment_entity_statistics comment_entity_statistics ON 
    node_field_data.nid = comment_entity_statistics.entity_id AND 
    comment_entity_statistics.entity_type = 'node' 
WHERE 
    (node_field_data.status = '1') AND 
    (node_field_data.type IN ('topic')) AND 
    (comment_entity_statistics.comment_count > '0') 
ORDER BY node_field_data_changed DESC 
LIMIT 10 OFFSET 0;

Optimized EXPLAIN result:

1
2
3
4
5
6
7
+----+-------------+---------------------------+------------+-------+--------------------------------------------------------------------------------------------+------------------------------------------------+---------+----------------------------------+------+----------+--------------------------+
| id | select_type | table                     | partitions | type  | possible_keys                                                                              | key                                            | key_len | ref                              | rows | filtered | Extra                    |
+----+-------------+---------------------------+------------+-------+--------------------------------------------------------------------------------------------+------------------------------------------------+---------+----------------------------------+------+----------+--------------------------+
|  1 | SIMPLE      | node_field_data           | NULL       | index | PRIMARY,node__id__default_langcode__langcode,node_field__type__target_id,node__status_type | node_field__changed                            | 4       | NULL                             |   36 |     5.00 | Using where              |
|  1 | SIMPLE      | comment_entity_statistics | NULL       | ref   | PRIMARY,comment_count,testing__entity_id__entity_type__comment_count                       | testing__entity_id__entity_type__comment_count | 38      | drupal.node_field_data.nid,const |    1 |    50.00 | Using where; Using index |
|  1 | SIMPLE      | flagging_node_field_data  | NULL       | ref   | flagging_field__flag_id__target_id,better__flag_id__entity_id_int                          | better__flag_id__entity_id_int                 | 39      | drupal.node_field_data.nid,const |    1 |   100.00 | NULL                     |
+----+-------------+---------------------------+------------+-------+--------------------------------------------------------------------------------------------+------------------------------------------------+---------+----------------------------------+------+----------+--------------------------+