The drop is always movingYou know that saying about standing on the shoulders of giants? Drupal is standing on a huge pile of midgetsAll content management systems suck, Drupal just happens to suck less.Popular open source software is more secure than unpopular open source software, because insecure software becomes unpopular fast. [That doesn't happen for proprietary software.]Drupal makes sandwiches happen.There is a module for that

Double theta to compute delta

Submitted by nk on Thu, 2014-07-03 17:21

We are denormalizing into a field past migration; while usually I freak out writing directly to the field tables in this case it's justified because it's a custom module (never do this in contrib) and also because we are well aware of what does not happen when you do that (pluggable field storage and hooks). With that said, we have a query that roughly looks like this:

INSERT INTO {field_data_field_denorm_data}
(entity_type, bundle, deleted, entity_id, revision_id, language, delta, field_denorm_data_target_id)
SELECT 'user', 'user', 0, field1, field1, 'und', ????????????, field2
FROM table1
.... long complex query where a single field1 has several field2.

So yeah. But delta... delta is a problem. MSSQL has ROW_NUMBER(), MySQL does not and the usual replacement (user variables, @x:=@x+1) is documented to be incorrect and not reliable. There's a lesser known trick that can be used, however:

CREATE TABLE tmp
SELECT DISTINCT field1, field2
.... long complex query repeated

now, ALTER TABLE tmp ADD KEY(field1) in preparation, and then:

INSERT INTO {field_data_field_denorm_data}
(entity_type, bundle, deleted, entity_id, revision_id, language, delta, field_denorm_data_target_id)
SELECT 'user', 'user', 0, t1.field1, t1.field1, 'und', COUNT(t2.field2), t1.field2
FROM tmp t1
LEFT JOIN tmp t2 ON t1.field1=t2.field1 AND t1.field2 > t2.field2
GROUP BY t1.field1,t1.field2

Double theta for the win.

Explanation (numbers copied from real data):

+-------------+----------+---------------------------------+
| t1.field1   |t1.field2 | group_concat(t2.field2)         |
+-------------+----------+---------------------------------+
|        4440 |      427 | NULL                            |
|        4440 |      428 | 427                             |
|        4440 |      429 | 427,428                         |
+-------------+----------+---------------------------------+

Consider the user 4440. When t1.field2 is 427, when joining with t2, there's nothing smaller than that, so the count and the delta is 0. When t1.field2 is 428 now we have 427 smaller than it, so the count and the delta is 1. And so on.

This works because MySQL orders on what you group. Otherwise, we would need to add an ORDER BY as well.

Advertisment: If you find this too complex, consider hiring Tag1 Consulting for your database problems.

Commenting on this Story is closed.

Submitted by mikeytown2 on Thu, 2014-07-03 19:54.

If I need to write to a field I end up using field_sql_storage_field_storage_write. This is a good guide on how to use it: http://timonweb.com/how-insert-and-update-only-specific-fields-your-entity-drupal-7. I don't know the exact requirements of this project but since your using group_concat() I'm guessing field_sql_storage_field_storage_write wouldn't work. If your working with very large data sets, keep in mind that group_concat has a max length. I usually run this `'SET SESSION group_concat_max_len = 65535'`; default is 1024.

Submitted by nk on Fri, 2014-07-04 09:00.

The point is the COUNT() for delta. Retrieving the data set into PHP and calling a PHP function and running a query per row is incredibly slow. The query above, even on my laptop can deal with (tens of) thousands of records per second.

Good point on the group_concat_max_len ; I am well aware; check the https://www.drupal.org/project/quickstats I wrote for NowPublic ages ago.