User identification
CNIL recommendation
The replacement of the user ID by the proxification server. To ensure effective pseudonymization, the algorithm performing the replacement should ensure a sufficient level of collision (i.e. a sufficient probability that two different identifiers will give an identical result after hashing) and include a variable temporal component (adding a value to the hashed data that evolves over time, so that the hashed result is not always the same for the same identifier);
Solution
Two fields sent to GA4 identify the user over time:
- x-ga-js_client_id: The x-ga-js_client_id is the cid value sent in the browser request. In the request sent by the server to GA4, this field is called jscid.
- client_id: The client_id is based on this same value (cid) but has been processed by the GA4 client. In the request sent by the server to GA4, this field is called cid.
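To check which identifiers actually leave the server, it can help to read both parameters from an outgoing request. The following is a minimal Node.js sketch, not part of the GTM setup: the names GA4_REQUEST_PARAM and readIdentifiers, as well as the URL, are hypothetical and only illustrate the cid / jscid mapping described above.

```javascript
// Maps each server-side event data key to the parameter name
// used in the outgoing GA4 request (as described above).
const GA4_REQUEST_PARAM = {
  'client_id': 'cid',
  'x-ga-js_client_id': 'jscid',
};

// Reads both identifiers from the query string of a request URL.
// Missing parameters come back as null.
function readIdentifiers(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const result = {};
  for (const [eventKey, paramName] of Object.entries(GA4_REQUEST_PARAM)) {
    result[eventKey] = params.get(paramName);
  }
  return result;
}

// Hypothetical URL, for illustration only.
console.log(readIdentifiers('https://example.invalid/g/collect?v=2&cid=111.222&jscid=111.222'));
```

Comparing these two values across sessions is a simple way to see whether the same user can still be recognized.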
We'll need to pseudonymize these two fields so that Google can't link multiple sessions from the same user.
To make the identifier change from one session to the next, we're going to create a variable template that hashes an input value (in this case, our client_id) together with a temporal component (part of a timestamp).
To do this, go to Templates > Variable Templates > New.
In the "Info" section, we'll name the template (in this case, "Pseudonymize Variable").
In the second tab (Fields), we'll add a field that receives the value to be pseudonymized.
Create a new field
- Click on 'Add Field' > Select 'Text Input' > Rename the field to 'input' > Enter 'Value to pseudonymize' as the display name.
Add a new control
- Click on the cogwheel (parameters) > Activate 'Validation rules' > Click on 'Add rules' > Check that the option 'This value cannot be empty' is selected (it should be the default).
The third tab (Code) contains the processing applied to this variable. Here's the code to copy and paste into the code editor.
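// Sandboxed JavaScript APIs used by the template.
const ts = require('getTimestampMillis');
const sha256Sync = require('sha256Sync');
const math = require('Math');

// Time-window key: its value changes every 10,000,000 ms.
const key = math.round(ts() / 10000000);

// Hash the input together with the key and return the result as a hex string.
return sha256Sync(data.input + '.' + key, {outputEncoding: 'hex'});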
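Our variable is hashed after being concatenated with a key that changes every 10,000,000 ms, i.e. about 2 hours and 47 minutes. The same client_id therefore yields the same hash for all hits within one such window, but a different hash once the key changes. In most cases this gives identical client_ids within a session and different ones from one session to the next (a session that straddles a key change is split across two identifiers). This will prevent GA4 from tracking a user over time.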
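The effect of the rotating key can be tried outside GTM. The sketch below re-creates the same idea in plain Node.js, using the built-in crypto module instead of the sandboxed sha256Sync API. The function names, the sample client_id and the timestamp are assumptions made for illustration, and the output is not guaranteed to match the template's byte for byte.

```javascript
// Plain Node.js re-creation of the template logic, for experimentation only.
const crypto = require('crypto');

const WINDOW_MS = 10000000; // same divisor as in the template (about 2 h 47 min)

// Returns the time-window key for a timestamp given in milliseconds.
function windowKey(timestampMs) {
  return Math.round(timestampMs / WINDOW_MS);
}

// Hashes the identifier together with the time-window key.
function pseudonymize(input, timestampMs) {
  const key = windowKey(timestampMs);
  return crypto.createHash('sha256').update(input + '.' + key).digest('hex');
}

const cid = '1234567890.1700000000'; // hypothetical client_id
const t0 = 1700000000000;            // hypothetical timestamp (ms)

// One minute later: same window, so the hashes match.
console.log(pseudonymize(cid, t0) === pseudonymize(cid, t0 + 60 * 1000)); // true
// One full window later: the key has changed, so the hashes differ.
console.log(pseudonymize(cid, t0) === pseudonymize(cid, t0 + WINDOW_MS)); // false
```

Because the key is computed with a rounding rather than a floor, the switch happens halfway through each 10,000,000 ms interval; the length of the window is unaffected.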
Now that we've created this variable template, we need to replace our client_id and x-ga-js_client_id values before sending them to Google.
First, we create a GA4 - Client ID variable, which will retrieve the value of client_id from the event data.
Variables > New:
- Variable name: GA4 - Client ID
- Variable type: Event Data
- Key path: client_id
Next, we use our previously created template to pseudonymize this variable.
Variables > New:
- Variable name: GA4 - Client ID Pseudonymized
- Variable type: Pseudonymize Variable (the template created above)
- Value to pseudonymize: {{GA4 - Client ID}}
Our client_id is now pseudonymized; we need to repeat these last two steps for the x-ga-js_client_id.
Variables > New:
- Variable name: GA4 - JS Client ID
- Variable type: Event Data
- Key path: x-ga-js_client_id
Next, we use our previously created template to pseudonymize this variable.
Variables > New:
- Variable name: GA4 - JS Client ID Pseudonymized
- Variable type: Pseudonymize Variable (the template created above)
- Value to pseudonymize: {{GA4 - JS Client ID}}
Once these two variables have been pseudonymized, they must be assigned in the GA4 tag to replace the existing variables.
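- Tags > Open the GA4 tag > Open the 'Event Parameters' section > In the 'Parameters to Add / Edit' section, add a parameter named client_id with the value {{GA4 - Client ID Pseudonymized}} and a parameter named x-ga-js_client_id with the value {{GA4 - JS Client ID Pseudonymized}}.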
Our cid and jscid are now pseudonymized.
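Taken together, the variables and the tag override amount to replacing two fields of the event data before it is forwarded. The following is a minimal Node.js sketch of that end result, not the GTM implementation itself: overrideIdentifiers, the sample event and the use of Node's crypto module are assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Same idea as the variable template: identifier + '.' + time-window key, hashed.
function pseudonymize(input, timestampMs) {
  const key = Math.round(timestampMs / 10000000);
  return crypto.createHash('sha256').update(input + '.' + key).digest('hex');
}

// Returns a copy of the event data with both identifier fields replaced,
// mirroring the two entries added under 'Parameters to Add / Edit'.
function overrideIdentifiers(eventData, timestampMs) {
  return Object.assign({}, eventData, {
    'client_id': pseudonymize(eventData['client_id'], timestampMs),
    'x-ga-js_client_id': pseudonymize(eventData['x-ga-js_client_id'], timestampMs),
  });
}

// Hypothetical incoming event data.
const incoming = {
  event_name: 'page_view',
  client_id: '111.222',
  'x-ga-js_client_id': '111.222',
};

const outgoing = overrideIdentifiers(incoming, Date.now());
console.log(outgoing); // other fields are untouched, both identifiers are hashed
```

The original object is left unmodified, which makes it easy to compare the values before and after pseudonymization.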