SoundCloud Web Login Reverse Engineering

In-depth reverse engineering of the WhiteOps antibot and the SoundCloud web login

To protect itself from credential stuffing, SoundCloud uses the Human antibot protection, formerly known as WhiteOps. But is it 100% bulletproof?

In this article, I will show you how to bypass this antibot and attempt a login without any browser, just by making the right requests. Using plain requests instead of a browser automation library such as puppeteer or selenium makes scaling easier and saves a lot of CPU and RAM.

Here's the plan:

  1. Reverse engineering WhiteOps and SoundCloud scripts to find protections
  2. Creating modules to bypass each of the protections
  3. Tying up everything inside a script

I - Analyzing the login payload

Here's what a login attempt to SoundCloud looks like:

image.png

We see multiple fields in the payload. Some are specific to WhiteOps and some are specific to SoundCloud.

We will try first to understand what each one means.

a) The obvious ones

These are easy and don't need any reverse engineering to understand their purpose.

Name                Description
user_agent          The user agent of your browser.
recaptcha_pubkey    The public key for the reCAPTCHA that SoundCloud uses.
recaptcha_response  The reCAPTCHA response. Here it is null because we didn't have any captcha to complete.
credentials         The mail and password used to log in.

Two fields are related to SoundCloud's own protection: device_id and signature.

signature

This one is a bit complicated: we need to look at the loaded .js files to understand how this signature is generated.

Let's open the Sources tab in our dev console and do a basic string search. By searching for “signature”, we find an interesting class in a .js file called web_auth-xxxx.js:

We need to clean this a bit.

Since the obfuscation is very basic, we will first use unminify.io to clean up the syntax. For more complex obfuscation we could also use shift-refactor.

We can see a lot of calls to a function f. This is just very basic string obfuscation, and we can quickly write a script that replaces the calls to this function with the values they return.
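A minimal sketch of such a replacement script, assuming f takes a single numeric literal and that its return values can be obtained separately (for instance by copying the function into Node and calling it). The names inlineStringCalls and decode are hypothetical, and the real f in web_auth-xxxx.js may take different arguments.

```typescript
// Hypothetical helper: replaces calls such as f(12) with the string they return.
// Assumes the obfuscation function takes one numeric literal as its only argument.
type StringDecoder = (index: number) => string;

function inlineStringCalls(source: string, fnName: string, decode: StringDecoder): string {
  // Matches e.g. f(12) or f(0x1a); fnName is assumed to be a plain identifier.
  const callPattern = new RegExp(`\\b${fnName}\\((0x[0-9a-fA-F]+|\\d+)\\)`, "g");
  return source.replace(callPattern, (_match: string, rawIndex: string) =>
    JSON.stringify(decode(Number(rawIndex)))
  );
}

// Example with a made-up string table standing in for the real decoder.
const stringTable = ["join", "-", "charCodeAt"];
const cleaned = inlineStringCalls("parts[f(0)](f(1))", "f", (i) => stringTable[i] ?? "");
console.log(cleaned); // parts["join"]("-")
```

A regex is enough here because the calls are simple; for nested or computed arguments an AST-based tool would be the safer choice.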

Finally, we can try to understand what the script is doing and rename some variables.

Here's the final result:

The initialize function is called when the script is first loaded. It initializes 5 event listeners that will increment their respective counters.

Then, when we attempt a login, the sign function is called. It takes 3 parameters: mailCredential, which is the mail we use to log in, and signatureSecret and clientId, which are values hardcoded in previously loaded .js scripts.

The signature returned is composed of 4 parts separated by colons (:)

  • signatureVersion is the first part and is hardcoded in this .js file.
  • signatureString is itself composed of multiple elements:
    • browserCheck, an integer that represents some values about your browser.
    • automationCheck, an integer that is used to check for automation tools.
    • executionTime, which is the time difference between the initialization of the class and the call to the sign function.
    • trustedEventCounter, which counts the number of trusted events received.
    • screenResolution, the resolution of your screen.
    • pluginsCheck, a check for the usual plugins in your browser.
    • keyUpEventCounter, the number of keyup events received.
    • keyDownEventCounter, the number of keydown events received.
  • signatureHash is the hash of the signature string and other data. It acts as an integrity check for the signature.
  • and finally the click event counter.
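To make the layout concrete, here is a small sketch that splits a signature back into the parts listed above. The field names are the labels used in this article, not identifiers from SoundCloud's code, and the sample value is made up for illustration.

```typescript
// Field names follow the list above; they are labels chosen during the
// reverse engineering, not names taken from SoundCloud's script.
interface ParsedSignature {
  signatureVersion: string;
  browserCheck: number;
  automationCheck: number;
  executionTime: number;
  trustedEventCounter: number;
  screenResolution: number;
  pluginsCheck: number;
  keyUpEventCounter: number;
  keyDownEventCounter: number;
  signatureHash: string;
  clickEventCounter: number;
}

function parseSignature(signature: string): ParsedSignature {
  // Four colon-separated parts: version, signature string, hash, click counter.
  const parts = signature.split(":");
  if (parts.length !== 4) throw new Error(`expected 4 parts, got ${parts.length}`);
  const [signatureVersion, signatureString, signatureHash, clickCounter] = parts;

  // The signature string holds eight hyphen-separated integers.
  const fields = signatureString.split("-").map(Number);
  if (fields.length !== 8 || fields.some(Number.isNaN)) {
    throw new Error("expected 8 numeric fields in the signature string");
  }
  const [browserCheck, automationCheck, executionTime, trustedEventCounter,
    screenResolution, pluginsCheck, keyUpEventCounter, keyDownEventCounter] = fields;

  return {
    signatureVersion, browserCheck, automationCheck, executionTime,
    trustedEventCounter, screenResolution, pluginsCheck,
    keyUpEventCounter, keyDownEventCounter,
    signatureHash, clickEventCounter: Number(clickCounter),
  };
}

// Made-up sample value, only meant to show the shape.
const parsedSample = parseSignature("8:33-1-4000-200-3686400-1283-10-10:3d1fc8:9");
console.log(parsedSample.executionTime, parsedSample.signatureHash);
```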

deviceId

This one is easier. deviceId is also called anonymousId. By searching for this string in the above script, we find a function that looks like this:

function(e, t, n) {
    function r() {
        return Math.floor(1e6 * Math.random())
    }
    t.generateAnonymousId = function() {
        return [r(), r(), r(), r()].join("-")
    }
}

So the deviceId is just randomly generated.
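For our own script, the same logic can be written as a small typed function. This is a direct port of the snippet above; only the name generateDeviceId is mine.

```typescript
// Mirrors the generateAnonymousId function shown above:
// four random integers in [0, 999999], joined with hyphens.
function generateDeviceId(): string {
  const randomPart = (): number => Math.floor(1e6 * Math.random());
  return [randomPart(), randomPart(), randomPart(), randomPart()].join("-");
}

console.log(generateDeviceId()); // four hyphen-separated integers, different on every call
```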

clientId

The first reference to the string clientId can be found in a script called 49-xxxxx.js. This is one of the very first scripts loaded by SoundCloud. clientId is actually just hardcoded in this script.

There are four properties related to WhiteOps.

OZ_DT and OZ_TC

Those values can be found in a script called clear.js. OZ_TC is a string that will be used to encrypt OZ_SG.

client_ds

  • ci, et, mo, pd, ui and ri are WhiteOps parameters related to the website. They never change for any SoundCloud request.
  • ck is the deviceId base64 encoded.
  • si is the clientId base64 encoded.

OZ_SG

OZ_SG is the centerpiece of the WhiteOps antibot. It is a very large encrypted string. After decryption, we find that it is actually a JSON object that contains a lot of information about your browser and your behavior on the website. For example, we can find audio fingerprinting, WebGL properties, mouse and key events, navigation performance, and more.

Decrypting

The first thing we need to do is to decrypt this string. We will look into the main.js file, which is the main script for the WhiteOps antibot. By searching for OZ_SG in this file, we can find this line where we can see our 3 parameters :

image.png

Now let's take a look at this n.AQ() function:

image.png

We can see a call to the function this.KT.Nx(). This function takes care of the encryption and takes two parameters :

  • I.fV().pd() which is our OZ_TC
  • JSON.stringify(t) which is our OZ_SG before the encryption

Let's put a breakpoint here and try to login :

image.png

We can see our OZ_SG as a JSON object before it is encrypted. This is enough to understand what is happening, so we don't need to reverse the encryption function. If you want to see the encryption/decryption code, you can check it out on my GitHub: github.com/tradertrue/WhiteOpsToolkit/tree/..

Analyzing properties

I won't get into the details too much since I've already made a tool where you can see the data, a cleaned-up version of the code and some hints on how to spoof most of the properties. To build it, I looked up the code for every property in the main.js file, ran it through unminify.io to get a cleaner syntax and renamed some variables. Whenever I had a doubt about what the code was doing, I set a breakpoint to better understand how it worked.

II - Spoofing

After analyzing every field in the payload, we will see how to spoof every one of them.

credentials

This is just the mail and password that you use to log in.

device_id

As we have seen previously, device_id is composed of 4 randomly generated numbers joined by hyphens.

user_agent

The user agent in the payload must be consistent with the one sent in the request headers and the one used to spoof OZ_SG.

signature

We've also seen the script that generates the signature. We can write a simple script that generates a valid SoundCloud signature. Surprisingly, the data in this signature and the data in OZ_SG don't necessarily have to be correlated.

import { randomInteger } from "./utils";

/** Hardcoded in the .js script */
const signatureVersion = 8;

export function generateSoundCloudSignature(userAgent: string, clientId: string, mailCredential: string, signatureSecret: string): string {
  const automationCheck = 1; // should always be 1
  const browserCheck = 33;
  const executionTime = randomInteger(3000, 5000); // time between initialization and the call to sign
  const pluginsCheck = 1283;
  const trustedEventCounter = randomInteger(100, 300);
  const screenResolution = 3686400; // fixed here, but should be randomized between sessions
  const keysEventCounter = randomInteger(6, 14); // used for both the keyup and keydown counters
  const clickEventCounter = randomInteger(6, 14);

  const signatureString = [
    browserCheck,
    automationCheck,
    executionTime,
    trustedEventCounter,
    screenResolution,
    pluginsCheck,
    keysEventCounter, // keyUpEventCounter
    keysEventCounter, // keyDownEventCounter
  ].join("-");

  // The string that gets hashed
  const a = clientId + signatureVersion + signatureString + userAgent + mailCredential + signatureSecret + signatureString + clientId;
  // Convert to a UTF-8 byte string, as the original script does
  const b = unescape(encodeURIComponent(a));
  // 24-bit checksum: rotate right by one bit, add the next byte, keep the low 24 bits
  let signatureHash = 8011470;
  for (let i = 0; i < b.length; i += 1) {
    signatureHash = (signatureHash >> 1) + ((1 & signatureHash) << 23);
    signatureHash += b.charCodeAt(i);
    signatureHash &= 16777215;
  }

  return signatureVersion + ":" + signatureString + ":" + signatureHash.toString(16) + ":" + clickEventCounter;
}
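The hash loop is easier to reason about in isolation. The sketch below extracts it into a standalone function (signatureChecksum is a name I made up) so it can be checked against values captured from a real browser. It expects the same UTF-8 byte string as the loop above.

```typescript
// Standalone copy of the hash loop from generateSoundCloudSignature.
// The default seed 8011470 (0x7a3ece) is the constant used in that function.
function signatureChecksum(input: string, seed: number = 8011470): number {
  let hash = seed;
  for (let i = 0; i < input.length; i += 1) {
    // Rotate the 24-bit value right by one: the lowest bit moves to bit 23.
    hash = (hash >> 1) + ((hash & 1) << 23);
    // Add the next character code, then truncate back to 24 bits.
    hash += input.charCodeAt(i);
    hash &= 0xffffff;
  }
  return hash;
}

console.log(signatureChecksum("").toString(16));  // 7a3ece (the seed itself)
console.log(signatureChecksum("a").toString(16)); // 3d1fc8
```

Because every step is masked to 24 bits, the hexadecimal hash in the signature is never longer than 6 characters.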

client_ds

client_ds will look like this:

const client_ds = {
  ci: 646297,
  ck: Buffer.from(deviceId).toString("base64"), // new Buffer() is deprecated in Node.js
  et: "1",
  mo: 2,
  pd: "acc",
  ri: "signInPasswordForm",
  si: "", // si is always empty for soundcloud
  ui: "", // ui is always empty for soundcloud
};

Apart from ck, which depends on the deviceId, those parameters never change for SoundCloud.

OZ_TC and OZ_DT

We have to get those values from the clear.js script. We can just do a GET request to fetch this script and extract the values with simple regexes.

const clearJsResp = (
  await requestTLS.get(
    `https://s.pwt.soundcloud.com/ag/646297/clear.js?ci=646297&dt=6462971605583802699000&mo=2&pd=acc&spa=1`
  )
).body;
let ozTc = /ozoki_tc = \"([a-z0-9A-Z]+)/.exec(clearJsResp)![1];
let ozDt = /ozoki_dt = \"([a-z0-9A-Z\+\/\\\=]+)/.exec(clearJsResp)![1];
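The non-null assertions above will throw an unhelpful TypeError if WhiteOps ever changes the script. A slightly more defensive sketch of the same extraction follows; extractOzTokens is a hypothetical helper and the sample input is made up, since the real clear.js layout may differ.

```typescript
interface OzTokens {
  ozTc: string;
  ozDt: string;
}

// Same patterns as above, with an explicit error instead of non-null assertions.
function extractOzTokens(clearJs: string): OzTokens {
  const tcMatch = /ozoki_tc = "([a-z0-9A-Z]+)/.exec(clearJs);
  const dtMatch = /ozoki_dt = "([a-z0-9A-Z+\/\\=]+)/.exec(clearJs);
  if (!tcMatch || !dtMatch) {
    throw new Error("ozoki_tc or ozoki_dt not found in clear.js");
  }
  return { ozTc: tcMatch[1], ozDt: dtMatch[1] };
}

// Made-up input that only imitates the two assignments the regexes look for.
const sampleClearJs = 'var ozoki_tc = "abcDEF123"; var ozoki_dt = "QUJD+/8=";';
console.log(extractOzTokens(sampleClearJs)); // { ozTc: 'abcDEF123', ozDt: 'QUJD+/8=' }
```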

OZ_SG

I won't get into the details for this one; if you want to know more, you can contact me :) To summarize, I've taken a decrypted OZ_SG and edited the relevant properties: mouse and key events, navigation performance and some timestamps, and I randomized quite a lot of other things. This works very well at a small scale, but at a large scale we would need a lot of dumps from real browsers and a more robust algorithm to spoof OZ_SG's properties.

III - Creating a script

We now know how to spoof every field in the payload, so we can create a script to try that out! Nowadays, most antibots check TLS fingerprints, so we will use an excellent library called CycleTLS to spoof that.

Here's what it looks like: github.com/tradertrue/WhiteOpsToolkit/tree/..

I've seen a lot of other requests during the login and I was concerned that we needed to spoof them too.

Especially these two requests:

image.png

image.png

So I used a man-in-the-middle proxy to intercept and cancel those requests in my real browser, and I was still able to log in successfully, so there is no need to worry about them!

To conclude, we've seen that while the WhiteOps protection seems robust, it can still be bypassed if you have enough knowledge and time. The lack of obfuscation in their scripts makes the reverse engineering trivial, and the data collected can be easily spoofed.