Martin's Creations

where the magic happens

Reverse Engineering Shopify Private APIs

March 1st, 2013

Note: The full source is available on GitHub

Recently, while migrating an e-commerce website to the aspiring Shopify cloud platform, I assumed that generating coupon codes through the API would be an obvious, simple feature. It has apparently been one of the most requested features since March 2011.

Unfortunately, I was wrong. So I contacted their support team to see what's up!

Shopify Support Snapshot
Thanks for the help, Brian. /s

Being unaccustomed to "no", and particularly impatient, I decided to develop my own solution using the very same APIs Shopify created for their own admin panel.

It all began when I decided to poke at how discounts are loaded in Shopify's back-end:

GET /admin/discounts.json?limit=50&order=id+DESC&direction=next HTTP/1.1


{
    "discounts": [{
        "applies_once": false,
        "applies_to_id": null,
        "code": "ypa73p",
        "ends_at": null,
        "id": 12353868,
        "minimum_order_amount": "0.00",
        "starts_at": "2013-03-06T00:00:00-08:00",
        "status": "enabled",
        "usage_limit": 1,
        "value": "89.99",
        "discount_type": "fixed_amount",
        "applies_to_resource": null,
        "times_used": 1
    }, ... ]
}
  
                

Wait, so, that looks pretty friendly right? They've already done the work, so why can't I use it? ... I can! So, here's how.

First, let's take a look at the full HTTP request (snipped to the interesting parts).

       			
 GET /admin/discounts.json?limit=50&order=id+DESC&direction=next HTTP/1.1
 Host: myshop.myshopify.com
 X-CSRF-Token: +QjKt70XBMis/iZXz8VsvbfHkOcH+h45N38os4O1lJo=
 X-Requested-With: XMLHttpRequest
 X-Shopify-Api-Features: pagination-headers
 Cookie: _secure_session_id=150d716ebc55cf62xxx; storefront_digest=056eb6c39dd92c5171360c97d0xxxx;
        			

Nothing particularly special: there's a token we need to watch out for and, of course, our session cookies. So, first things first, let's tackle the login form. I've trimmed it down to the bare necessities for your viewing pleasure:


<form accept-charset="UTF-8" action="/admin/auth/login" method="post">
  <input name="utf8" type="hidden" value="&#x2713;" />
  <input name="authenticity_token" type="hidden" value="+QjKt70XBMis/iZXz8VsvbfHkOcH+h45N38os4O1lJo=" />
  <input type="hidden" name="redirect" value="" id="redirect" />
  <input type="email" name="login" size="30" id="login-input" class="email" />
  <input type="password" name="password" size="16" id="password" />

  <div id="open-id" style="display:none">
    <div class="ppb clearfix">
      <label id="open_id" for="openid-input" class="open-id">OpenID</label>
      <input type="text" name="openid_url" value="" class="url" id="openid-input" />
    </div>
  </div>
</form>

                    
Spoiler: Looks like Shopify are at least playing with OpenID integration

In the interest of minimizing maintenance, let's parse all of those <input> elements dynamically. I chose regex over a DOM parser because it seemed more appropriate in such a hacky project, and it saves us having to worry about broken markup. We'll do this in two parts: first we grab the login form, then the key/value pairs embedded within it.


private function getFields($data = false) {
    $data = $data ?: $this->initGetData($this->store);

    if (preg_match('/(<form.*?.*?<\/form>)/is', $data, $matches))
        $this->inputs = $this->getInputs($matches[1]);

    return is_array($this->inputs) ? $this->inputs : false;
}
                

Then the fields



private function getInputs($form, $inputs = []) {
    if (!($els = preg_match_all('/(<input[^>]+>)/is', $form, $matches)))
        return false;

    for ($i = 0; $i < $els; $i++) {
        $el = preg_replace('/\s{2,}/', ' ', $matches[1][$i]);

        if (preg_match('/name=(?:["\'])?([^"\'\s]*)/i', $el, $name)
         && preg_match('/value=(?:["\'])?([^"\'\s]*)/i', $el, $value))
            $inputs[$name[1]] = $value[1];
    }

    return $inputs;
}
                

Once we have the data necessary, posting it to Shopify is a piece of cake. Cake's good, right?


  
public function login() {
    $fields = $this->inputs ?: $this->getFields();

    $fields['login']    = $this->username;
    $fields['password'] = $this->password;

    $url = $this->store . self::_LOGIN_URL;

    $this->ch = curl_init($url);

    $this->setOpts([
        CURLOPT_POST => count($fields),
        CURLOPT_POSTFIELDS => http_build_query($fields),
        CURLOPT_HTTPHEADER => ['Shopify-Auth-Mechanisms:password']
    ]);

    $data = curl_exec($this->ch);
    $http_code = curl_getinfo($this->ch, CURLINFO_HTTP_CODE);

    return ($http_code == 200 && $this->setToken($data));
}
                

The astute among you may notice the setToken call at the end; we'll get to this shortly. Also, setOpts is a helper I crafted to keep the code clean: it takes care of setting the cookie jar and user agent on each request. Yes: cake, and cookies.


 
private function setOpts($extra = []) {
    $default = [
        CURLOPT_USERAGENT => self::_USER_AGENT,
        CURLOPT_COOKIEJAR => self::_COOKIE_STORE,
        CURLOPT_COOKIEFILE => self::_COOKIE_STORE,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true
    ];

    $options = $default + array_filter($extra, function($v) {
        return !is_null($v);
    });

    curl_setopt_array($this->ch, $options);
}
                

So, back to it. Now we're logged in, which is great! Let's see if we can request the discounts.json endpoint we saw used earlier.


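$function = '/admin/discounts.json';
$parameters = [
    'limit' => 50,
    'order' => 'id+DESC',
    'direction' => 'next'
];

// urldecode() turns the encoded "%2B" back into the literal "+" of "id+DESC"
$url = $this->store . $function . '?' . urldecode(http_build_query($parameters));
$this->ch = curl_init($url);
$this->setOpts(); // cookie jar, user agent and CURLOPT_RETURNTRANSFER

$response = curl_exec($this->ch);
$data = json_decode($response);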
                

Response


(
    stdClass Object
    (
        [applies_once] => 
        [applies_to_id] => 
        [code] => wyrrw4
        [ends_at] => 
        [id] => 14256508
        [minimum_order_amount] => 0.00
        [starts_at] => 2013-03-01T00:00:00-08:00
        [status] => enabled
        [usage_limit] => 1
        [value] => 5.0
        [discount_type] => percentage
        [applies_to_resource] => 
        [times_used] => 0
    )

    ... 
)
                
                

Awesome! It worked. POSTing turns out to be a little trickier, but let's get to it..

A Cross-site Request Forgery (CSRF) token is used for all POST requests internally as shown below

X-CSRF-Token: +QjKt70XBMis/iZXz8VsvbfHkOcH+h45N38os4O1lJo=

A little poking around reveals this token in the document body

             
  
<meta content="+QjKt70XBMis/iZXz8VsvbfHkOcH+h45N38os4O1lJo=" name="csrf-token" />            
                

Once again we'll resort to regex. By coincidence or not, the attribute order is switched here (content comes before name, unlike the form inputs), so that's worth keeping an eye out for!


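// [^"]* stops at the closing quote, so a second attribute on the same line can't be swallowed
if (preg_match('/<meta content="([^"]*)" name="csrf-token" \/>/i', $data, $token)) {
    $this->_token = $token[1];
    return true;
}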
                

As noted after the login call we grab the token to avoid excess HTTP requests. Now let's wrap it into something usable

               
public function doRequest($method, $function, $parameters) {
    $this->ch = curl_init();
    $url = (!filter_var($function, FILTER_VALIDATE_URL) ? $this->store : '') . $function;

    switch ($method) {
        case 'POST':
            $this->setOpts([
                CURLOPT_POST => true,
                CURLOPT_POSTFIELDS => json_encode($parameters),
                CURLOPT_URL => $url,
                CURLOPT_HTTPHEADER => [
                    'X-Shopify-Api-Features: pagination-headers',
                    'X-CSRF-Token: ' . $this->_token,
                    'X-Requested-With: XMLHttpRequest',
                    'Content-Type: application/json',
                    'Accept: application/json'
                ]
            ]);

            break;
        case 'GET':
        default:
            $this->setOpts([
                CURLOPT_HTTPGET => true,
                CURLOPT_URL => $url . (count($parameters) ? '?' . urldecode(http_build_query($parameters)) : '')
            ]);
    }

    $response = curl_exec($this->ch);
    $data = json_decode($response);

    return is_object($data) ? $data : $response;
}
                

Sure enough, that worked too! So what other cool stuff can we do?

Spoiler: I like graphs.

You may have noticed the flashy new dashboard in Shopify 2. Fortunately, with little effort, we can access this data too!


There's a couple of things we need to take note of here, the callback (this is JSONP, we'll get to that in a moment), and the token. The token is used as authentication, and set inline in the document body.


Shopify.set('controllers.dashboard.token', "WyIxOTg5NDg0IiwiMODowMCJd--ebf3dbfffec25186c14a163b8e13bafxxx")
                

As soon as I saw this, it was pretty obvious that it was a base64 string followed by an MD5 hash. This probably isn't terribly useful for us, but it's nice to know! Let's decode it. (Note: I snipped these to keep this store private.)


["1989xxx", "2013-03-07T22:34:12-08:00"]
                

So the base64 is an array containing the store ID and a timestamp. Perhaps the hash is used for performance metrics, or more likely a checksum of the array to avoid people grabbing analytics of other stores. Doesn't matter much to us, as we aren't trying to do anything malicious here.

Due to the same-origin policy, XHR requests to external locations are blocked (the scheme, hostname and port must all match). The exceptions are JSONP and CORS. CORS is considered the better solution; however, in this instance Shopify is using JSONP, which is what the callback parameter is for. We'll need to strip out that callback when we parse the response.

To do so, I've defined the callback as a static fake_function and strip it out with regular string functions:


if ($reportCenter) {
    if (strpos($response, 'fake_function') !== FALSE) {
        $response = substr($response, strpos($response, '{'));
        $response = substr($response, 0, -2);
    }
}
                

This allows us to access the report center data such as


stdClass Object
(
    [start_date] => 2013-02-22
    [end_date] => 2013-03-01
    [search_terms] => Array
        (
            [0] => stdClass Object
                (
                    [terms] => shopify.com
                    [count] => 1
                    [percentage] => 100
                )

        )

    [top_referrals] => Array
        (
            [0] => stdClass Object
                (
                    [referrer] => www.example.com
                    [count] => 530
                    [percentage] => 56.025369978858
                )

            ....
        )
)
                

Remember hackers, the full code & demo is available to fork:

GitHub

Hacking games to make them better

February 5th, 2011

The duplication of items has been a huge issue in a variety of games, ruining the economy on both official and, in particular, private servers. A select few have figured out solutions to this; however, most of them are offline solutions, usually implemented as stored procedures, which aren't too reliable since the database is only updated every 5 minutes by default in Knight Online. As such, we needed to figure out a live way to do it.

Fortunately, to communicate between the various server files a block of shared memory (or Memory Mapped File) is used. This means tapping into the data isn't all that difficult. Reversing the structure is much trickier. As such, I'll provide a near-complete structure below which will serve as a good base for this article.


#define MAX_USER_ID_SIZE        20
#define MAX_ACC_ID_SIZE         38
#define MAX_USER                1500

#define SLOT_MAX                14
#define HAVE_MAX                28      
#define ITEMCOUNT_MAX           9999
#define WAREHOUSE_MAX           196     

struct USER_ITEM_DATA
{
        int     nNum;
        short   sDuration;
        short   sCount;
        __int64 nSerialNum;
        char    cPadding[8];
};

struct INN_ITEM_DATA
{
        int     nNum;
        short   sDuration;
        short   sCount;
        __int64 nSerialNum;
};

struct _USER_DATA
{
        char    m_id[MAX_USER_ID_SIZE+1];
        char    m_Accountid[MAX_ACC_ID_SIZE+1];
        DWORD   m_bZone;

        float   m_curx;
        float   m_cury;
        float   m_curz;

        BYTE    m_bNation;
        BYTE    m_bRace;
        short   m_sClass;
        BYTE    m_bHairColor;
        BYTE    m_bRank;
        BYTE    m_bTitle;
        BYTE    m_bLevel;
        int     m_iExp;
        int     m_iLoyalty;
        BYTE    m_bFace;
        BYTE    m_bCity;
        short   m_bKnights;
        short   m_sClan;
        short   m_sUnknown;
        BYTE    m_bFame;

        byte    m_unknown2;
        byte    m_Hits;
        byte    m_Mana;
        byte    m_SP;
        byte    m_STR;
        byte    m_HP;
        byte    m_DEX;
        byte    m_INT;
        byte    m_MP;
        byte    m_Authority;
        byte    m_Points;
        byte    m_unknown3;

        DWORD   m_Gold;
        signed short m_Bind;
        int     m_Bank;

        char    Garbage[39]; // skill/stat stuff etc
        USER_ITEM_DATA m_sItemArray[HAVE_MAX+SLOT_MAX];
        //_ITEM_DATA m_sItemArray[HAVE_MAX+SLOT_MAX]; // 42*8 bytes
        INN_ITEM_DATA m_sWarehouseArray[WAREHOUSE_MAX]; //196*8 bytes

        BYTE    m_bLogout;
        BYTE    m_bWarehouse;
        DWORD   m_dwTime;
};
            

We will cast the block of memory to that structure so we don't have to loop and apply lots of offset arithmetic. It generally makes the code cleaner to work with and easier to maintain.
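To make the difference concrete, here is a minimal, portable sketch. The `Record` layout, the function names and the values are invented for illustration and are not the real KNIGHT_DB layout; the sketch only contrasts hand-written offsets with viewing the same bytes through a struct.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte record, NOT the real _USER_DATA layout.
struct Record {
    std::int32_t nNum;       // bytes 0..3
    std::int16_t sDuration;  // bytes 4..5
    std::int16_t sCount;     // bytes 6..7
};

static_assert(sizeof(Record) == 8, "Record must match the assumed 8-byte layout");

// Offset arithmetic: every field needs its own hand-maintained offset.
std::int16_t countByOffset(const char* block) {
    std::int16_t count;
    std::memcpy(&count, block + 6, sizeof(count));
    return count;
}

// Struct view: the compiler derives the offsets from the declaration.
std::int16_t countByStruct(const char* block) {
    const Record* rec = reinterpret_cast<const Record*>(block);
    return rec->sCount;
}
```

Both functions read the same two bytes. The struct version keeps working when a field is inserted earlier in the declaration, whereas the magic number 6 would have to be found and updated by hand.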

Some internal declarations may look like this:


#define MMF_TARGET "KNIGHT_DB"

typedef std::vector<_USER_DATA*> UserDataArray;
extern UserDataArray currentUsers;
HANDLE m_hUsersMutex;
HANDLE m_hMMFile;
char* m_lpMMFile;

void myPopulateFunction();

                

So first of all, let's connect to the file and build up our local array. (You'll probably want to place this in your main, or equivalent.)
Note: I'm not going to incorporate lots of error handling etc in this guide, you will need to do that yourself.


m_hUsersMutex = CreateMutex( NULL, FALSE, NULL );
m_hMMFile = OpenFileMapping( FILE_MAP_ALL_ACCESS, TRUE, MMF_TARGET );

if (m_hMMFile == NULL)
    return;

m_lpMMFile = (char *)MapViewOfFile(m_hMMFile, FILE_MAP_WRITE, 0, 0, 0);

if (!m_lpMMFile)
    return;

myPopulateFunction();
                

Okay, so this will open up the shared memory files that Knight Online uses to access the user data from all the various applications (aujard and ebenezer for example) and map it so we can access its data from our application. Next up, loading all of the users from the memory block and casting it to a nice struct which was defined above!


void myPopulateFunction() {
    WaitForSingleObject( m_hUsersMutex, INFINITE );

    // This is where we'll do our stuff.

    ReleaseMutex(m_hUsersMutex);
}
                

That's a rough outline for our function. We use a mutex to prevent any of our other functions from accessing the array while it's being written to, which could cause memory exceptions and such. So let's load our users now.


void myPopulateFunction() {
    WaitForSingleObject( m_hUsersMutex, INFINITE );

    _USER_DATA* pUser = NULL;
    currentUsers.clear();

    for (int i = 0; i < MAX_USER; i++) {
        pUser = (_USER_DATA*)(m_lpMMFile + (i * 8000));
        currentUsers.push_back(pUser);
    }

    ReleaseMutex(m_hUsersMutex);
}

Because the struct above isn't entirely complete, we can't use sizeof(_USER_DATA) to step from one user to the next, as we ideally should. Instead the stride is hard-coded as 8000 bytes, which may need maintaining in future versions. This builds an array of pointers to each user, so all data is always current without you needing to re-populate. This has its pros and cons, so in some situations we need to take a static copy too, which we'll do next in the dupe scanner example. Next, let's take a look at iterating through the array so we can actually use it!
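The stride idea and the pointer-versus-copy trade-off can be sketched without the Windows API. Everything below is hypothetical: `Slot`, `kStride`, `kMaxSlots` and a plain byte buffer stand in for _USER_DATA, the 8000-byte stride, MAX_USER and the mapped view.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for _USER_DATA, the 8000-byte stride and MAX_USER.
struct Slot {
    char id[21];
    int  gold;
};
const std::size_t kStride   = 64;  // bytes reserved per slot in the block
const std::size_t kMaxSlots = 4;

// The struct may be smaller than the stride (unknown trailing fields),
// but it must never be larger.
static_assert(sizeof(Slot) <= kStride, "Slot does not fit in one stride");

// Live view: pointers into the block, always reflecting its current contents.
std::vector<Slot*> liveView(char* block) {
    std::vector<Slot*> out;
    for (std::size_t i = 0; i < kMaxSlots; i++)
        out.push_back(reinterpret_cast<Slot*>(block + i * kStride));
    return out;
}

// Snapshot: copies of the data, unaffected by later writes to the block.
std::vector<Slot> snapshot(const char* block) {
    std::vector<Slot> out(kMaxSlots);
    for (std::size_t i = 0; i < kMaxSlots; i++)
        std::memcpy(&out[i], block + i * kStride, sizeof(Slot));
    return out;
}
```

The static_assert is the cheap safety net here: if the partial struct ever grows past the hard-coded stride, the build fails instead of the loop silently reading into the next user's data.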


std::vector<_USER_DATA*>::iterator iter;
for (iter = currentUsers.begin(); iter != currentUsers.end(); iter++) {
    // our magic
}
            

Really simple stuff there; you can then access the properties of each user directly. Beware of empty slots (again, these are simply pointers into a block of memory), so check for something like (*iter)->m_id having a non-zero length.

An example of applying the above code could be to find a pointer to a struct from a given username, how about this:


_USER_DATA* getStructByUser(const char *user) {
    _USER_DATA* pFound = NULL;

    WaitForSingleObject( m_hUsersMutex, INFINITE );

    std::vector<_USER_DATA*>::iterator iter;
    for (iter = currentUsers.begin(); iter != currentUsers.end(); iter++) {
        if (!_strnicmp(user, (*iter)->m_id, MAX_USER_ID_SIZE)) {
            pFound = *iter;
            break; // don't return here, or the mutex is never released
        }
    }

    ReleaseMutex(m_hUsersMutex);
    return pFound; // NULL when no user matched
}
            

There are many more potential applications for this code: a decent speed hack detector, zone scanning to check whether users are stuck in the war zone, detecting bugs with invading and, of course, the biggie, dupe scanning. Let's create a quick draft of how you might achieve something like this! You'll probably want to run it in a separate thread, so add something like this to our definitions:


struct threadStruct
{
    HANDLE* hMutex;
    UserDataArray* curUsers;
};

And finally, the code!


unsigned int __stdcall DupeScanThread(LPVOID lp) {
    threadStruct* pStruct = (threadStruct*)lp;

    UserDataArray* pUsers = (UserDataArray*)pStruct->curUsers;
    UserDataArray pTemp;
    UserDataArray::iterator iter;

    std::map<__int64, _USER_DATA*> userMap;

    // hMutex is a pointer to the handle, so dereference it
    WaitForSingleObject(*pStruct->hMutex, INFINITE );
    pTemp.assign(pUsers->begin(), pUsers->end());
    ReleaseMutex(*pStruct->hMutex);

    for (iter = pTemp.begin(); iter != pTemp.end(); iter++) {
        _USER_DATA* pUser = *iter;
        if (!strlen(pUser->m_id)) continue;

        for (int i = 0; i < HAVE_MAX+SLOT_MAX; i++) {
            if (pUser->m_sItemArray[i].nSerialNum && pUser->m_sItemArray[i].nNum > 0) {
                // insert() fails (.second == false) when the serial is already in the map
                if (!userMap.insert(std::make_pair(pUser->m_sItemArray[i].nSerialNum, pUser)).second) {
                    _USER_DATA* otherDuper = userMap.find(pUser->m_sItemArray[i].nSerialNum)->second;
                    // Do some magic
                }
            }
        }
    }

    return 0;
}

You may notice that I only ran it once in the above example. You may want to add an extra parameter such as bEnabled, then wrap the body in a while (bEnabled) { Sleep(sensible interval); pTemp.clear(); refill; iterate; etc. } loop to run it constantly! I haven't gone into the full depths of using the mapped file to send packets (for disconnecting and such) in this guide, as it's focused on KNIGHT_DB only, but I may cover that at some point in the future.
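The core of the scan, using the result of a map insert to spot a serial number that has already been seen, can be tried in isolation. This is a portable sketch rather than the server code: `Item`, `Player` and `findDupes` are hypothetical, simplified stand-ins for the structures above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for USER_ITEM_DATA and _USER_DATA.
struct Item {
    int          nNum;        // item id, 0 means an empty slot
    std::int64_t nSerialNum;  // unique serial, 0 means "no serial"
};
struct Player {
    std::string       id;
    std::vector<Item> items;
};

// Returns (first owner, second owner) for every serial seen more than once.
std::vector<std::pair<std::string, std::string> >
findDupes(const std::vector<Player>& players) {
    std::map<std::int64_t, const Player*> seen;
    std::vector<std::pair<std::string, std::string> > dupes;

    for (const Player& p : players) {
        if (p.id.empty()) continue;  // unused slot
        for (const Item& it : p.items) {
            if (it.nSerialNum == 0 || it.nNum <= 0) continue;
            // insert() leaves the map untouched and reports .second == false
            // when the serial is already present: that is a duplicate.
            auto res = seen.insert(std::make_pair(it.nSerialNum, &p));
            if (!res.second)
                dupes.push_back(std::make_pair(res.first->second->id, p.id));
        }
    }
    return dupes;
}
```

Note that a single player holding the same serial twice is reported as well, with both names in the pair being identical. Whether that counts as a dupe is a policy decision for the "magic" part.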

To initialise the thread, it'll be something like this:


threadStruct params; // must outlive the thread, so don't let it go out of scope
params.hMutex = &m_hUsersMutex;
params.curUsers = &currentUsers;

HANDLE hThread = (HANDLE)_beginthreadex(NULL, 0, DupeScanThread, &params, 0, NULL);

Hope you have as much fun with this as I did!

Native Executable Patching

February 5th, 2011

I was continuously asked how to apply the security patches that were being released. Instead of including instructions in each patch, I decided to put together a quick general guide. While this guide primarily focuses on Knight Online private server files, it serves as a good example, and all of the terminology and methods apply elsewhere.

To begin, we'll start with the screen you'll be presented with just after you open ebenezer in OllyDbg. I'm using OllyDbg because it's easier; don't use a hex editor, it's just silly. Olly will translate assembly into opcode bytes for me and will make far fewer mistakes! Also note that I'm using the classic version (v1.10 final). The v2 alpha is becoming increasingly popular and sports some exciting new features and some amazing improvements, so be sure to check it out!

OllyDbg OEP Picture (Original Entry Point)

On the right, you have the first screen you'll be confronted with after loading your executable. Briefly, starting from the top left you have your 'CPU' pane. This is the main window you'll be using; it shows a disassembly of the image in memory at the specified location, translating your 'hex' into assembly and vice versa. To the right-hand side you have your registers. You may think of these like your typical variables, but remember that they often hold pointers; either way, they represent the values your assembly code is using.

Just like in most native languages, you need to dereference a pointer before you use the value. This is why you may see "DWORD PTR DS:[pointer]": it says that we want the 4 bytes stored at this location. For readers from a Linux background, a DWORD is defined in the WinAPI as a 32-bit unsigned integer.
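For readers more comfortable with C++ than assembly, the dereference described above corresponds roughly to the following. `readDword` is a hypothetical helper written for this explanation, not something OllyDbg or the WinAPI provides.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::uint32_t) == 4, "a DWORD is 4 bytes");

// Rough C++ equivalent of `MOV EAX, DWORD PTR DS:[address]`:
// read the 4 bytes stored at `address` as one unsigned 32-bit value.
std::uint32_t readDword(const void* address) {
    std::uint32_t value;
    std::memcpy(&value, address, sizeof(value));  // safe for any alignment
    return value;
}
```

The register holds only the address; the value you care about is what sits behind it. On x86 the bytes are stored little-endian, so an address such as 0x00498B59 appears in the dump as 59 8B 49 00.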

On the bottom left we have the dump. This is usually where I'd follow the values of registers, but it's multi-purpose and is also very handy for viewing a region of memory; think of it as your traditional hex editor. Finally, to the right of the dump is the stack. I won't fully explain it here since it's not necessary for this patch; generally you can use it for following the parameters passed to a function, and it serves as temporary storage. Every thread has its own stack.

Now, moving on to the actual patch. Let's navigate to the area we want to patch. On the far left of the CPU pane you have the address you're currently viewing. The currently executing line of code is at EIP (the instruction pointer, a register). We want to go to 0x00498B59 (the 0x prefix denotes hex, like &H in BASIC). To do this we press CTRL+G or navigate via the menu, and a window like this will appear:

OllyDbg Goto Picture

Press enter (or "OK"), and it'll take us to the code we want to modify. As you can see, it's exactly how osmanx said it would be. Now let's make the appropriate adjustments. You can see here that it's comparing the 8-bit BL register to 0x11 (remember, everything is in hex). If it's equal, it jumps to the same location as it does for 0x07; this looks like it was meant to be expanded at some point. They're both going to the same location regardless, so we want to change the 0x11 case to jump to our code cave instead. To do this, double-click the line we want to modify, which is 0x00498B5C, and change it to the code displayed: "JE 00499218".

OllyDbg Assemble Picture

Press "Assemble"; if all goes well and you typed it correctly, it'll modify the hex appropriately for you. Now, tap enter to follow that jump to our code cave, or use CTRL+G -> "00499218" if you want to do it manually. This will take you to our code cave, which is just a block of INT3s. This instruction is used to trigger a breakpoint, but it is also often used as padding between functions. It's an ideal location for our patch since it's never executed! Assemble our block of code here, just like you did before.

When that's done, right click in the main CPU window, navigate to "copy to executable" and select "copy all modifications" as shown in the following image.

OllyDbg Copy To Executable Picture

Then it'll copy the memory image to a new window and allow you to save it to disk with the modifications. You can do that by following the navigation to "Save File", as shown to the right:

OllyDbg Save To Disk Picture

Then a little window will appear asking you where to save it. I'd suggest you make a backup EVERY time, in case you make a mistake or the patch is incorrect (though I'm sure it isn't!).

That concludes this article. I hope I didn't miss out anything important. If you have any questions, feel free to e-mail me. I fear I may have rushed towards the end, but I will amend it appropriately if anyone points out a mistake or a particularly vague area.

PHP 5.3.3 x86 Vulnerability

January 13th, 2011

Recently a new vulnerability has been exposed (and patched) which targets specific platforms as a result of certain assumptions made within the zend core (specifically zend_strtod.c) when handling floating-point arithmetic.

The issue only manifested in gcc builds with -O2, so a recompile could fix it. Reportedly -O0 fixes it, but I would recommend using -mfpmath=sse, which favours a newer instruction set over the older x87 math instructions. A diff of the new PHP revision revealed that the actual committed patch was an additional keyword in a declaration:


                double aadj, aadj1, adj;
            

vs


                volatile double aadj, aadj1, adj;
            

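The volatile keyword instructs the compiler not to optimise accesses to these variables, so each value is written back to memory as a 64-bit double instead of lingering in an 80-bit x87 register.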
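The effect of that keyword can be illustrated in a few lines. This is only a sketch of the general technique, not the PHP patch itself, and `roundToDouble` is a hypothetical name chosen for the example.

```cpp
// Illustration only: forcing a value through a volatile double makes the
// compiler emit a real store to memory, which rounds the value to a 64-bit
// double even if the computation was carried out in a wider x87 register.
double roundToDouble(double x) {
    volatile double stored = x;  // the store may not be optimised away
    return stored;               // reloaded from memory as a plain double
}
```

On builds that already use SSE for floating point the function changes nothing, which matches the observation that only x87-based builds were affected.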

Fortunately this is an x64 server and doesn't use the x87 FPU, but it did affect my laptop. I applied a quick software patch, which was something like this (note: either must be placed at the top of your index, or whichever file is executed):


                
if (strpos(implode($_REQUEST), '2.2250738585072011') !== false) die();
            

or

 
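                foreach ($_REQUEST as $k => $v) {
                    // unset() on an array_walk() callback argument only drops the local
                    // reference, so remove the entry from $_REQUEST by key instead
                    if (is_string($v) && strpos($v . $k, '2.2250738585072011') !== false)
                        unset($_REQUEST[$k]);
                }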
            

The first example stops script execution if the dangerous value is detected; the second unsets the offending variable, though with improper error handling this could cause errors in some situations.

Fortunately, as previously stated, a patch has been committed, so if possible upgrade to PHP 5.3.5 or 5.2.17.

To read more, see the Official PHP Bug Report

Flawed optimization theories

December 20th, 2010

With regards to optimizing performance, particularly with PHP, people tend to talk about a very common set of archaic 'tricks' or 'techniques' which are really of little to no practical use, and have been completely irrelevant since the early stages of PHP 4 (note: at the time of writing the current stable build is 5.3.3). I will first discuss the most common suggestions you will encounter and examine how much difference they really make, before detailing some more advanced optimization methods.


Echo vs Print. (Including concatenation vs arguments)

Most articles revolve around echo, recommending it over print or any equivalent, often on the basis that string concatenation is slow. I will demonstrate the difference below:


    
    print 'This ' . 'is ' . 'the ' . 'first ' . 'example.';
    echo 'This ' . 'is ' . 'the ' . 'second ' . 'example.';
            

As you can see, they're both extremely similar, and they're both language constructs, which means they do not require parentheses. However, print can be treated like a function, which leads me to another difference: print also returns a value (another reason for its slightly worse performance). See below.


    martin@martin:~$ php -r "var_dump(print(''));"
    int(1)
            

The code above demonstrates this: print returned 1, and it will always return 1. The final difference, which is commonly mentioned though often overlooked, is that echo accepts multiple arguments, which gets around the first issue of string concatenation. Compare the following with the initial example:


    
    echo 'This ', 'is ', 'the ', 'third ', 'example.';
            

Okay, so now for some numbers. First, without string concatenation

                  echo      print
Time Taken (s)    0.0067    0.0078
Percent (%)       46.21     53.79

Now, with string concatenation

                  echo      print
Time Taken (s)    0.0142    0.0210
Percent (%)       40.34     59.66

Note: All values were obtained by taking 10 results, removing the upper and lower quartiles to remove anomalies then averaging the remainder. The load averages of the server were 0.05, 0.12 and 0.09 -- running Quad Core Intel(R) Xeon(R) CPU X3360 @ 2.83GHz with 5160476k free memory.

More on page 2, including a pretty graph!

Comparison chart of print vs echo

Taking into consideration that this was run over 100,000 iterations, you can see the difference between the two is negligible and isn't worth the time it'd take you to amend your existing codebase. I favour commas for code readability, but this isn't going to solve any bottlenecks in your application or noticeably increase performance. However, there is an interesting catch!

Though the web results were in favour of multiple arguments over concatenation, during some more extensive testing I discovered something which I first believed to be an anomaly. I ran the following 2000 times, using a simple bash "for i in `seq 1 2000` ......" loop, to obtain an average:


    php -d implicit_flush=off -r '$x=microtime(true);
          for($i=0;$i<200000;$i++) echo "im","testing","theories","\n"; 
          echo microtime(true)-$x;' | tail -1
               

I got conclusive results that concatenation was faster which contradicts all existing articles I've found. Intrigued as to why, I began doing some research.

		line     # *  op                           fetch          ext  return  operands
		---------------------------------------------------------------------------------
			1     0  >   CONCAT                                           ~0      'a', 'b'
			1      ECHO                                                     ~0
			2    > RETURN                                                   null


		line     # *  op                           fetch          ext  return  operands
		---------------------------------------------------------------------------------
			1     0  >   ECHO                                                     'a'
			1      ECHO                                                     'b'
			2    > RETURN                                                   null
		 

As you can see, the first version has an extra CONCAT opcode rather than a second ECHO, but this doesn't really give much of an explanation, so I took an strace instead and it produced:


    write(1, "imtestingtheories\n", 18)     = 18
               

And simply 4 write calls without concatenation. This means that the bottleneck reducing performance in the CLI isn't PHP's concatenation but the endpoint being written to (here, the terminal on file descriptor 1). This shouldn't really be the case and I'm not sure why it is (I use xterm, so perhaps someone else can enlighten me on that), but that is not the goal of this article. However, since writing to a network buffer is extremely fast, you will only see a very slight degradation in performance when using concatenation in a web environment. It's also worth considering that "CONCAT into a register" followed by "echo the register" uses more memory than zero-copying the strings around.

In conclusion, this "optimization" method has been rendered negligible and redundant. In addition, many programmers come from a Perl, Java(Script) or C background, among many other languages where print is very prominent, which is another reason why print is so widely used.

We will discuss more efficient optimization methods in due course. Turn over for more facts!

Previously we covered a comparison between echo and print including string concatenation vs passing multiple arguments to echo, now we will take a closer look at loops.

Often people will suggest that do { } while () is faster than other loops, and even suggest it over a foreach. While a performance difference does exist, with arrays you should often favour the readability of a foreach loop over the minimal performance gain from other loop syntaxes. Nevertheless, let's evaluate the performance difference once again to obtain some facts.

[Incomplete]