Ori Pekelman's blog

Today I am going through every place where we use the Twitter API, as a last-minute verification that we are free of basic auth (and, by the way, this post is basically just a test that all is fine).

Long live OAuth.

We have been chasing a bug for quite a long time that made our lives miserable and our client very unhappy: when submitting a large form, the POST data is intermittently broken (basically only the first parts of the submitted form data get to Rails). The bug is caused by an incorrect implementation of multipart/form-data parsing in Rack 1.0 and earlier. This is the commit that fixes the issue: http://github.com/rack/rack/commit/a9440bc752be9b3093669614c6b56bf78d592958 and you can also look at the added test case to try and reproduce the error.

We are without doubt not the first ones to stumble on this, but finding the bug was very, very hard, so I am posting this as a public service, hoping someone down the line will have an easier time finding the cause. The solution is easy: either upgrade Rack to at least 1.1 or patch lib/rack/utils.rb. Updating Ruby on Rails to > 2.3.6 will also solve this, as it has a dependency on Rack 1.1.

The symptoms can vary, and what makes this bug so hard to find is that many conditions must line up for it to manifest itself. You might get validation errors on your models, missing required parameters at the controller level, or just some unexpected behavior (not tested, but it might also result in corrupted uploads). Basically, when the body of a POST is longer than 16384 bytes and a multipart boundary falls at precisely 16384 bytes, the parser returns early, so only the first parts are parsed.

So first, you need to have a huge form. Now imagine the form represents something in your database and you change the data: the new data has to push the boundary to precisely the slicing point. Since each browser may use a different boundary size, the same change to the same data will not reproduce the error across two different browsers. Also, because this happens at Rack's level, underneath the application controller, the app's logs are going to be of very little help. We had to go Wireshark on this bug and follow the POST data through all the levels to find where it broke. And we got lucky: we found offending POST data while testing.

So you might find yourself with an application where, from time to time and in what seems to be a completely random way, fields are missing from the POST data in production, while reproducing this on test servers might prove very, very hard.
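If you have captured a suspicious POST body (from Wireshark, for instance), a small script can tell you whether a boundary sits on the 16384-byte edge described above. This is only a diagnostic sketch, in PHP like the other snippets on this blog. The function name is made up, it is not part of Rack, and whether the start or the end of the delimiter is what matters is an assumption, so it reports both.

```php
<?php
// Hypothetical diagnostic helper, not part of Rack or Rails.
// Lists every multipart delimiter ("--" + boundary) that starts or ends
// exactly on a multiple of the chunk size (16384 in the bug described above).
function boundariesOnChunkEdge(string $body, string $boundary, int $chunkSize = 16384): array
{
    $delimiter = '--' . $boundary;
    $length = strlen($delimiter);
    $hits = [];
    $offset = 0;

    while (($pos = strpos($body, $delimiter, $offset)) !== false) {
        $end = $pos + $length;
        // Offset 0 is skipped: the first delimiter always starts there.
        if (($pos > 0 && $pos % $chunkSize === 0) || $end % $chunkSize === 0) {
            $hits[] = ['start' => $pos, 'end' => $end];
        }
        $offset = $pos + 1;
    }

    return $hits;
}
```

Pass it the raw request body and the boundary value from the Content-Type header. An empty result means no delimiter lines up with a chunk edge, so that particular request is probably not the one you are looking for.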

Note: this bug is not Rails specific and will happen with any framework using Rack. So just to help people find this: Broken form data on Sinatra, ruby corrupted post, big form fails intermittently with rack. Uploading Files randomly fails Ruby on Rails. Missing Post Data Ruby. Ruby multipart bug. Rails multipart problem.

af83 is proud to organize, together with Silicon Sentier, the first WebWorkersCamp on July 3, 2010 in Paris, at La Cantine.

NodeJS, NoSQL, message queues, asynchronous programming, WebSockets, distributed applications, decentralized social networks, buzzword generators...

A first Paris meetup for everyone who enjoys building fun architectures.

What is WebWorkersCamp?

With the expansion of the real-time web and the multiplication of web services, websites as they are programmed today have more and more trouble handling the load. This BarCamp will be an opportunity to talk informally about the technologies that make scaling possible (asynchronous programming, WebSockets, NoSQL, message queues...).

Topics will include, among others:

  • nodeJS: server-side JavaScript, asynchronous programming
  • the various NoSQL databases and their uses
  • WebSockets
  • message queues (AMQP or others)
  • ...

More generally, this BarCamp will be an opportunity for everyone to share their experience with any technologies, solutions and architectures that help us meet specific needs in terms of scaling, connectivity (staying connected to various web services or clients) and related topics.

For more information and to sign up... http://barcamp.org/WebWorkersCamp

Abstract: When you have a problem loading an external file through https/ssl in Flex/Flash URLRequest (Flex Error #2118 or #2032) and the problem is Internet Explorer specific, try adding a Pragma: public header.
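As a rough illustration, and assuming the file that URLRequest loads is served by a PHP script over HTTPS (the post above does not say what the server is), sending the header could look like this. The function name and the content type are placeholders; only the Pragma: public line comes from the abstract.

```php
<?php
// Assumption: the resource is generated by PHP. Only "Pragma: public"
// comes from the abstract above; the content type is an example value.
function flashDownloadHeaders(string $contentType): array
{
    return [
        'Pragma: public',
        'Content-Type: ' . $contentType,
    ];
}

// Send the headers before any output is written.
if (!headers_sent()) {
    foreach (flashDownloadHeaders('text/xml') as $line) {
        header($line);
    }
}
```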

Just a short post to tell you about ShakaCssDebug. What is it?

ShakaCSS is a tiny JavaScript tool for displaying:

  • Photoshop-style guides
  • a grid
  • a resizable and draggable ruler

These small tools make developing web interfaces easier.

Its efficiency mainly comes from 2 things:

  • it can be deployed on any site in one click, thanks to a bookmarklet
  • it stores its variables in a cookie, which lets you navigate from one page to another while keeping the markers

So thanks, shakaman, for this great tool.

Hop on to http://shakacss.af83.com/ to play with it.

To get the code go to : http://github.com/shakaman/ShakaCss/

Fail Road

I met today with a young entrepreneur; you might know his kind of company. They have a niche in a local market and a sweet business model, there are four of them and their business is running great, they are really good at SEO and they know how to strike a partnership deal.

The site is good looking. And it does what it is supposed to do.

They built everything in-house, in PHP/MySQL, and they don't really have big performance issues (a 500ms average response time for their scripts, which they find acceptable). They know they are not supposed to do sysadmin work and have a hosting company that takes care of their machines. The guy writing the code is not that bad either. He doesn't really do OOP, nor does he really know what the practical use of the MVC pattern is. But he does separate logic from design using his own templating solution, there is not a lot of inline CSS, and the majority of the JavaScript code is not inline either.

So these guys are really not in that much trouble; I can safely say I have seen much, much worse.

The code to be vs. The code that is

So what do I mean when I say their code is a liability, not an asset? I mean that more or less all project code is. Before you are running a live system, the "code to be" represents what you need to do in order for your business to run and make money.

Afterwards the same code, "the code that is", mostly represents the added cost of implementing new features. The more code you have, the costlier it will be to add anything new. And the really bad news is that everything you add will just pile up on top of the "code that is", making every subsequent addition even costlier. Now, the negative marginal utility of the existing code is not a fixed quantity: the better structured your code is, the more unit tests you have, the more cool schema-less or loose-schema databases you use, the less costly new code is. This is true of course for small changes (just changing the graphic design of the site), but when it comes down to structural issues (making your site multilingual, adding user roles, changing workflows) the difference can be tremendous. And sometimes it means that the moment the company succeeds in its local niche and wants to go bigger, go international, or resist the big players coming into its niche market, it simply can't. Even if it can now raise the money to have enough development power, it will take too long. Going from innovative first player to one of the late-to-market guys can be a really hard wake-up call.

Now, these companies, brave, self-financed and successful, could probably not exist had they done "stuff right" from the beginning. There is an initial investment in development quality that they would probably not have been able to afford. You don't have that many professional developers running around looking for biz-dev guys with an idea, willing to work for a year and a half without a salary, and able to explain to their partner: hey, let's not implement anything new this week, let's write some more tests, let's just refactor that old code (from 4 weeks ago). In order to hire an external professional team, well, you need to have the seed money, and not everybody has access to that.

Is doom inevitable?

But doom is not inevitable; there is a tipping point. There is a specific moment in time when the company is already successful enough, yet there is still time to refactor because not that much code has been written. Although it took a year to write, a bunch of professionals can probably redo the whole thing in a month, this time with scalability, internationalization and security built in and, maybe most important, the ability to add new stuff without breaking the old, to tweak the system without being afraid it will make the whole thing collapse.

So if you are the business guy, how can you prepare yourself to throw everything away and still stay in business? What can you do to make sure you will not fail by succeeding? Here is a preflight checklist you should verify with your tech guy before you write a single line of code. If you don't know for sure that you are going to be writing marvelous code, you should try to find out how to write as little code as possible and how to make it as disposable as possible.

Best practices for people who can't follow best practices

First three:

  • You are using existing open source code that enforces as many good practices as possible, such as Ruby on Rails, Django or Drupal, and you know the implications of the licensing model for your business model
  • Your coder is a respected open source contributor, he has a bunch of projects up on github or bitbucket and many people fork his stuff
  • In the comments you see on these platforms, people tend to respect the number and the quality of the tests he has written, and praise his documentation

At this point there is a very good chance you can stop reading this. Things will be cool. Trust the tech guy. Just really make sure you have some kind of backup for your data, and if you see the tech guy drinking too much whisky before noon, also make sure you have access to backups of the code as well as the data, maybe on a physical medium only you have access to.

You should still talk to the guy and make sure:

  • You are paying him enough money and he won't quit on you too soon

You could also verify together:

  • You are using standards, and don't invent anything but your core value
  • You respect as much as possible the spirit of the framework you use
  • You have discussed together the ambitions in terms of scalability, and he is thinking about that

The big checklist (not that big)

Normally the framework you have chosen will take care of most of the following. But if you can't use one (because your tech guy is too young/too old, he is not a respected open source contributor, he doesn't like frameworks for X/Y reason, he says he doesn't have time for it, etc.), you still must check the following list:

  • The source code of the application is on an SCM (source code management system)
  • You know where the SCM is. You know it is backed up. You have learnt how to get your own copy of the code or some third party you trust has
  • No development is done directly on a server
  • There are at least two environments: test and production. But really, there are four environments: the local environment of the developer(s), an unstable integration machine representing the current status of the code, a testing machine -- representing the next version to be put to production -- and the production environment.
  • You have a procedure in place to use the SCM to move code from development to production
  • If you need to put more than just code changes into production (modifying the database structure, dependencies on libraries/executables) you have a procedure to make these modifications
  • If you have interesting data on the server (anything that might pose any kind of problem if lost), it is backed up
  • You have a way to verify that your backups work

Talk with the tech guy if you left any of the checkboxes unchecked. Stop, there is no point in reading this post any further. Talk with the tech guy again. Make sure all are checked, that you have a way to verify this. Make sure the procedures are written.

Next verify the following:

  • You know what your application is supposed to do, and you know how this can be tested
  • This is written down in a simple text format or as nice images, and both you and the developer can understand these documents

Now go through the following checklist with your developer(s):

  • The application has a single point of entry
  • Design is separate from application code
  • The application code does not talk directly to a database but uses some kind of abstraction
  • There is a single source for text strings, with every call to static text running through a translation layer
  • You have a caching infrastructure in place
  • You use something other than MySQL queries for full-text search
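
To make the "single source for text strings" item concrete, here is a minimal sketch of what a translation layer can look like in PHP. Everything in it (the MESSAGES catalog, the t() helper, the message keys) is invented for the example; a framework would normally give you an equivalent out of the box.

```php
<?php
// Hypothetical catalog: one array per locale, keyed by message id.
// In a real application this would live in its own files or a database.
const MESSAGES = [
    'en' => ['cart.empty' => 'Your cart is empty', 'greeting' => 'Hello %s'],
    'fr' => ['cart.empty' => 'Votre panier est vide'],
];

// Looks a message up in the requested locale, falls back to English,
// and finally to the key itself so missing strings stay visible.
function t(string $key, string $locale, string ...$args): string
{
    $template = MESSAGES[$locale][$key] ?? MESSAGES['en'][$key] ?? $key;
    return $args ? vsprintf($template, $args) : $template;
}
```

The point is less the helper itself than the rule: templates call t('cart.empty', $locale) and never contain the literal sentence, so going multilingual later means adding a catalog, not rewriting pages.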

To earn some extra points try and check the following:

  • You only develop the features that you can make money out of and do the simplest thing that might possibly work
  • You don't develop code you are not going to immediately use
  • You understand what "Horizontal Scalability" means
  • You don't have anything in the architecture that prevents that

Every time you add a feature, every single time, go through the checklist. If you could not tick a checkbox, don't develop the feature.

On the next blog post in the series I will get into some more details on each element of the checklist, to be clear on how you verify this. The last chapter will address what happens when you've already started coding, and this article has put the fear of god into you.

This must be PHP week on AF83's github, with a bunch of stuff that we hope can help someone out there. I will post about toupti and the others later, but first:

François (francois2metz) just released session-cookie that does just that. It puts your session data in the cookie.

Usually when you are doing PHP, your session data is somewhere in a local store (on disk by default, in your database, or in some kind of DHT like memcache).

PHP lets you set quite easily the session handler for any of these methods (look at http://www.php.net/manual/en/function.session-set-save-handler.php).

Now this does pose some serious issues in terms of scalability. If you use a database or the file system, this can be very hard on your disks. Opening a session is expensive, and distributing it over a large number of servers is complicated and may require more code than you imagined.

Sometimes all you want to have from the session is the user_id, yet when you get the session cookie you still have to pay a round trip to the database just to get it from there.

So one simple, cool solution is just to send this data to the client as a cookie. Now, you don't really want the user to be able to change the data (for example changing the user id and logging in as a super-duper admin). But this is not hard: just encrypt it with your own secret, and the user will not be able to either modify it or look at it.
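
To show the "user cannot modify it" half of the idea in isolation, here is a minimal sketch that signs the cookie value with an HMAC instead of encrypting it. This is not how session-cookie is implemented; the two function names are made up, and because the payload is only signed the user can still read it, so treat it as an illustration of tamper detection only.

```php
<?php
// Sketch only: signs the payload with HMAC-SHA256 so tampering is detected.
// It does not encrypt, so the user can still read the data.
function sealCookieValue(array $data, string $secret): string
{
    $payload = base64_encode(json_encode($data));
    $mac = hash_hmac('sha256', $payload, $secret);
    return $payload . '.' . $mac;
}

// Returns the original array, or null when the value is malformed or was modified.
function openCookieValue(string $value, string $secret): ?array
{
    $parts = explode('.', $value);
    if (count($parts) !== 2) {
        return null;
    }
    [$payload, $mac] = $parts;
    $expected = hash_hmac('sha256', $payload, $secret);
    // hash_equals compares in constant time to avoid leaking the signature.
    if (!hash_equals($expected, $mac)) {
        return null;
    }
    $json = base64_decode($payload, true);
    if ($json === false) {
        return null;
    }
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}
```

A value produced by sealCookieValue(['user_id' => 42], $secret) only opens again with the same secret; change one character of the payload or of the signature and openCookieValue returns null.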

Basic Usage
===========

<?php
// Just include the session class
require_once 'session.php';
// PEAR package, only needed when using SessionInCookie_DefaultCipher
require_once 'Crypt/Blowfish.php';

SessionInCookie::setCipher(new SessionInCookie_DefaultCipher('mysecretkey'));

// Start the session normally
session_start();

// Read and write in the session
$_SESSION['foo'] = 'bar';

// Just before output, call session_write_close
session_write_close();

// WARNING: the session data has now been sent to the client in an encrypted cookie. You *CANNOT* write to $_SESSION anymore.

echo 'Hello World';
?>

Advanced Usage
==============

Custom cipher
-------------

<?php
// A cipher only needs encrypt() and decrypt(); this one is a pass-through placeholder.
class MyCipher implements SessionInCookie_Cipher
{
    public function encrypt($data)
    {
        return $data;
    }

    public function decrypt($data)
    {
        return $data;
    }
}

SessionInCookie::setCipher(new MyCipher());
?>

Debug
-----

You can use SessionInCookie_DummyCipher

<?php
SessionInCookie::setCipher(new SessionInCookie_DummyCipher());
?>

I do not think this solution is great when you have a lot of data in the session, but having that much data there is probably a very bad idea anyway. If the data is important, please remember: sessions die.

Get the code at: http://github.com/AF83/session-cookie

Enjoy

(btw some frameworks, written by serious people take this approach as a default...)

We are proposing a session for this year's South by Southwest festival, so if you want to help us get it accepted, just click on the button and buzz: Vote for my PanelPicker idea!
(You will need to register...)

Our turbulent future: moving people, moving data, fading logos

AF83 Platinum Sponsor For DrupalConParis

We've been working our a** off the past few months to make the next Drupal Conference in Paris (1st-5th September) as great and as cool as possible.