Using gperf with C++

16. June 2012 00:02

 

Did you know that you can use gperf with C++ to generate hash tables that make code easier to read and also run faster? This is a short guide on how to do this for a fairly common problem.

 

It is a simple problem that comes down to code clarity and performance. If you have ever seen large functions that span thousands of lines of if or case statements, then you already know what the problem is. It typically shows up in client / server software that processes commands on the server: the server needs to figure out which function to run for each incoming command from the client. Quite often this ends up as code something like this.

 

if (command == "ONE") { DoOne(); }
if (command == "TWO") { DoTwo(); }
if (command == "THREE") { DoThree(); }

 

 

In a large client / server project the above can typically grow to thousands of commands, and new commands are usually added to the end of the list. The obvious problem is that the 1000th command has to get through 1000 string comparisons before it is matched (and since none of the branches use else, every command pays for every comparison). This obviously isn't a great solution.

 

So here is a better way to do it. We can start with some example code that might exist on the server, e.g. the functions that are run as the incoming commands are processed. This is of course a simple example.

 

 

#include <stdio.h>

class Test {
public:
    static void One() { printf("One\n"); }
    static void Two() { printf("Two\n"); }
    static void Three() { printf("Three\n"); }
};

 

We also need a way to call these by name, so that the right function above can be picked at run time. We can do that by creating a structure with a name (e.g. the command) and a pointer to the function.

 


struct TType {
    const char *name;
    void (*func) (void);
};

 

The next idea is to create a function that can look up the TType structure in a dataset by the incoming command name. This is where gperf comes in: we can create a gperf file that ends up looking something like the following. There is a complete working example of a gperf file at the end of this post. It is also worth pointing out that in the keyword list between the two %% lines you must not leave blank lines, because gperf treats them as empty keywords. However, you can add spacing and comments by starting a line with #.

 

 

%ignore-case
%language=C++
%define lookup-function-name Lookup
%define class-name Functions
struct TType
%%
####
ONE,    Test::One
TWO,    Test::Two
THREE,  Test::Three
####
%%

 

The above tells gperf to ignore case when matching strings and to produce C++ output. It also makes gperf generate a C++ class named Functions with a static Lookup function, along with a static array of TType (wordlist) that holds the entire data list. A small chunk of the generated C++ code is shown below. gperf also generates a number of other items related to the hash calculation performed during the lookup; I have left those out because they really aren't human readable.

 

 

static const struct TType wordlist[] =
  {
    {""}, {""}, {""},
#line 31 "gperf-exmaple.gperf"
    {"TWO",    Test::Two},
#line 30 "gperf-exmaple.gperf"
    {"ONE",    Test::One},
#line 32 "gperf-exmaple.gperf"
    {"THREE",  Test::Three}
  };

const struct TType *
Functions::Lookup (register const char *str, register unsigned int len)
{
  if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
    {
      register int key = hash (str, len);

      if (key <= MAX_HASH_VALUE && key >= 0)
        {
          register const char *s = wordlist[key].name;

          if ((((unsigned char)*str ^ (unsigned char)*s) & ~32) == 0 && !gperf_case_strcmp (str, s))
            return &wordlist[key];
        }
    }
  return 0;
}
#line 34 "gperf-exmaple.gperf"

 

The next step is to make the program call the lookup function. Generate the header from the gperf file by running gperf -tCG gperf-example.gperf > gperf-example.h (-t uses the struct type declared in the file, -C makes the generated tables const, and -G makes the lookup table global), then include that header in the C++ code that calls the lookup function. You end up with a program that looks like this.

 

 

#include "gperf-example.h"


int main(int argc, char **argv) {
    const TType *tmp = Functions::Lookup("One", 3);

    if (tmp == NULL) {
        printf("FAILED\n");
    } else {
        tmp->func();
    }
}

 

This is a lot easier to maintain in the long run, and it runs much faster than working through hundreds of if statements: each lookup computes one hash and does at most one string comparison. The gperf step can also be integrated into the build system so that the generated file is rebuilt automatically whenever the gperf file changes.

 

Here is a complete runnable example of a gperf configuration. I put this together to show how C++ code can be mixed into the gperf file.

 

 

%{

/* gperf -tCG gperf-example.gperf > gperf-example.cpp */

#include <stdio.h>
#include <string.h>

struct TType {
    const char *name;
    void (*func) (void);
};


class Test {
public:
    static void One() { printf("One\n"); }
    static void Two() { printf("Two\n"); }
    static void Three() { printf("Three\n"); }
};

%}

%ignore-case
%language=C++
%define lookup-function-name Lookup
%define class-name Functions
struct TType
%%
####
ONE,    Test::One
TWO,    Test::Two
THREE,  Test::Three
####
%%


int main(int argc, char **argv) {
    const TType *tmp = Functions::Lookup("One", 3);

    if (tmp == NULL) {
        printf("FAILED\n");
    } else {
        tmp->func();
    }
}

 

The above can be processed, compiled and run with the following commands.

 

 

gperf -tCG gperf-example.gperf > gperf-example.cpp
g++ -Wall gperf-example.cpp -o gperf-example
./gperf-example


Floating point comparisons don't work. Don't even attempt them

12. April 2012 22:23

 

This started because somebody discovered an issue with PHP, which turns out to also show up in other languages such as JavaScript and, of course, Python. These magic numbers happen to be an edge case for double precision floating point numbers, so it actually happens in every language that uses them. Simply put, a double only has a 53-bit significand (roughly 15-17 significant decimal digits), and the number is large enough that the least significant digits start being dropped.

 

One of these numbers happens to be 9223372036854775807.0, which is 2^63 - 1 (the largest signed 64-bit integer).

 

Here is an example of what the problem is.

 

>>> 9223372036854775807 == 9223372036854775808
False
>>> 9223372036854775807.0 == 9223372036854775808
True

 

You would obviously expect the second comparison to be False as well. However, a double cannot hold 9223372036854775807 exactly, so the literal 9223372036854775807.0 is rounded to the nearest double, which is 9223372036854775808.0. Once stored, the two values really are equal. We can show this by doing the following.

 

>>> a = 9223372036854775807.0
>>> b = 9223372036854775808.0
>>> a == b
True
>>> print a
9.22337203685e+18
>>> print b
9.22337203685e+18

 

As you can see, a and b are the same number. Forcing the type to an int does not help either, because the rounding has already happened by the time the float literal is parsed.

 

 

>>> int(9223372036854775807) == int(9223372036854775808)
False
>>> int(9223372036854775807.0) == int(9223372036854775808)
True
>>> int(9223372036854775807.0) == 9223372036854775808
True
>>> 9223372036854775807.0 == int(9223372036854775808)
True

Printing the converted values shows why.

 

 

>>> print int(9223372036854775807)
9223372036854775807
>>> print int(9223372036854775807.0)
9223372036854775808

 

 

This problem is not specific to Python. It applies to anything that uses standard IEEE 754 64-bit (double precision) floating point: the number 9223372036854775807.0 cannot be represented exactly, so it gets rounded to the nearest representable double, which happens to be 9223372036854775808 (2^63).

 

We can show this is not Python-specific, because C behaves exactly the same way.

 

 

#include <stdio.h>

int main(int argc, char **argv) {
        double a = 9223372036854775807.0;
        double b = 9223372036854775808.0;

        if (a == b)
                printf("True\n");
        else
                printf("False\n");

        return 0;
}

 

If you take it down to assembler, it shows the same thing. In fact, if you look at the raw constant data in the executable, you will see that the compiler has already rounded 9223372036854775807.0 to the same value as 9223372036854775808.0 at compile time, so the program ends up comparing two identical constants.

 

Just to make it stick a little more the following is exactly the same issue!

 

>>> a = 9999999999999999.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
>>> b = 9999999999999999.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002
>>> a == b
True

 

It's floating point. Don't ever compare floating point numbers for exact equality! It doesn't work with large numbers, because there is not enough precision to store all of the digits.



Linux - Fake sshd

12. December 2011 23:00

 

I have just added another tool to my collection: a fake sshd for Linux that captures login attempts. It can be used for the following.

 

  • Profiling password attack attempts on servers.
  • Setting up a honeypot so you can invite the "kids" in.
  • Stealing the dictionaries used by attackers to test against your own password hashes.

 

Here is an example of the log output from an attack.

 

 

Dec 11 14:02:04 debian fake-sshd.exe: IP: 74.53.140.146 USER: root PASS: edityahoo.no
Dec 11 14:02:06 debian fake-sshd.exe: IP: 74.53.140.146 USER: root PASS: edityahoo.org
Dec 11 14:02:07 debian fake-sshd.exe: IP: 74.53.140.146 USER: root PASS: 68b329da9893e34099c7d8ad5cb9c940
Dec 11 14:02:09 debian fake-sshd.exe: IP: 74.53.140.146 USER: root PASS: 7hur@y@t3am$#@!(*(
Dec 11 14:02:10 debian fake-sshd.exe: IP: 74.53.140.146 USER: sysgames PASS: qwertycosmin
Dec 11 14:02:12 debian fake-sshd.exe: IP: 74.53.140.146 USER: bin PASS: diana4ever
Dec 11 14:02:13 debian fake-sshd.exe: IP: 74.53.140.146 USER: bin PASS: bostanel

 

 

more information / download

 



Old Stuff

8. December 2011 21:09

 

So I have been busy digging through and moving tech recently, as I switched jobs. I went for a rummage way back to when I learnt to program, which would have been in the mid 90s, and found some of the following things. To my surprise almost everything I found still compiled, so I have decided to dump some of the short programs onto my software page. Most were written at the time for the purpose of learning. All of them are written in C and will compile on Debian Lenny.

 

The following have been added

 

 

More will probably follow later!

 



Programming for a very large company can suck. Badly.

20. November 2011 18:45

 

So I recently moved jobs to a very large company in the financial sector. This post is about the short time I spent there, from their hiring process until I left ten weeks later. I did pick up on a few interesting things along the way! If you are interested in working in a very large corporate / enterprise environment, I strongly recommend that you read this so that you can take steps to prevent it from happening to you.

 

The Hiring Process


This started off pretty simple: the normal “sort out the CV” and fire it over to the agencies to see what they had available. This went better than expected, given that I don’t have a degree. I ended up selecting two decent companies in my area and, by complete luck, was offered both jobs within a few days of each other. I chose the company that had the tougher interview process and the job that sounded much more interesting.

 

The Job Spec


In short, the job spec was to design and develop Windows Forms applications in C#, along with the usual other experience you would expect to go with it.

 

The Interview (all 5 of them!)


1.       At the start there was a pre-screen test, where I was emailed a short test to implement some basic functions based on an interface. This was straightforward and took approx 15 minutes to complete and to make sure it was definitely correct.

 

2.       At this point I was invited in for a full interview. So I headed off to their offices, where we talked about programming, software engineering and the other usual parts of an interview. I was handed a complete C# language test, multiple choice (or multiple guess) of course, which took around 45 minutes. I asked a lot of questions about the job, who I would be working with (the people who were interviewing me), and their processes and procedures. All the answers sounded very reasonable and everything about them seemed pretty normal.

 

3.       The recruitment company contacted me again and asked me to come up for another interview. The recruiter explained that this would be more of an HR-style interview so they could check me out properly, as I had already been confirmed as technically competent. Off I went again, and they asked some pretty hard questions trying to pick my personality apart etc. They spoke about their job roles and what they did for the company. They also asked whether I would have a problem moving from C# / Windows to Java / Linux systems. I do have a lot of previous experience working with Linux but had never written Java before. However, since I have written in a number of languages, I didn’t think Java would be a problem, as it is actually very similar to C#.

 

4.       I was then asked to have a phone interview with more senior people. This was pretty quick, taking about 20-30 minutes, and they actually did most of the talking about their systems and what they were doing etc.

 

5.       The 5th interview was much the same as the 4th. It should actually have happened as part of the 4th interview, but the person was unavailable at short notice for some reason.

 

A few days later I was offered the job and accepted it. I eventually managed to get the paperwork back in time; it took one evening to read it all and another evening to fill out all the forms.

 

Let the problems begin!


The first day seemed pretty sane. I turned up, had a photo taken (for some ID) and sorted out various other paperwork with HR, showing ID and other minor things like that. I even managed to sort out a car parking card for their free car park. However, I didn’t have access to much on their systems and had to apply for it all to various places.

 

By the end of the first week I pretty much had a working workstation and access to everything. There just seemed to be one small problem.

 

·         All the systems ran on Linux.

 

·         Everything I had seen the other person working on so far was in Java.

 

This was not exactly what I had applied for. So I started investigating their systems and spent a few days trying to find some source code. I looked around their testing and production systems and asked a lot of questions, and things improved somewhat as I figured out exactly what they were doing. However, the point of an interview is not just for the company to interview the employee but also for the employee to interview the company. After all, both parties need to be compatible. This effectively made my half of the interview a complete waste of time, since I had not asked anything about their Linux systems, as I was not meant to be working with them.

 

The Requirements / The actual job


This is pretty simple. Our department was responsible for taking data from the outside world from 3rd party systems, doing some authentication, some credit checks and some enrichment on the incoming data rows. That sounds quite complex, but in reality everything was provided by another department and we were effectively the “glue”. Either we called a web service to get the complex functionality done, or we looked up some details in a database. So our simple process looked something like this.

 

  1. Get data.
  2. Map Data (eg rename some fields)
  3. Validate Data.
  4. Permissions checks.
  5. Add some information based on source.
  6. Send it on its way to the next department.

 

At this stage I was thinking: really? We don’t really do much then? Other than that, each of these ran as a Java process and there were around 50 or so different processes running. The only thing that struck me as a challenge here was that it was a critical process. There were legal ties on the information flows, and outages longer than around 10-15 minutes would start to have a serious impact on the business.

 

Something else I had noticed is that the other members of the team (all 3 of us in total) were always debugging things that had failed on the production systems, and a support team was pretty much processing the failed items by hand. By hand? That seemed OK, since each flow actually only processed around 5-150 rows. With around 50 flows, that’s at most about 7,500 data rows a day, the failure rates were reasonably low, and failures arrived in batches of about 20 or so when things went wrong. I ended up getting involved in this a little, so I could understand why their systems were not working and also to take the load off the other people. I should probably point out that over the 10 weeks I was there, only around 2 days went by without something failing. On average there were about 1-3 serious failures each day. Always a different problem!

 

The Office Move


After I had been there for about 3-4 weeks, the manager decided to move office. The other building was only down the road. Unfortunately, this meant I would lose my free car parking with the move and end up paying £60 / month for it. I was pissed off at that, since I considered it a perk of the job. £60 might not sound like much, but it doubled my commuting costs and effectively knocked about £1k off the salary. With that taken into account, the salary would only have matched the other offer I had before I started. So yeah, I was pretty pissed off at this issue. The car park that approx 500-600 people were using would also close whenever it was used for events at the complex next to the office, so when they had a show on, 500 car parking spaces typically needed to be found. I find it pretty poor that a company which only moved into the building 5 years ago didn't check that there was car parking available for its staff. Not to mention that this is not located in a city centre.

 

At this point I really started to look at things in a different light, just to see what I could uncover about the company. Here is a list, going from simple things I hated to things that just didn’t work at all.

 

The Support Team


The support team is great; I must give them credit. The company expects them to work shifts from 7am – 3pm or 11am – 9pm. They are also expected to perform manual checks and updates on production systems, and to turn up on Saturday mornings and Sunday afternoons. In reality most of them appear to work from 7am – 5pm or 11am – 10pm, and end up doing much longer than they are meant to at the weekend. I should also mention that the support guys got around 5000+ emails per day (on a good day) and were expected to deal with this with a 100MB mailbox quota. Prior to leaving, I did see a support team member being “turned over” by a manager for missing a single email from somebody important.

There didn’t appear to be any additional benefits for people who would go above and beyond their roles. I also think the chance of promotion for them would have been extremely slim in either case.

 

The Company Culture


Once a month there was a dress-down day. You were required to donate to charity in order to dress down, and it came with a rule book so tight that you could pretty much only swap your trousers for jeans. So why bother? The email that went around said: No t-shirts, sweat shirts, mini skirts, shorts or anything that might be considered inappropriate.

 

At some stage during the first few weeks, they attempted to place me on an “induction course” to introduce me to the company. However, they could only seat around 30-40 people per course and only ran one every 2 weeks. Eventually I got a place and ended up in what felt like 8 hours of attempted corporate brainwashing: the dos and don’ts, musts and must nots, and how to work as many hours as possible. To give an idea of scale, this was around 5-10% of the total workforce in the area, across two buildings, being turned over every month. There has to be something majorly wrong with a staff turnover rate that high.

 

The “Development” Job


I was unable to locate any information about the data mappings and processes. The only source of information was the source code. I was able to locate the source code, but the svn repositories typically had multiple copies of the same code (in the same directory) and it was impossible to tell which copy had last been compiled. It was of course also impossible to tell whether the code running in production had been checked in at all. This stuff was a completely disorganised mess.

 

I did attempt to search around to find out what data had to be processed, but was only ever met with responses of “I don’t know anything about that” or “That person has left the firm”. In effect, the larger department didn’t know what data we were expected to send them. Nobody anywhere could produce any sort of detailed information on the data flows, or descriptions of what the data in any of the fields was. This effectively made everything in production a trial and error process. The company called this agile software development. I called it software development by brute force.

 

There were 4-5 different generations of the software, each re-written by someone who had joined the team at some stage over the years and each based on whatever had come before. So a lot of the software contained the same bugs as the previous version. The most recent 2 generations were getting pretty good and had fixed all the old bugs, but had created a whole pile of new ones. None of the generations had any sort of decent error handling, and they would mail bomb people when things started going wrong (e.g. 1 email every 10 seconds from each process). So if the main database went down, you received approx 300 emails per minute (roughly 50 processes × 6 emails a minute) until it was fixed.

 

On the development server (the only Linux machine with a Java compiler) we were forced to share a network drive as our home directory. It had 400GB, which was meant to be shared between 3000 people. That works out at an average of about 133MB each. Compiling and releasing a single generation of the code used around 500MB for the main package, plus a bunch of smaller things for a small package. However, that only covers one generation, so I was using around 2 GB – 2.5 GB. Apparently I was over quota and had to get it below 500MB, the maximum anyone could use. If I failed to get it below 500MB, my files would be deleted without warning. Hence the source code problem described above.

 

To give an idea of the source code, multiple different departments were attempting to do their own thing with it, so it was always being pulled in lots of different directions. It was a soft-coded application and worked on the basis that you write XML and then configure the Java (yes, I have these the correct way round). This sort of application can be horrible, if not impossible, to work with. I should also mention that the software was using around 200-250 shared libs. Removing them for our simple use of calling a web service would probably have taken several weeks’ worth of work. It took me around 2-3 days to get a version of the code to compile, by the time I had resolved the shared lib issues (this involved hunting down the correct versions of the shared libs).

 

All testing and debugging was performed on a “test environment” that was not permitted to have any of the standard JDK tools, as it was a complete mirror of the production environment. Except, of course, for the really important things like the Java runtime that everything was using!

 

The shocking development process looked something like this, since it was impossible to get the software running on your workstation: it either required 3rd party software that needed licences, or it was just too Linux-dependent to run on Windows.

 

  1. Write some code.
  2. Build software on dev server.
  3. Move software rpm to test server.
  4. Create a request and have a sys admin install the software on the test server.
  5. Run the software on the test server.
  6. Debug ....
  7. Goto step 1.

 

The above process is impossible to work with, so devs had access to change the configuration files on the test server but were not really meant to install things. One workaround was to swap out JARs. The more common workaround was to re-configure the application and write the program in JavaScript called from Java, which counted as a configuration file. So this was permitted.

 

Because of other restrictions, it was impossible to debug the Java code. So all debugging was performed using trace files, or by having the program email you the information you required. The same trace files were also in use in production. Some of them were rather large, since in this situation people only ever add tracing and never remove it, as removing it takes almost twice the work again. It also helps to debug issues in production. It also helped crash both the test and production environments several times by running the system out of disk space. To give an idea of scale, the trace files were coming out at 1GB / day. Our production server had no database, just “processing”, yet it had a sustained disk write speed of between 2 and 8 MB/sec of log files, depending on load.

 

So by the time I had figured out most of this, the software made perfect sense. It was not written to solve the problem (the requirements). It was written to circumvent internal policies and rules, so that developers could actually function and release things to production by stuffing all the code into XML and JavaScript files, since those were considered “configuration files”. With that said, “write XML and configure Java” probably makes perfect sense now!

 

The only testing used was to try to manually simulate the real world using the “test environment”. This is of course extremely weak, as we had no specific details of what was actually being processed and it was next to impossible to automate. Multiple departments had to be involved to test, verify and sign off on any changes. You can imagine how much time this could use up, considering that at times the team had to test with other time zones that were 8 hours out, while still being expected to work a minimum of 9-5. Some automated testing did exist in the most recent generation of the software, covering about 5-10 of the components in production. However, it only covered the basic logic of the code that called web services and other components. It never covered the “process”, or what the components were actually doing in production, in any way. Looked at in detail, the test results effectively amounted to 50%-70% code coverage and 0% path coverage of what was running in production.

 

I thought I had better finish this off by mentioning “timesheets”. Almost every dev hates timesheets, since it is normally only worth rounding most development work off to the nearest half day or day. They can make a bit of sense, since we were working on multiple different projects, working reactively and switching to the highest priority issues / tasks first. What didn’t make sense was upper management demanding that everyone submit their timesheet at least one month in advance. From my point of view this just shouted, and confirmed, that management was grossly out of touch with what was happening further down the chain.

 

I heard a rumour that they have issues retaining their technical talent for any period of time, and that they cannot quite seem to pinpoint the issue that is causing people to leave!

 

Moving On


I was asked to attend an exit interview with HR. They had scheduled it to take around 30 minutes. In reality I think I could have talked to them for several hours about my reasons for leaving the company.

 

Since I had only been with the company for a short period of time, I only had to give them one week of notice and continue my duties as normal. They then effectively forced me to “wait” until the last ten minutes before letting me go. During the last week I attended several meetings, only to find out that there was no work for me as I was leaving. Normally, when a situation like this develops, the company would just place you on garden leave, as it is safer and saner for everyone involved than forcing somebody to stare blankly for the last 3-4 days of work.

 

Since I had only just been through recruitment, I already had a pile of contacts and a phone full of numbers of people to talk to. It took me all of about 2 days to get an interview. I attended and was offered a job about 3 days later. Apparently they pay their developers overtime there and offer flexible working hours!
