Overview of Data Integration Tools

Integration for SMBs

OneSaas is a data integration application that automates common tasks for specific business users, mostly in eCommerce and online marketing. It features a variety of integrated services and a simple interface for connecting to the corresponding third-party user accounts. For each type of data there are then a few options, chiefly whether to add and delete items according to changes in other systems that handle the same data type (Magento with eBay, Google Contacts with a CRM, etc.). It also has an interface that displays all the data from the third-party vendors in tables.
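The add and delete options described above can be illustrated with a short Python sketch. It plans a one-way sync between two systems that hold the same data type, for example pushing Magento products to eBay. plan_sync and its allow_add/allow_delete flags are hypothetical names. This is not how OneSaas works internally, only one way to express those choices.

```python
def plan_sync(source, target, allow_add=True, allow_delete=True):
    """Compare two {item_id: record} dicts and list the changes to apply to target.

    Hypothetical helper: allow_add / allow_delete mirror the per-data-type
    options of adding and deleting items based on the other system.
    """
    source_ids, target_ids = set(source), set(target)
    # Items present in the source but missing from the target
    to_add = sorted(source_ids - target_ids) if allow_add else []
    # Items that disappeared from the source but still exist in the target
    to_delete = sorted(target_ids - source_ids) if allow_delete else []
    # Items in both systems whose records differ
    to_update = sorted(k for k in source_ids & target_ids if source[k] != target[k])
    return {"add": to_add, "delete": to_delete, "update": to_update}
```

With allow_delete=False, an item removed from the store would stay listed on the marketplace. Which behaviour is right depends on which system is treated as the source of truth.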

Cazoomi SyncApps is an integration cloud meant for small businesses. It features a visual studio for composing integrations, much like Enterprise solutions do, but is aimed at SMBs in both pricing and available integrations. At the time of writing, Cazoomi’s website reports 10,000 customers and a team of 13. Cazoomi’s solution appears to be aimed more at application vendors that want to create integrations than at end users themselves.

Another type of tool is the web page scraper, meant to ease the extraction of data from websites that don’t offer an API. The prominent tool in this respect is Dapper, and a newer contender is import.io.

UPDATE: Since this text was written, a new service called Kimono has come out that scrapes websites automatically and lets you build apps out of the data it extracts.

Industry Specific Integration Tools

Each industry has an array of specific tools, either built for integrating data or integrating data as a byproduct of their main function (dashboards, process automation, etc.). These solutions thus partly overlap in function with data integration software, and could suffice in any scenario in which integration needs are confined to their specific set of third-party applications. However, a common scenario among small businesses is an overdependence on one such specific tool, and an inability to make that tool work with services meant for other aspects of running the business. For instance, an eCommerce business might use an application that optimizes and automates some aspects of selling online. Such a tool would likely connect seamlessly with every marketplace, comparison shopping engine, shopping cart, payment gateway and shipping service, yet its ability to work with a newsletter management system or a PPC ad platform is limited.

Examples of such niche products vary widely from one industry to another. Ad management tools include Datorama, Kenshoo and IgnitionOne. In eCommerce, common tools include Channeladvisor, eSellerPro, Vendio and Auctiva. mHealth apps can be connected with Validic. Data from different analytics solutions can be pulled together with Segment.io. Social media campaigns can be managed from a single place with Hootsuite.

Enterprise Integration Tools

The Enterprise world covers a very wide array of solutions at this point in time. This overview only examines SaaS products aimed at Enterprise customers, not hybrid or on-premise solutions. A key characteristic of the following solutions is that they’re meant to be deployed by IT departments, not by the end users themselves.

Informatica Cloud is a high-end service for Enterprise clients that offers a large set of tools for integrations and data processing. Jitterbit offers an open source platform for creating and managing integrations, both legacy and SOA based. It relies on reusable templates for common integration needs so that developers have less work. Extol offers a design tool for building integrations between Enterprise applications. JackBe Presto is a visualization and BI tool that connects to external sources (data warehouses, spreadsheets, feeds, existing BI systems, etc.) using a graphical assembly tool. SnapLogic is an Enterprise-grade data integration solution built from modular pieces of connectivity or functionality called ‘snaps’. Snaps can connect to third-party services, local files or anything else, and can provide atomic bits of functionality used for ETL tasks. SnapLogic runs a marketplace for snaps, with high-end pricing. Mulesoft offers a set of products meant for data integration. Its visual editing tools are meant to ease the work of developers, not to provide an end-user programming (EUP) solution. It offers a variety of pre-built connectors and integration apps, the ability to develop new ones, and visual tools for integrating services and for building data transformation processes. Dell Boomi is an Enterprise API integration tool that uses a flowchart interface and offers both pre-built integration types and the ability to create new ones, or to have Dell do the customization for the end users. Scribesoft offers the same standard set of features, as well as the ability to connect SaaS products with on-premise software. Other solutions in this sphere include IBM Infosphere DataStage, IBM CastIron, and Talend’s Integration Suite.
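The ‘snap’ idea of small, composable units of connectivity and functionality can be sketched in a few lines of Python. This is a toy illustration based on my own assumptions, not SnapLogic’s API. Each hypothetical snap is a function that takes a stream of records and returns a new stream, and pipeline() simply chains them in order.

```python
import csv
import io
from functools import reduce


def read_csv(text):
    """Source snap: turn CSV text into a stream of dict records."""
    return csv.DictReader(io.StringIO(text))


def where(field, value):
    """Filter snap: keep only records whose field equals value."""
    def snap(records):
        return (r for r in records if r[field] == value)
    return snap


def select(*fields):
    """Projection snap: keep only the named fields of each record."""
    def snap(records):
        return ({f: r[f] for f in fields} for r in records)
    return snap


def pipeline(source, *snaps):
    """Feed the source through each snap in order (a left fold over the snaps)."""
    return reduce(lambda records, snap: snap(records), snaps, source)
```

Because every snap shares the same record-stream interface, snaps can be rearranged or reused across pipelines without changing one another.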


*This article is part of a business plan I decided to edit into a series of blog posts. You can find the rest of the content here

How to Serve Cloudinary with HTTPS on Django

Cloudinary’s documentation states that its client-side libraries can automatically detect the protocol used by a page and generate HTTP or HTTPS URLs for image tags accordingly. This, of course, doesn’t apply to server-side code, and the documentation doesn’t cover that case.

In Cloudinary’s Django library, there are three ways to define your site-specific parameters:

  • A dictionary called CLOUDINARY that you can set in your settings.py.
  • The CLOUDINARY_CLOUD_NAME environment variable.
  • The CLOUDINARY_URL environment variable.

Although the documentation doesn’t mention it, the CLOUDINARY_URL environment variable can take query parameters, and these are added to the configuration of the cloudinary object that generates the URLs within Django templates. So to make sure that all requests to Cloudinary are made over HTTPS, all you have to do is set CLOUDINARY_URL with the query parameter ‘secure’ set to ‘True’, like so:

export CLOUDINARY_URL=cloudinary://123456789012345:abcdefghijklmnop-qrstuvwxyz@abcdefgh?secure=True

Just make sure to replace the fake API key, API secret and cloud name I typed above with your own.
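To make the query-parameter idea concrete, here’s a minimal sketch, using only Python’s standard library, of how a URL like the one above splits into configuration values. parse_cloudinary_url is a hypothetical helper written for illustration, not the Cloudinary library’s own parser. The key names simply mirror the settings discussed here. Note that query values arrive as strings (‘True’), so converting them to booleans is left to the caller.

```python
from urllib.parse import urlparse, parse_qs


def parse_cloudinary_url(url):
    """Split a cloudinary:// URL into a dict of config values.

    Hypothetical helper for illustration; the real library does its own parsing.
    """
    parts = urlparse(url)
    if parts.scheme != "cloudinary":
        raise ValueError("expected a cloudinary:// URL")
    config = {
        "api_key": parts.username,
        "api_secret": parts.password,
        "cloud_name": parts.hostname,
    }
    # Every query parameter becomes one more config entry, e.g. secure=True.
    # Values stay strings; the last one wins if a key is repeated.
    for key, values in parse_qs(parts.query).items():
        config[key] = values[-1]
    return config
```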

This approach isn’t ideal, though. If we ever need to change the Cloudinary URL, we have to remember to keep the extra query parameter. And because the setting lives outside the code, it’s all or nothing: either all our pages load images securely or none do.

An alternative would be to use the CLOUDINARY dict in settings.py, but that method has a disadvantage as well: it doesn’t play nicely with the environment variables. Cloudinary’s Django library treats the three approaches listed above as mutually exclusive, so if we set the CLOUDINARY dict in settings.py, CLOUDINARY_URL is not even looked at. Considering that on platforms such as Heroku an environment variable is the preferred route, we’re basically left with nothing but bad choices.

Luckily, the cloudinary module used to generate image links is just Python code. Its config() function accepts keyword arguments that are added to the same configuration that stores the site-specific values from the inputs mentioned above. So to make Cloudinary use secure URLs, all that’s required are the following two lines in your settings.py:


import cloudinary
cloudinary.config(secure=True)

That’s it! Your Cloudinary URLs will now always be served over HTTPS. And because this is just Python code, you can adapt it to use secure or non-secure URLs in different views (just make sure to do that inside view functions and not in settings.py).
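If secure and non-secure URLs really need to coexist, another option is to decide the scheme per URL instead of changing the global config. The sketch below is a hand-rolled illustration of that idea, not Cloudinary’s API. image_url and its parameters are hypothetical, and it only builds the basic res.cloudinary.com/<cloud name>/image/upload/<public id> form, without transformations or versions.

```python
def image_url(cloud_name, public_id, secure=True):
    """Build a basic Cloudinary delivery URL, picking the scheme per call.

    Hypothetical helper: no transformations, versions or signing.
    """
    scheme = "https" if secure else "http"
    return "%s://res.cloudinary.com/%s/image/upload/%s" % (
        scheme, cloud_name, public_id)
```

In a Django view, you could then call something like image_url("demo", "sample.jpg", secure=request.is_secure()). Deciding per call avoids mutating cloudinary.config(), which is process-wide state that other requests served by the same process also see.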

Overview of Online Databases & Business App Solutions

These platforms are meant to allow non-technical end users to define data structures, according to their specific needs, and then make use of an online database that stores data according to those definitions. They usually also allow one to build business apps, which are small productivity apps for web or mobile, meant to ease tasks such as data entry and reporting.

The challenge they all face is the technical complexity of data modeling, which in this case is left to non-technical end users. The traditional solution is the route taken by MS Access: the end user plans and manually builds table schemas and relations, and once that’s done, the system can automatically create data entry forms based on the supplied schemas. A more modern approach takes the exact opposite path: the user builds the form itself with a drag-and-drop interface, and the system automatically generates a matching schema behind the scenes.
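The form-first approach can be sketched as follows. The user’s form definition, a list of field labels and widget types, is turned into a table schema automatically. This is a minimal illustration that assumes SQLite and a hypothetical widget-to-column mapping. Real products would also handle relations, validation and identifier escaping. Labels containing double quotes would break the quoting used here.

```python
import sqlite3

# Hypothetical mapping from form widget types to SQLite column types
WIDGET_TYPES = {
    "text": "TEXT",
    "email": "TEXT",
    "number": "REAL",
    "checkbox": "INTEGER",  # stored as 0/1
    "date": "TEXT",         # stored as an ISO-8601 string
}


def schema_from_form(form_name, fields):
    """Build a CREATE TABLE statement from a list of (label, widget) pairs."""
    columns = ["id INTEGER PRIMARY KEY AUTOINCREMENT"]
    for label, widget in fields:
        # "Full Name" -> full_name
        column = label.strip().lower().replace(" ", "_")
        columns.append('"%s" %s' % (column, WIDGET_TYPES[widget]))
    return 'CREATE TABLE "%s" (%s)' % (form_name, ", ".join(columns))


if __name__ == "__main__":
    ddl = schema_from_form("contacts", [("Full Name", "text"), ("Age", "number")])
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    print(ddl)
```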

In general, there was a surge of app authoring tools in ’06-’07, and some of those attempts to empower non-programmers to build tools for and by themselves were halted by the ’08 crisis. Under these circumstances, DabbleDB was acquired by Twitter in ’10 and shut down in ’11, Lazybase disappeared into thin air, Teqlo was shut down, and Coghead became part of SAP’s River platform.

The main tool used today for authoring small business apps is Caspio Bridge. It allows one to create databases, web forms and apps, all without writing any code. Its UI works similarly to MS Excel and Access, but online, with integration support through a SOAP API, POST requests, Datahub, a JS library and a plugin for MS Office. Another common solution is Intuit’s QuickBase, which offers the same standard app creation, as well as managing data through a spreadsheet UI. It differentiates itself by offering a large set of database templates for common industry-specific needs. Both these tools have pricing levels meant for SMBs (but not micro businesses), meaning their plans start at $250-300/mo. Zoho Creator is a third alternative that is cheaper at low usage levels, though at high usage it would cost a lot more than other services. It’s an online database and business apps service, with a form creation wizard, business rules engine, customizable reports, multiple view types (including an editable grid), alerting, rebranding and styles, collaboration, backups, APIs and data exports.

Other contenders in this field are TrackVia, LongJump, FileMaker Pro and Viravis. They all offer database and app creators meant for non-programmers, as well as custom reporting and database templates. LongJump provides an SDK for extending the system with code. FileMaker Pro offers the ability to build apps for tablets and mobile. Eccentex is another solution, providing a platform on top of which developers can build productivity apps for end users.

A newer generation of tools has been emerging over the past two years or so. Apart from Dreamface, which is aimed at IBM BPM software customers, all of these solutions are meant for micro to small businesses as well as the early adopter consumer market, and are priced accordingly. Ragic! Builder is an app authoring tool with a spreadsheet UI that lets the end user define the application’s form fields. It then offers reporting, full text search, a query builder, embedding results into one’s website, importing and exporting, access control, versioning, custom scripting and an API. Knack is another solution, offering very easy data and user management, and it’s customizable through an open API as well as CSS and JS editors. SodaDB is a donationware product that offers a simple and customizable database, importing and exporting, a form builder, full text search, and the ability to work without signing in.

Another brand of solutions is online spreadsheets, which sit at a crossroads between accounting software and online databases. These are meant to compete with MS Excel on specific vectors, offering features that make them more attractive to particular target audiences. Smartsheet is a paid-only online spreadsheet with a modern UI, mobile versions and an emphasis on collaboration. Zoho Sheet is an MS Excel clone that mostly suits existing Zoho customers. Glide Crunch is a desktop spreadsheet application meant for large spreadsheets (the kind that won’t fit into an online tool) that syncs automatically with your storage on Glide’s cloud office suite. AirXcell is a web-based spreadsheet tool meant for heavy-load scientific and statistical work. It uses R language syntax for formulas and functions, and comes with a few built-in financial applications. Flextory is a web application that acts as a sort of administration panel for your own data. It looks and acts like a standard web admin panel, with item editing forms, filtering, sorting and a choice of table fields to view, but the data is the kind you’d normally manage in MS Excel. SecureSheet is an online spreadsheet that offers collaboration and security. BinaryThumb is a spreadsheet for iPhones and iPads that holds not only numbers but any kind of media (it has pivoted slightly since this text was written). Sumwise is a spreadsheet solution that offers smart features like grouping, reusable cells, etc. CollateBox offers collaboration on lists based on MS Excel data. Other solutions include Editgrid and Skysheet.

UPDATE: Since I wrote this text in 2012, a new online database solution called Team Desk has emerged on the scene.



The mixFWD Business Plan

Following my decision to refrain from startups for the time being, I’ve decided to publish parts of the 40-page business plan I wrote for my last venture.

Here’s the list of posts published so far (more will come within the following few weeks):

The Fragmented ERP

The proliferation of the SaaS model as a main channel for software distribution is undisputed. For small businesses, SaaS offerings opened up the possibility of adopting software solutions that would have been beyond their price range before. The ability of small businesses to adopt solutions that had until then been solely in the realm of enterprise players paved the way for more and more web-based solutions, in areas such as project management, lead management and CRMs, marketing automation, BI and asset management, to name a few. Marc Andreessen, for one, predicts that “a whole new series of specialized SaaS apps will arrive to serve specific industry verticals.”

Altogether, the multiplicity of solutions that make up the ecosystem in which a business operates is the modern equivalent of what would previously be described as a business’s ERP system. This ERP is no longer the product of a single manufacturer, and its different parts usually don’t connect or communicate with one another. This is what I like to call the fragmented ERP. It differs from its former counterpart in that the business now has a choice between competing vendors for each part of the ecosystem. Its main shortcoming is the lock-in of company data on the servers of third-party service providers, and the inability to easily accomplish tasks that require combining data from multiple such sources.



Startups Are Bad For You

Socialization of risk

Established companies no longer outsource just the development of new products; they now outsource the ideation and planning stages as well. Out of the few hundred teams working from basements throughout the world at any given moment, only a few reach good results. An established company can choose to acquire only those few, and save on the costs of trying to innovate internally.

To the economy at large, this isn’t good at all. Established companies no longer spend enough resources on advancing innovation, because they can mitigate the risk of failure by transferring it to individuals, most of whom lose. And so, on top of the loss in potential innovation, a significant portion of the workforce is busy building things that amount to nothing, instead of doing productive work that produces real goods people can use.

Seizure of public funds

In the U.S., a lot of the money in the hands of VCs comes from people’s pension funds. In Israel, where I live, a large portion of the VC industry is funded by the government. In both cases the money is poured into two main channels: wages for highly skilled workers, and fees paid to a few platform owners (Google’s Adwords, Amazon’s Web Services, Apple’s App Store, etc.). Either way, money flows from the regular economy, with its normal wages, to the high-tech economy, with its disproportionate wages.

A built-in lack of innovation

The startups favored by VCs are those for which there’s already a precedent and a clear business model, i.e., not the really innovative ideas, but small improvements to existing tools and services.


There’s an alternative to the startup approach:

  • Just build the things you think are most likely to be helpful to other people.
  • Build tools in the evenings and on weekends and holidays, and whenever you can.
  • Keep in mind that anything for which there’s a clear business model and monetization strategy has already been built. Instead, build things people would find helpful, regardless of their profitability and of whether they can scale.
  • Build for your own sake, instead of working for the sake of potential future income.

Selenium for Python Wrapper Code

First off, here’s the code:


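import time

from selenium.common.exceptions import StaleElementReferenceException


# A method of the test base class, which keeps the WebDriver in cls.driver
@classmethod
def find(cls, query, selector='id', catch_all=False):
  # Build the finder name, e.g. find_element_by_id or find_elements_by_css_selector
  if catch_all:
    finder = 'find_elements_by_'
  else:
    finder = 'find_element_by_'

  elements = None  # stays None if every attempt hits a stale reference
  for i in xrange(30):
    try:
      elements = getattr(
        cls.driver,
        finder + selector
      )(query)
      break
    except StaleElementReferenceException:
      print 'attempting to recover from StaleElementReferenceException ...'
      time.sleep(1)

  if elements:
    return elements
  else:
    raise Exception(
      'Could not solve stale reference'
    )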

This code achieves several goals. First, it shortens calls, which helps keep lines within PEP 8’s length limit, by reducing a function call that looks something like the following:

self.driver.find_element_by_css_selector('.some-class')

into the following:

self.find('.some-class', 'css_selector')

Also, by wrapping all the different calls to Selenium’s find_element functions, we can add pieces of functionality that work around some of the inherent problems in the different browser implementations. The code above handles stale reference exceptions, which are very common when testing in IE. It retries for up to 30 seconds, trying to fetch the element once a second, and breaks out of the loop as soon as the element is found. A check such as isinstance(cls.driver, webdriver.Ie) could be added so that this logic only applies to Internet Explorer tests. However, it isn’t really necessary, and it wouldn’t work if the tests were migrated to Sauce Labs, for example, where the driver is always Remote.
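The retry logic in these loops can also be pulled out into a small, library-agnostic helper. This is a Python 3 sketch based on my own assumptions. The name retry and its parameters are hypothetical, not part of Selenium. It retries on whatever exception types you pass, and the optional on_retry callback is where something like the mouse nudge shown further down would go. Staying independent of the find_element_by_* names also helps with newer Selenium releases, which removed those methods in favour of find_element(By..., ...).

```python
import time


def retry(action, exceptions, attempts=30, delay=1.0, on_retry=None):
    """Call action() until it returns without raising one of `exceptions`.

    action     -- zero-argument callable, e.g. a lambda wrapping a finder call
    exceptions -- an exception class, or a tuple of classes, meaning "try again"
    on_retry   -- optional callback run after each failure (e.g. nudge the mouse)
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except exceptions as error:
            last_error = error
            if on_retry is not None:
                on_retry(error)
            if attempt < attempts - 1:
                time.sleep(delay)  # no point sleeping after the final attempt
    raise RuntimeError("gave up after %d attempts" % attempts) from last_error
```

Inside find(), the loop would then shrink to something like elements = retry(lambda: getattr(cls.driver, finder + selector)(query), (StaleElementReferenceException, ElementNotVisibleException)).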

A similar route could be followed to solve invisible element problems. These occur very often in PhantomJS, because the browser is so fast that some JS-driven elements may not be visible in time. The solution to such a problem could look something like the following:


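# Replaces the retry loop inside find(); ElementNotVisibleException is
# imported from selenium.common.exceptions, like StaleElementReferenceException
for i in xrange(30):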
  try:
    elements = getattr(
      cls.driver,
      finder + selector
    )(query)
    break
  except StaleElementReferenceException:
    print 'caught StaleElementReferenceException'
    time.sleep(1)
  except ElementNotVisibleException:
    print 'caught ElementNotVisibleException'
    time.sleep(1)

I’ve also seen this happen on all browsers when the mouse hovers over elements that expand, hiding other elements while Selenium is trying to reach them. One solution is to gradually move the mouse until the pointer leaves the expanded element and the concealed elements are visible again. Here’s sample code to illustrate this:


for i in xrange(30):
  try:
    elements = getattr(
      cls.driver,
      finder + selector
    )(query)
    break
  except StaleElementReferenceException:
    print 'caught StaleElementReferenceException'
    time.sleep(1)
  except ElementNotVisibleException:
    print 'caught ElementNotVisibleException'
    # Nudge the pointer 50px right and down, off the expanded element
    action_chain = webdriver.ActionChains(
      cls.driver
    )
    action_chain.move_by_offset(50, 50).perform()
    time.sleep(1)

I hope this minor contribution to the blogosphere’s code pool makes someone’s life a little bit easier some day.
Till then, be safe, and Godspeed.

Magento’s Custom Variables Anywhere

Magento’s wiki states that: “CMS content cannot simply be translated in the same way as catalog content. Duplicate pages, blocks, banners and polls must be created for each language.”

The problem is that separate CMS pages would have different URL keys. This creates several complications:

1. Having separate pages with the same content isn’t advisable. The concern isn’t so much the mistaken belief that Google penalizes websites with duplicate content (that only happens at a disproportionate scale that suggests a grey-hat SEO trick). It’s more a matter of directing more users to the same pages so as to increase their relevance in Google’s eyes.

2. Locale-dependent URL keys make it hard to link to these pages from elsewhere in the store. For example, a footer link to such a URL means you’ll need a locale-based footer block too.

3. If the only difference between store views is those CMS pages, then what’s the point of having to create a separate sitemap.xml file for each locale?

To address these issues in my own store, I hacked Magento’s core to handle custom variables everywhere, and not just through its WYSIWYG editor. The code below is an unintrusive version that uses Magento’s event observers to react to page rendering and scan for Magento’s custom variable syntax.

For those of you not interested in the technical details, I created a packaged extension that supplies this functionality. It’s available to download for free on my professional website.

For those interested in the technicalities, you’ll notice below that I hook into the response object instead of parsing the individual blocks’ toHtml() methods. This may seem surprising, since individual blocks are cached in Magento CE whereas whole pages are not, so hooking into a method that only runs when no cache is available should have saved some parsing time. The thing is, I tested it using Magento’s profiler and it simply doesn’t work as expected. If anyone has any insights as to why, I’d really appreciate a comment.

Either way, to get this working on your own store, create the XML file for the extension here:

ROOT/app/etc/modules/Namespace_Module.xml


<?xml version="1.0"?>
<config>
  <modules>
    <Namespace_Module>
      <active>true</active>
      <codePool>local</codePool>
    </Namespace_Module>
  </modules>
</config> 

Then create the XML file for your extension’s configurations, in which we’ll register our event observer, here:

ROOT/app/code/local/Namespace/Module/etc/config.xml


<?xml version="1.0"?>
<config>
  <modules>
    <Namespace_Module>
      <version>0.1.0</version>
    </Namespace_Module>
  </modules> 
  <global>
    <helpers>
      <variables>
        <class>Namespace_Module_Helper</class>
      </variables>
    </helpers>
    <events>
      <controller_front_send_response_before>
        <observers>
          <namespace_module>
            <type>singleton</type>
            <class>Namespace_Module_Helper_Observer</class>
            <method>parseCustomVars</method>
          </namespace_module>
        </observers>
      </controller_front_send_response_before>  
    </events>
  </global>
</config>

Last but not least, create the actual observer function inside the helper class, here:

ROOT/app/code/local/Namespace/Module/Helper/Observer.php


<?php
class Namespace_Module_Helper_Observer extends Mage_Core_Helper_Abstract
{
  public function parseCustomVars($observer)
  {
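    // Grab the fully rendered page just before it is sent to the browser
    $response = $observer->getEvent()->getFront()->getResponse();
    $html = $response->getBody();

    // Look up a custom variable by its code, scoped to the current store
    $callback = function($matches) {
      $var = Mage::getModel('core/variable');
      $var->setStoreId(Mage::app()->getStore()->getId());
      return $var->loadByCode($matches[1])->getValue('html');
    };

    // Skip the admin area, where the raw {{customVar}} tags should stay editable
    if (!$this->isAdmin()) {
      // The U modifier makes (.*) ungreedy, so each tag is matched separately
      $html = preg_replace_callback(
        "/{{customVar code=(.*)}}/U",
        $callback,
        $html
      );
    }
    $response->setBody($html);
    Mage::app()->setResponse($response);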

    return $this;
  }

  public function isAdmin()
  {
    if(Mage::app()->getStore()->isAdmin())
    {
      return true;
    }

    if(Mage::getDesign()->getArea() == 'adminhtml')
    {
      return true;
    }

    return false;
  }
}

The isAdmin() function is copied from somewhere. I don’t remember where, but I thank its author for the code.

This is pretty much it. With this code in place, you can type {{customVar code=anything}} anywhere in the administration area (configuration menus, CMS page titles, category names, etc.), and it’ll be parsed into the corresponding value on the frontend.
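The substitution itself is easy to try outside Magento. Here’s a minimal Python sketch of the same idea, in which a plain dict stands in for Magento’s core/variable model. That is an assumption for illustration, and there are no store scopes here. One detail is worth noting. In the PHP pattern, PCRE’s U modifier makes (.*) ungreedy, and the Python equivalent is the lazy (.*?). Without it, two tags on the same line would be swallowed by a single match. Replacing unknown codes with an empty string is a choice made for this sketch.

```python
import re

# Lazy (.*?) plays the role of PCRE's U (ungreedy) modifier in the PHP version
CUSTOM_VAR = re.compile(r"\{\{customVar code=(.*?)\}\}")


def render_custom_vars(html, variables):
    """Replace each {{customVar code=...}} tag with its value.

    variables -- dict mapping variable code to its HTML value (a stand-in
                 for Magento's core/variable model); unknown codes become ''.
    """
    return CUSTOM_VAR.sub(lambda m: variables.get(m.group(1), ""), html)
```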

Hope this helps someone. Please share, and don’t forget to test this before deploying, as it has only been tested on Magento CE 1.7.0.0.

Magento’s architecture requires the use of separate CMS pages, with separate URL keys for each language. In this post, I suggest a hack that enables using custom variables in the CMS’s title and content fields instead. Custom variables are translatable, and I’ve been successful at using them as a way of creating language-independent CMS pages.

Chrome is Omnimemoria

I’m in the midst of building a Magento website these days, and I ran into a peculiar situation. I was filling out a form in the admin panel of my local Magento installation in Chrome, when it suddenly offered a drop-down menu of suggested values. The thing is, these suggestions were things I had typed into a different Magento website I built more than a year ago, on a different PC. The fact that Chrome remembers data for so long and carries it from one PC to another can be seen as a feature; I understand that. What I can’t regard as a feature is that it’s a different Magento installation at a different URL, yet Chrome still remembered the values I typed into the same admin panel fields. This is far more than just an annoyance. Both admin panels are password protected, and Magento admin panels sometimes hold credit card information (though that isn’t best practice). It’s crucial that no unauthorized person has access to this data. I repeat that these installations sit in different locations, with different URLs. Now imagine a company that manages several Magento websites. If a certain employee is only authorized on one of those websites, he can still view values typed into fields on the other installations from that particular browser, as long as someone is signed in with his Google account.