Written by Philip Howard (Bloor Software Analyst)
Philip’s a Bloor Software Analyst who started in the computer industry way back in 1973. He worked as a systems analyst, programmer and salesperson, as well as in marketing and product management roles, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR.
This is not a paper about why or if you should migrate to a cloud environment, whether for one particular application, for your entire infrastructure, or for something in between. Nor is it a paper about which provider of cloud infrastructure, applications or services you should choose to partner with. Instead, this paper is about the technical considerations that you should bear in mind when moving to, or implementing, cloud-based products or solutions. Note that we are specifically concerned with technical issues rather than with any personnel, political or other concerns that may arise. In other words, we are principally concerned with “what” and “how”: what issues you need to think about and how, in general terms, these can be resolved.
Before discussing the details involved in moving to a cloud environment it is worth making a few introductory comments on the subject itself. To begin with, there are three primary scenarios that involve cloud adoption:
Each of these scenarios has rather different, if overlapping, issues with respect to cloud deployment and in the discussions that follow we will make clear which concerns are relevant to which use cases.
One final point: which comes first, the cloud or the application/technology? In our opinion there is only one answer to this question: select the applications, databases and tools that will best suit your business and then determine where they will run. You may then need to make compromises if you want a common cloud provider across these various solutions but you at least open your company to all possibilities and, in any case, it is often reasonable to have different solutions supported on different platforms. Choosing a cloud provider first may mean that you ignore better business solutions in terms of applications that are available to you. This especially applies with respect to proprietary cloud providers (for example, Microsoft Azure) where your database choices may be limited, as opposed to those suppliers that have a more open offering.
Before discussing the issues arising from any move to the cloud in detail, it will be as well to outline them briefly, and where they apply, as not all topics will be relevant to all readers. In summary, therefore, the issues are:
We are not actually going to discuss each of these individually. For example, we will discuss points 1, 5 and 6 under the single heading of integration; and points 3 and 4 under the heading of application retirement and archiving. We will start, therefore, with point 2.
Migrating from an on-premise environment to a cloud environment can occur in different ways. It is not typical to take an existing application, along with its data, and re-host it in the cloud. If this is what you are doing then it is not a true migration but more a question of simple data movement, which should not raise any technical issues. Much more usually, however, you are moving the data into the cloud where it will be running in a new application and, often, on a new database. These are true migrations and they raise much the same concerns as on-premise migrations where you are moving from one database to another, or changing ERP supplier. Bloor Research has published multiple papers on this specific topic and it is not necessary to repeat all that information here. However, it is worth summarising the results of our previous research and, particularly, the best practices we have identified. These best practices, according to our research, will save an average of $170,000 per project.
If you are migrating data from an existing application to a new cloud-based application where the latter has yet to be developed (as opposed to a SaaS offering) then it is important to treat the data and application elements of this project as separate entities with their own budgets. Our research suggests that migration projects are four times more likely to be aborted if there is only a single budget and time-scale. This also applies to testing of the results: in our last survey we found that 53% of projects for which there was a single testing framework ran over time or budget. That figure (at least as far as the data migration part of the project was concerned) fell to 38% when testing was done separately.
There are two reasons why you need to use data profiling tools and there are two times when you need to use them. The first reason for using a data profiling tool is that you need to ensure that your data is of good enough quality for the new environment it is to run in. From a technical point of view you need to bear in mind that your existing application may make allowances for poor quality data (for example, missing values) but your new application may simply fail to run if values are missing. There are also good business reasons why you should want to use high quality data.
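To make the missing-values point concrete, the following is a minimal sketch of the kind of check a profiling tool performs, assuming records are held as simple dictionaries. The function name `profile_missing`, the field names and the sample data are all hypothetical illustrations and are not taken from any particular product.

```python
from collections import Counter


def profile_missing(rows, required_fields):
    """Count, per required field, how many records lack a usable value."""
    missing = Counter()
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            # Treat absent keys, None and blank strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    # Report every required field, including those with no gaps.
    return {field: missing[field] for field in required_fields}


# Hypothetical sample data: an old application might tolerate these gaps,
# while a new one might refuse to load the records at all.
customers = [
    {"id": 1, "name": "Acme Ltd", "postcode": "AB1 2CD"},
    {"id": 2, "name": "", "postcode": None},
    {"id": 3, "name": "Globex"},
]

report = profile_missing(customers, ["name", "postcode"])
print(report)
```

A real profiling exercise would go much further (formats, ranges, duplicates, cross-table relationships), but even a count like this gives an early indication of how much cleansing work to budget for.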
The first time you should use data profiling is prior to budgeting your migration. This is because you won’t know the scope of the quality issues facing you in migrating the data prior to a preliminary profiling exercise. You are 20% more likely to bring your project in on time and budget if you adopt this approach. The second time you will use data profiling is once the project has started, in which case the process will be more detailed.
The second reason for adopting the use of data profiling is to establish the relationships that exist around the business entities you are migrating. A customer, for instance, needs to be migrated together with their invoices, orders, sales and service history, delivery addresses and so on. This especially applies when there are customer details in more than one place. This approach is important both to ensure that each business entity is migrated in its entirety and also to enable collaboration between business users and IT. Note that not all data profiling tools have this discovery ability.
We strongly recommend the use of tools for both data cleansing and data movement as well as data profiling and discovery. Success – by which we mean bringing in projects on budget and within predicted timescales – drops significantly when manual methods are used instead of tools.
There are two points to be made here. Firstly, you should use an established and proven methodology during your migration. Secondly, even if you are outsourcing the entire project it will help if you have some internal expertise with respect to the processes that need to be followed. Without such in-house capability projects are more likely to fail than not.
By far the most critical success factor, as identified by respondents to our survey, is business engagement. More people cited this as their number one criterion (let alone second and third) than anything else. There were more “number one priorities” for this than the sum of number one, two and three priorities for everything else except methodology. This is not surprising: it is business people, not IT, who understand customers and products (for example) and how they inter-relate. If relevant personnel are not engaged in ensuring that these relationships are correctly maintained during the migration process then the project may fail. Not surprisingly, lack of support from business users was cited as a major cause of project overruns.
When you replace an existing in-house application with a SaaS application we can assume that you then intend to retire the pre-existing application. This is all the more the case if several applications are to be replaced during the same project. However, users are often reluctant to agree to turn off old applications that have been known and loved (a bit like an old sweater!) for many years. It is therefore important to have a formal agreement – accepted by users – as to the appropriate conditions that will enable the old application(s) to be turned off.
The other point to consider when retiring applications and moving to a SaaS environment is whether you need or want to move all of the data associated with that application to the new environment. It will often be the case that this is not necessary but that you either want to (for business reasons), or have to (for compliance reasons), retain that data in some form. In that case you may wish to archive data that is not going to be used by the new application. If so, you effectively have two migration projects: one to the new application and one to the archive. With respect to the migration processes involved the considerations are identical. Indeed, you may archive data to the cloud. However, there is one other consideration with respect to archiving, which is to do with access.
Just as business users are often reluctant to turn off old applications they are also often of the opinion that all data should be kept on-line and available. You may therefore need to prove that some data is dormant. You can do this by using a database activity monitoring tool or some other technology that can demonstrate how often each piece of data is accessed and when it was last accessed. At the same time, archiving technologies should be introduced that make it easy for business users to access archived data if they should ever need to in the future.
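As an illustration of how dormancy might be demonstrated, the sketch below assumes that a monitoring tool can export the date each record was last accessed. The function `find_dormant`, the record keys, the dates and the two-year threshold are all hypothetical; the appropriate threshold is a business and compliance decision.

```python
from datetime import date, timedelta


def find_dormant(last_access, as_of, dormant_after_days):
    """Return the keys of records not accessed within the given window."""
    cutoff = as_of - timedelta(days=dormant_after_days)
    return sorted(key for key, accessed in last_access.items() if accessed < cutoff)


# Hypothetical export from an activity monitoring tool:
# record key -> date the record was last read or updated.
last_access = {
    "invoice-1001": date(2012, 3, 14),
    "invoice-1002": date(2015, 1, 9),
    "invoice-1003": date(2009, 11, 30),
}

dormant = find_dormant(last_access, as_of=date(2015, 6, 1), dormant_after_days=730)
print(dormant)
```

A list of this kind, agreed with business users in advance, gives an objective basis for deciding which data is moved to the archive rather than to the new application.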
We do not know how many SaaS applications there are. We recently spoke to a data integration vendor that had 250 on its shortlist of most popular applications, so the total is certainly in the thousands. Bearing in mind that you might want to pass data between these applications, as well as between these applications and on-premise environments, there are literally millions of ways that this environment may need to be connected. This poses a problem for vendors of data integration tools and products and, as a corollary, for users seeking integration solutions. Of course, if you simply want to host a database in the cloud or use one or two of the most popular SaaS applications, then you can use a traditional data integration product. However, if your environment is going to be much more complex than this, if not now then later, then you may need to think twice.
There are three possible approaches. One is to use APIs (application programming interfaces). These are sufficiently complex and numerous, particularly once you take into account the different versions of each API, that vendors have started to introduce API management applications in order to keep track of them all. Alternatively, there are two different approaches that you can use for data integration. One is to use a traditional approach where specific transformation processes are built for each dataflow from source to target. The problem with this approach is that while suppliers only have to provide connectors at each end, if you have a complex environment with many endpoints then the number of transformation processes you need to build and maintain becomes too much of a burden. For example, not so long ago we talked with a company that had 38,000 transformation processes that it needed to maintain just to stand still.
An alternative approach, offered by some data integration vendors, is to support a canonical form whereby you convert source data to the canonical form and then you convert the canonical form to the target format. This means that you need exactly as many transformation processes as there are sources and targets put together. This is more efficient in terms of numbers of processes though it may not be suitable for very complex transformations.
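The difference between the two approaches can be sketched as follows. This is a minimal illustration only: the record formats, field names and function names are hypothetical, and the point-to-point count assumes the worst case in which every source feeds every target.

```python
# Hypothetical source formats, each converted to one canonical customer record.
def crm_to_canonical(rec):
    return {"name": rec["FullName"].strip(), "email": rec["EmailAddr"].lower()}


def billing_to_canonical(rec):
    return {"name": f'{rec["first"]} {rec["last"]}', "email": rec["mail"].lower()}


# Hypothetical target formats, each produced from the canonical record.
def canonical_to_warehouse(rec):
    return {"CUSTOMER_NAME": rec["name"].upper(), "EMAIL": rec["email"]}


def canonical_to_marketing(rec):
    return {"contact": f'{rec["name"]} <{rec["email"]}>'}


TO_CANONICAL = {"crm": crm_to_canonical, "billing": billing_to_canonical}
FROM_CANONICAL = {"warehouse": canonical_to_warehouse, "marketing": canonical_to_marketing}


def convert(source, target, record):
    """Any source-to-target flow is two hops through the canonical form."""
    return FROM_CANONICAL[target](TO_CANONICAL[source](record))


def point_to_point_count(sources, targets):
    """One dedicated transformation per source/target pair (worst case)."""
    return sources * targets


def canonical_count(sources, targets):
    """One transformation per source plus one per target."""
    return sources + targets


result = convert("crm", "marketing",
                 {"FullName": " Ada Lovelace ", "EmailAddr": "Ada@Example.com"})
print(result)

for n in (2, 20, 200):
    print(n, point_to_point_count(n, n), canonical_count(n, n))
```

With two sources and two targets the saving is nil (four processes either way), but the point-to-point count grows with the product of the endpoints while the canonical count grows with their sum, which is why the difference matters in complex environments.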
Finally, there is also the issue of supporting obscure and less well-known SaaS (or, indeed, on-premise) applications and databases. Here you will either need to rely on crowd-sourcing (web sites where you can share, upload and download connectors) or software developer kits that allow you to develop your own connectors or APIs.
There are multiple issues with respect to compliance, security and governance when moving to cloud environments.
From a compliance perspective there are three major issues. Firstly, there may be compliance issues with respect to the location of personally identifiable information (PII). For example, in Russia it is illegal for such information to leave the country, which means that any cloud-based application or database hosting this data will need to be within Russia. Secondly, compliance with respect to personal information applies in the same way to cloud-based information as it does to on-premise information: if you are not allowed to see certain data, then it doesn’t matter where that data is located. This means that appropriate access controls will need to be in place, and you may also need data masking: either static masking (typically for non-production environments such as development and testing) or dynamic masking (for production data). For production data you may alternatively, or additionally, wish to use tokenisation or encryption to safeguard your data. User-specific redaction capabilities may be required if applications are document-based. Specialised compliance monitoring capabilities may also be required. Thirdly, compliance also applies to the retention of data in archives: you will need to implement information lifecycle management to ensure that data is retained for exactly as long as it needs to be and no longer. For archived data you may also need to be able to support capabilities such as legal hold and tamper proofing.
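To illustrate the difference between masking and tokenisation, here is a deliberately simplified sketch. The `mask` function and the `TokenVault` class are hypothetical names of our own; a production system would use a dedicated product with a secured, access-controlled vault rather than an in-memory dictionary.

```python
import secrets


def mask(value, visible=4, fill="*"):
    """Masking: hide all but the last few characters. Not reversible."""
    visible = max(visible, 0)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


class TokenVault:
    """Tokenisation: swap a value for a random token, reversible via the vault."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenise(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenise(self, token):
        # Only callers with access to the vault can recover the original.
        return self._by_token[token]


card = "4111111111111111"
masked = mask(card)
print(masked)

vault = TokenVault()
token = vault.tokenise(card)
print(token != card, vault.detokenise(token) == card)
```

The design difference is the one that matters for compliance: a masked value cannot be recovered, which suits development and test copies, whereas a token can be exchanged for the original by an authorised process, which suits production data that must remain usable.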
Security overlaps with compliance in a number of areas and concerns about security have been one of the major hurdles impeding the take-up of cloud. These issues are starting to be addressed but have not yet been fully resolved. In particular, there is growing interest in so-called “encryption gateways”, which sit between the cloud and on-premise environments. In other respects, security in the cloud mirrors on-premise security in the sense that it tends to be focused on perimeter security, although there are issues with respect to how you define a perimeter when it is outside your firewall and has to encompass third-party cloud environments. In this respect there is an increasing interest in deploying data-centric security solutions (alongside perimeter security), using masking as discussed above, so that even if there is a breach, the data cannot be read.
Finally, data governance applies as much to data in the cloud as it does to data on-premise and the same tools and products should be used. This is a potential source of friction because although most vendors of data governance tools support the deployment of their tools in the cloud, they do not typically support all cloud platform providers. In fact this applies to tools more widely and it should be an important consideration when selecting a cloud-based platform. Note that this should be irrelevant as far as SaaS applications are concerned.
There is one further point that we should discuss and this is that you can leverage cloud-based capabilities (whether private or public) directly within the migration process. This has a number of advantages:
Note that these arguments apply even if your new application or database will be running within a traditional environment in-house. They will be even more attractive if you are migrating to a cloud-based application but there may be issues if you want to develop/migrate using one cloud provider but actually want to deploy using a different vendor. Any such linking issues will vary according to the particular companies involved and you should check this prior to making any decisions. Of course, life will be much simpler if both development and deployment use the same platform.
We do not want to put you off moving to the cloud. However, like all new things, it is easy to get carried away and easy to forget all of the ramifications of such a decision. To be sure, migrating from an on-premise CRM (customer relationship management) application to SugarCRM is relatively straightforward: it is not mission critical, from a technical point of view there are not any major data governance or quality issues, you probably don’t care about archiving, and so on. However, if you are deploying a database in the cloud then that becomes a completely different issue: now you have not just data to think about but all the applications that access the database. And if the database holds PII data then you have compliance and security to worry about as well as ongoing governance.
In practice we have only skimmed the surface of the various questions you may need to think about when moving to the cloud: we have highlighted the issues and mentioned the solutions but we have not detailed them. Hopefully, this will at least put you in the right place to ask the right questions so that your transition to the cloud can be successful.