tirsdag den 25. oktober 2016

Handling long text in cube - need to be tested

Handling Annotation and comments in Analysis Services

By Thierry D’hers, Mosha Pasumansky
April 2005

This write-up summarizes several techniques for handling text information with the cube.  It describes the reason for these techniques, how it is applied, their pros and cons as well as limitations.

Text in an OLAP cube? Why?

While OLAP cube in general are very good and efficient at handling data, data alone isn’t self explanatory and is rarely sufficient to explain behaviors, patterns, exception such as pick or sudden drops… Many applications that require data entry from users (as opposed to loading of data from data factory) also require these same user to justify the data by association a comment to it. These applications are usually in the Finanical analytical space (Statutory consolidation, Budget, Forecasting…) or HR (Time tracking, Project tracking…).

One could say, why don’t we let OLAP handle the data and store the comments and annotation in a relational store. While this is perfectly feasible and currently is one of the main techniques used, it has some serious limitations. For one it requires a Front-end application to handle the user entry and appropriately store the Data and text in their respective storage. This is not by itself a show stopper since most of these applications require specific business logic and processes (Entry, validation, submission…) and thus need to expose a dedicated front end to the user for data entry anyway. Second and more critical is that it also requires a dedicated reporting tool that understands the application schema and is capable at query time to generate a query t the OLAP engine to retrieve the Data and another query to the relational storage to retrieve the associated Comments and join both data sets at run time. This creates a serious limitation which prevents the user community from using any third party tool to report on this data. Since this data is usually very sensitive it is necessary that comment always follows the data. Companies have then only two choices: Impose the Proprietary front-end to every single user or run the risk to have data being reported without its associated context.

The following document describes how to setup the cube so that it will enable the storage and reporting of text through MDX thus enabling both Data and Text to be queried through a unique query language and with a unique dataset.

Associating Comments with a member

Problem

Associating a comment with a member suggest that this comment never varies by any other dimensions.  While this may not be the most common type of annotation it is definitely possible.
One could say: “Wait a minute, how is that different from a member property?” Well technically it is not. While a member property is often seen as this characteristic of the member it also doesn’t vary over other dimensions.
Thus the technique to handle Dimension Member Comment is similar than the one about handling  Dimension member property, the only difference is the semantic meaning of this property.

Solution

  • Create a new member property for this dimension attribute
  • Name it comment or Annotation
  • Populate it using Dimension writeback “UpdateCube”
  • Query it like any other member property using the Dimension.Currentmember.Properties(“PropertyName”) MDX construct.

 <<Add example and screenshot here>>

Associating Comments with any cell

Problem

In most cases indeed, the Comment or Annotation is really associated with a cell or a set of cell.
For now let’s treat the single Cell case. This is indeed probably the most often used or desired one.
In this case we want to be able to associate a Comment with any cell in the cube (or measure group to be more precise) whether it is at leaf or non leaf level.

Solution

The solution revolve around the fact that Analysis Services Storage engine doesn’t know how to handle String for Measure data but does know for Dimension data and also the fact that the Formula engine does know how to handle text data.
So in a nutshell, Comment and Annotation are handled as Dimension members, entered by the user using Dimension writeback, then are retrieved as part of the measure group using a calculated member and thus queried from any front-end using regular MDX.

Handling Comment or Annotation as dimension member can seem scary at first, but since Yukon doesn’t handle the entire Dimension in memory anymore, its scalability capability in term of size of dimension members has greatly increased and can easily go above 50 millions members. For the very large majority of applications it seems reasonable to assume that this type of volume will never be reached. Indeed if we assume that an application has a lifespan of 5 years, then it gives us 10 millions members or Comment per year. Now let’s say that this application has 1000 active users, then every user has a workspace of 10 000 comments per year or 833 comments per month. It is a lot of text to enter every month for a single user…

Step by Step implementation of the solution

Following this step by step procedure we will add Comment support to any existing cube.
  1. Create a new Table, called “Annotation” (or anything that your wish) in the Relational backend with two columns (more can be added later, see Support for translation section): CommentKey (LongInt), Annotation String (VarChar). Do not populate these columns.
  2. In the fact table used by the measure group for which you want to add Comment, add a new column: CommentIndex (LongInt). Do not populate this column.
  3. Repeat Step 2 for as many Measure group to which you wish to Add Comments.
  4. Now, anywhere in the DSV diagram surface, click the Refresh context menu. The new Fact table columns are refreshed.
  5. Still in the DSV, select the Add/Remove Table context menu and add the Annotation table to the DSV.
  6. Now, using the Dimension wizard, create a new dimension for the Annotation table:
    • Dimension Key attribute key: CommentKey
    • Dimension Key attribute Name: Annotation String
    • Accept all the other defaults
    • Dimension Name: Annotation
  7. Now, open the Cube designer and go to the Cube Builder tab. Form the fact table create a new Measure from the newly added column CommentIndex, set it to Non Visible, set its aggregation function to NONE (This is important, we’ll see later why?).
  8. Go to the Dimension Usage tab, add the Annotation dimension to the cube using the Add Cube dimension context menu. You do not need to add any relationship with any Measure group.
  9. Go to the Calculation tab, add the Following calculated member:
    • Calculated member name: Comment
    • Calculated member MDX expression: IiF (Measures.[Comment Index] = 0,"-" ,StrToMember("Annotation.&[" + VBA!CStr(measures.[Comment Index]) + "]").Properties("Caption"))
    • This is the calculated member that you will add to your query to retrieve the comments

Now that we have the mechanics in place for retrieving the Comment for each cell, how do we get the CommentIndex in the Fact table and the Annotation String in the Dimension table.

Creating and inputting comments

Well the first easy solution is to have them loaded from the dimension table and Fact table when creating the dimension and the cube. That could mean that the Application front-end generate new records in these table for each comment that is created.
The challenge here is that the Front-end need to track the coordinate of the Cell where the comment is entered in order to appropriately create the new record.

The second solution is to use Analysis Services native Dimension and Data writeback capability to create the new Comment. This way the Application doesn’t have to know the coordinate of the cell, this is automatically generated by AS data writeback capability.
This is what the application must create in this case:
  1. User enters a Comment in a given Cell in the Comment measures
  2. Application capture this data entry when user submits
  3. Application use Dimension Writeback to create a new member in the Annotation dimension with the following values:
    1. Annotation String name: User entered text
    2. Annotation string Key: a new unique index generated by the application
  1. Application use Data Writeback to create a new data in the CommentIndex measure for the same coordinate where the text was entered:
    1. Value: = Unique MemberKey index generated for the new Annotation dimension member.
    2. Also application needs to make sure that if the index is associated with a non leaf level that in that case the writeback is done for the Datamember instead of the real parent level (otherwise the writeback will automatically allocate that value and spread it to the children which would defeat the purpose here). In this case you need to make sure that for this index measure, the measure is aggregated by replacing any parent value with any data member value if it is non null. Richard Tkachuk wrote a WP on how to do this calculation: http://www.sqlserveranalysisservices.com/OLAPPapers/Loading%20Aggregate%20Data%20in%20AS2005v2.htm

  1. Refresh the query to get the Newly entered comment back into the Comment field.


Enhancing the solution to support Comment translation

Since Annotation support relies on Dimension members, the Translation feature of Yukon can be leveraged if needed to enter or retrieve localized comments. Now it may not be realistic that every user will enter a comment as well as it localized value in many languages, but web services like Google offer online translation capabilities.
It may be a very nice service to offer comments to be automatically translated in many languages. The changes required to support it are minimal.
  1. Add a new column in the Annotation dimension for every language that need to be supported.
  2. Refresh the DSV so that these columns are seen by the DSV
  3. In the Annotation dimension designer translation tab, add the new languages then set the Annotation String attribute property as follow:
  1. Repeat for as many languages as needed.
  2. Now query with a different language. That’s it. It just works:
  3. In English:
  1. And in French…

Associating Comments with a set of Cells or Subcube

Associating Comments with a set of Cell or subcube can be slightly more problematic. There are many solution that each have different pros and cons
First solution is for the Application front end to generate as many identical values in the commentIndex as there are cells in the Subcube. This doesn’t require any changes in the solution described above, but it could lead to scalability issue, especially if the user is not aware of the concept of OLAP sparcity.
The second solution is to create a calculation that store the Commentindex value for every cell in the subcube. This can be easily done with the following expression:
Scope (<Subcube expression>)
This = <CommentIndex value>
End Scope
This doesn’t present any scalability challenges but it requires unique administration right to update the MDX script and save the cube.

Associating Comments with a grain of a few less dimensions than the data

Now let’s say that the data in our measure group varies by Account, Time, Scenarios, Organization, but we want to Comment field to not vary by the Scenario dimension. This is especially helpful if you actually want to Comment to apply to Actual, Budget and Variance altogether:

The solution to this problem is just a slight variation from the Cell level Comment solution.
  1. Instead of adding the CommentIndex column to an existing fact table, create a new fact table with only CommentIndex and the Dimension Id column that are needed (in our Case, TimeId, AccoutId, OrganizationId)
  2. Add this table to the DSV, add all the appropriate relationship with the dimension table
  3. Add CommentIndex as a new measure with its new measure group
  4. The rest of the solution works the same way…
  5. You now have a Comment field that varies by Account, Time, Organization byt not Scenarios

What about doing this in Analysis Services 2000

Well Most of it can actually work in Analysis Services 2000, a few feature though are not available in 2000 that slightly impact the solution

Solutions differences and limitations

First the NONE aggregation function doesn’t exist. The impact is that the CommentIndex values get aggregated and as a result some comment may appear in aggregated places where they shouldn’t.
For example if you have the following:
Comment 1, for January 1997 with an CommentIndex of 1
Comment 2, for February 1997 with an CommentIndex of 2
Comment 3, for March January 1997 with an CommentIndex of 3
Comment 6, for June 1997 with an CommentIndex of 6

Then when browsing Quarter1 1997 you may actually see Comment 6.
Why? Well the CommentIndex value for Q1 1997 is the SUM of its children so 6 in our case. The calculated member tries to retrieve the Comment with the CommentKey = 6.

To prevent this you can change the Calculated member to only work at leaf level. This at the same time fixes the issue but also adds a limitation to our solution. Comment can now only be entered at leaf level of the cube.

Conclusion

Managing Data and Text in the same database in Analysis Services is now very easy thanks to Yukon new NONE aggregation method, NONE Allocation writeback method and multiple measure group granularity.
All it requires is a front end that know how to writeback a dimension and a data cell at the same time.
But the benefit is immense because now any 3rd party analytic and reporting tool including Excel is now capable of query both type of data.



http://www.sqlserverdatamining.com/OLAPPapers/Handling%20Comments%20in%20Yukon.htm

Ingen kommentarer:

Send en kommentar