Wiki source code of AWS Redshift Refresh

Last modified by Martijn Woudstra on 2022/06/13 14:56

Show last authors
1 {{container}}{{container layoutStyle="columns"}}(((
2
3 Below you will find a document describing the migration path to add the AWS Redshift Refresh functionality to a data pipeline solution you have previously built.
4 If you want to implement a new data pipeline, import it from the store, guaranteeing that this functionality will be included.
5
6 Should you have any questions, please get in touch with academy@emagiz.com.
7
8 * Last update: March 1st, 2022
9 * Required reading time: 6 minutes
10
11 == 1. Prerequisites ==
12
13 * Basic knowledge of the eMagiz platform
14 * Understanding of Data pipelining concepts
15 * A existing Data pipeline solution within your eMagiz project.
16
17 == 2. Key concepts ==
18
19 * This functionality automatically refreshes the materialized view in Redshift
20 * By making eMagiz refresh it automatically, the data in your dashboards are kept up to date
21
22 == 3. Migration Path * Root Cloud to eMagiz Cloud ==
23
24 Below you will find a document describing the migration path to add the AWS Redshift Refresh functionality to a data pipeline solution you have previously built.
25 If you want to implement a new data pipeline, import it from the store, guaranteeing that this functionality will be included.
26
27 === 3.1 Remove unnecessary components ===
28
29 First, we will delete components that have become obsolete as of late. The parts you can remove from the flow are:
30
31 * support.bus-connection-plain
32 * support.bus-connection-caching
33
34 Furthermore, you could remove the following debug components as every interesting step is already monitored and can therefore be tracked without the help of the debugger:
35
36 * global channel interceptor
37 * activate.debug-bridge
38 * send.debug
39 * entry.channel.debug-queue
40 * debugBridgeChannel
41
42 === 3.2 Add new components to refresh AWS Redshift ===
43
44 AWS Redshift offers you the option to create a materialized view. A materialized view is a table that combines information from several source tables into one view.
45 To keep updating this materialized view when a source table is updated eMagiz has created functionality to refresh the materialized view.
46
47 To make sure that your existing data pipeline will function in the same way you should execute the following steps:
48
49 * Add a channel called job-execution
50 * Add a channel called dummy
51 * Add a channel called job-completed
52 * Add a channel called job-completed-delay
53 * Add a channel called log
54 * Add a job execution listener gateway and configure it as follows
55 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-aws-redshift-refresh-job-execution-listener.png]]
56 * Add a standard filter and configure it as follows (the expression checks whether the Job is already finished) -> payload.status == T(org.springframework.batch.core.BatchStatus).COMPLETED
57 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-aws-redshift-refresh-execution-status.png]]
58 * Add a standard activator and configure it as follows (the expression puts the thread to sleep for 60 seconds) -> T(java.lang.Thread).sleep(60000) ?: 'AWS Materialized view refreshed.'
59 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-aws-redshift-refresh-delay.png]]
60 * Add a JDBC outbound channel adapter and configure it as follows (query can be found below)
61 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-aws-redshift-refresh-jdbc-outbound.png]]
62 * Add a logging channel adapter and configure it as follows
63 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-aws-redshift-refresh-log.png]]
64 * Make sure that the job-completed-delay channel is wiretapped with the help of the log channel
65
66 === 3.3 Query you need for refresh ===
67
68 The following query is needed to refresh the AWS materialized view. First, make sure that the name of the materialized view is filled in as a property value.
69
70 REFRESH MATERIALIZED VIEW ${dp.jdbc.distinctive-name.tablename}\_mv;
71 COMMIT;
72
73 === 3.4 Intermediate Result ===
74
75 The result should look something like this:
76
77 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-job-dashboard-cleanup-result-part-one.png]]
78
79 === 3.5 Add listener to job ===
80
81 If you only add the listener section in the flow, you would have a solution that only partly works.
82 To make sure the complete chain of events starts working in unison, you need to tell the job which listeners need to listen to the execution of the job.
83
84 To do so, open the component called support. job.{technicalnameofmessagetype}. On the advanced tab, you need to add two listeners:
85
86 * A Gateway Listener -> Select the job execution listener that you have added in step 2
87 * A Exception Listener -> This one makes sure that when an error occurs in AWS Redshift, the job status will be Failed instead of Completed.
88
89 The configuration should look as follows:
90
91 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-aws-redshift-refresh-add-job-listeners.png]]
92
93 === 3.6 End Result ===
94
95 The result of all this should look something like this:
96
97 [[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-job-dashboard-cleanup-end-result.png]]
98
99 ===== Practice =====
100
101 == 4. Key takeaways ==
102
103 * This functionality automatically refreshes the materialized view in Redshift
104 * By making eMagiz refresh it automatically, the data in your dashboards are kept up to date
105
106 )))((({{toc/}}))){{/container}}{{/container}}