Job Dashboard Cleanup

Last modified by Martijn Woudstra on 2022/06/13 15:10

{{container}}{{container layoutStyle="columns"}}(((
Below you will find a document describing the migration path to add the Job Dashboard Cleanup functionality to a data pipeline solution you have previously built.
If you want to implement a new data pipeline, import it from the store; that version already includes this functionality.

Should you have any questions, please get in touch with academy@emagiz.com.

* Last update: March 1st, 2022
* Required reading time: 6 minutes

== 1. Prerequisites ==

* Basic knowledge of the eMagiz platform
* Understanding of data pipeline concepts
* An existing data pipeline solution within your eMagiz project

== 2. Key concepts ==

* This functionality ensures that the Job Dashboard remains available and only shows the relevant data of the last 30 days

== 3. Job Dashboard Cleanup ==

=== 3.1 Remove unnecessary components ===

First, we will delete components that have become obsolete. The components you can remove from the flow are:

* support.bus-connection-plain
* support.bus-connection-caching

Furthermore, you can remove the following debug components, as every relevant step is already monitored and can therefore be tracked without the help of the debugger:

* global channel interceptor
* activate.debug-bridge
* send.debug
* entry.channel.debug-queue
* debugBridgeChannel

=== 3.2 Add new components to clean up the Job Dashboard ===

The new functionality cleans up the Job Dashboard for you, ensuring that you can keep accessing the job information of the last month of activity.

To make sure that your existing data pipeline keeps functioning in the same way, execute the following steps:

* Add a support object called top level poller and configure it as follows:
[[image:Main.Images.Migrationpath.WebHome@migration-path-job-dashboard-cleanup--migration-path-job-dashboard-cleanup-top-level-poller-config.png]]
* Add a channel called clean
* Add a standard inbound channel adapter called clean.cron and configure it as follows (as you can see, it cleans the Job Dashboard every day at five in the morning):
[[image:Main.Images.Migrationpath.WebHome@migration-path-job-dashboard-cleanup--migration-path-job-dashboard-cleanup-clean-cron-config.png]]
* Add a standard inbound channel adapter called startup.cron and configure it as follows (it cleans the Job Dashboard on startup):
[[image:Main.Images.Migrationpath.WebHome@migration-path-job-dashboard-cleanup--migration-path-job-dashboard-cleanup-startup-cron-config.png]]
* Add a JDBC outbound channel adapter to your flow
* Use the clean channel as input
* Link it to the H2 database that is in your flow
* Enter the query that you can find below
[[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-job-dashboard-cleanup-result-part-one.png]]

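The exact cron settings for clean.cron live in the pictured configuration. As an illustration only (the function name and times below are hypothetical, not part of the eMagiz configuration), the behavior of a fixed daily 05:00 trigger can be sketched like this:

```python
from datetime import datetime, time, timedelta

def next_daily_run(now: datetime, at: time = time(5, 0)) -> datetime:
    """Next occurrence of a fixed daily trigger, like a 05:00 cleanup schedule."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's 05:00 has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate

# Before 05:00 the trigger fires the same day; after 05:00 it fires the next day.
print(next_daily_run(datetime(2022, 3, 1, 4, 30)))  # 2022-03-01 05:00:00
print(next_daily_run(datetime(2022, 3, 1, 6, 0)))   # 2022-03-02 05:00:00
```

Combined with the startup.cron trigger, this guarantees the cleanup runs at least once a day and immediately after a restart.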
=== 3.3 Query you need for the cleanup ===

The following query cleans up all relevant parts of the Job Dashboard so that only the last month's jobs remain visible.

{{code language=sql}}
DELETE FROM BATCH_JOB_EXECUTION_CONTEXT WHERE
JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE DATEADD(MONTH, 1, CREATE_TIME) < CURDATE());
DELETE FROM BATCH_JOB_EXECUTION_PARAMS WHERE
JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE DATEADD(MONTH, 1, CREATE_TIME) < CURDATE());
DELETE FROM BATCH_STEP_EXECUTION_CONTEXT WHERE
STEP_EXECUTION_ID IN (SELECT STEP_EXECUTION_ID FROM BATCH_STEP_EXECUTION WHERE
JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE DATEADD(MONTH, 1, CREATE_TIME) < CURDATE()));
DELETE FROM BATCH_STEP_EXECUTION WHERE
JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE DATEADD(MONTH, 1, CREATE_TIME) < CURDATE());
DELETE FROM BATCH_JOB_EXECUTION WHERE DATEADD(MONTH, 1, CREATE_TIME) < CURDATE();
DELETE FROM BATCH_JOB_INSTANCE WHERE
JOB_INSTANCE_ID NOT IN (SELECT JOB_INSTANCE_ID FROM BATCH_JOB_EXECUTION);
{{/code}}
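
To see the retention rule in action outside eMagiz, here is a minimal sketch using Python's built-in sqlite3 module. SQLite's DATE(..., '+1 month') stands in for H2's DATEADD, and the miniature schema and sample rows are assumptions for illustration only, not the full Spring Batch schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Miniature stand-in for the Spring Batch job tables (illustrative subset).
cur.executescript("""
CREATE TABLE BATCH_JOB_INSTANCE (JOB_INSTANCE_ID INTEGER PRIMARY KEY);
CREATE TABLE BATCH_JOB_EXECUTION (
    JOB_EXECUTION_ID INTEGER PRIMARY KEY,
    JOB_INSTANCE_ID  INTEGER,
    CREATE_TIME      TEXT);
CREATE TABLE BATCH_JOB_EXECUTION_PARAMS (JOB_EXECUTION_ID INTEGER);
""")

# One execution outside the one-month window and one recent execution.
cur.executescript("""
INSERT INTO BATCH_JOB_INSTANCE VALUES (1), (2);
INSERT INTO BATCH_JOB_EXECUTION VALUES (10, 1, DATE('now', '-3 months'));
INSERT INTO BATCH_JOB_EXECUTION VALUES (20, 2, DATE('now', '-1 day'));
INSERT INTO BATCH_JOB_EXECUTION_PARAMS VALUES (10), (20);
""")

# Same order as the H2 query: child rows first, then the executions
# themselves, then instances left without any execution.
cur.executescript("""
DELETE FROM BATCH_JOB_EXECUTION_PARAMS WHERE JOB_EXECUTION_ID IN
    (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION
     WHERE DATE(CREATE_TIME, '+1 month') < DATE('now'));
DELETE FROM BATCH_JOB_EXECUTION WHERE DATE(CREATE_TIME, '+1 month') < DATE('now');
DELETE FROM BATCH_JOB_INSTANCE WHERE JOB_INSTANCE_ID NOT IN
    (SELECT JOB_INSTANCE_ID FROM BATCH_JOB_EXECUTION);
""")

remaining = cur.execute("SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION").fetchall()
print(remaining)  # only the recent execution survives
```

Deleting child rows before their parent executions mirrors the order of the query above and avoids orphaned context and parameter rows.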

=== 3.4 Result ===

The result should look something like this:

[[image:Main.Images.Migrationpath.WebHome@migration-path-aws-redshift-refresh--migration-path-job-dashboard-cleanup-result-part-one.png]]

[[image:Main.Images.Migrationpath.WebHome@migration-path-job-dashboard-cleanup--migration-path-job-dashboard-cleanup-result.png]]

== 4. Key takeaways ==

* This functionality ensures that the Job Dashboard remains available and only shows the relevant data of the last 30 days

)))((({{toc/}}))){{/container}}{{/container}}