Norconex JEF vs. Traditional Schedulers: Features and Review

Norconex JEF Tutorial: Managing Complex Development Workflows

In modern data processing, managing corporate crawlers, extractors, and data pipelines involves orchestrating multiple moving parts. When a workflow fails halfway through a massive migration or web scraping job, restarting from scratch is rarely an option.

This is where the Norconex Java Execution Framework (JEF) comes in. JEF is a lightweight Java library designed to manage, monitor, and execute complex development workflows. It provides developers with the tools to build resilient, traceable, and highly structured execution paths.

This tutorial provides a foundational guide to setting up and managing complex workflows using Norconex JEF.

Understanding Norconex JEF Core Concepts

Before writing code, it is essential to understand the structural pieces that JEF uses to organize tasks:

IJob: The base interface for any task or piece of work you want to execute.

JobSuite: A central manager that groups multiple jobs, handles their lifecycle, and maintains execution state.

IJobErrorhandler: A mechanism to intercept failures, clean up resources, or trigger alerts.

SuiteStopWatch / JobStatus: Built-in utilities to track execution times, progress percentages, and current states (e.g., RUNNING, SUCCEEDED, FAILED).

Step 1: Defining Your Custom Jobs

Every task in your workflow must implement the IJob interface. For this tutorial, we will create two mock jobs: one to fetch data, and another to process it.

```java
import com.norconex.jef4.job.IJob;
import com.norconex.jef4.status.IJobStatus;
import com.norconex.jef4.suite.JobSuite;

public class FetchDataJob implements IJob {

    @Override
    public String getId() {
        return "fetch-data-job";
    }

    @Override
    public void execute(JobSuite suite, IJobStatus status) throws Exception {
        System.out.println("Starting data fetch...");
        // Simulate the fetch in four steps, reporting progress after each one
        for (int i = 1; i <= 4; i++) {
            Thread.sleep(500);
            status.setProgress((double) i / 4);
            System.out.println("Fetching progress: " + (i * 25) + "%");
        }
        System.out.println("Data fetch complete.");
    }
}
```

Next, create the processing job that depends on this data:

```java
import com.norconex.jef4.job.IJob;
import com.norconex.jef4.status.IJobStatus;
import com.norconex.jef4.suite.JobSuite;

public class ProcessDataJob implements IJob {

    @Override
    public String getId() {
        return "process-data-job";
    }

    @Override
    public void execute(JobSuite suite, IJobStatus status) throws Exception {
        System.out.println("Processing fetched data...");
        // Core business logic goes here
        System.out.println("Data processing successfully finished.");
    }
}
```

Step 2: Building the Workflow Structure

Complex workflows often require jobs to run sequentially, while others can run in parallel to maximize performance. JEF provides SyncJobGroup and AsyncJobGroup classes to structure these execution paths.
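The difference between the group types comes down to ordering and failure propagation. The plain-Java sketch below illustrates only the sequential case and does not use JEF: `Task`, `LeafTask`, `FailingTask`, and `SequentialGroup` are hypothetical names, and `publish-job` is an invented extra step. It shows the composite idea behind a sequential group: the group is itself a task, so it can be nested, it runs its children in order, and it stops at the first child that throws. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class SequentialGroupSketch {

    // Hypothetical unit of work for illustration only; this is NOT JEF's IJob interface.
    // Failures are signaled with unchecked exceptions.
    interface Task {
        String id();
        void run(List<String> log);
    }

    // A leaf task that simply records its id when it runs.
    record LeafTask(String id) implements Task {
        @Override
        public void run(List<String> log) {
            log.add(id);
        }
    }

    // A task that always fails, useful for showing that a sequential group stops early.
    record FailingTask(String id) implements Task {
        @Override
        public void run(List<String> log) {
            throw new IllegalStateException(id + " failed");
        }
    }

    // A sequential group is itself a Task, so groups can be nested inside other groups.
    record SequentialGroup(String id, List<Task> children) implements Task {
        @Override
        public void run(List<String> log) {
            for (Task child : children) {
                child.run(log); // an exception propagates and the remaining children never start
            }
        }
    }

    public static void main(String[] args) {
        Task workflow = new SequentialGroup("main-workflow", List.of(
                new LeafTask("fetch-data-job"),
                new SequentialGroup("post-processing", List.of(
                        new LeafTask("process-data-job"),
                        new LeafTask("publish-job")))));
        List<String> log = new ArrayList<>();
        workflow.run(log);
        System.out.println(log); // prints [fetch-data-job, process-data-job, publish-job]
    }
}
```

Replacing any `LeafTask` with a `FailingTask` makes the whole group throw, and none of the tasks listed after it run.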

Below, we orchestrate our jobs sequentially using a main execution class:

```java
import com.norconex.jef4.job.group.SyncJobGroup;
import com.norconex.jef4.suite.JobSuite;

import java.io.File;

public class WorkflowManager {

    public static void main(String[] args) {
        // 1. Group the jobs so they run sequentially, in the order listed
        SyncJobGroup workflowGroup = new SyncJobGroup(
                "main-workflow",
                new FetchDataJob(),
                new ProcessDataJob()
        );

        // 2. Define a directory to store execution serialization and status logs
        File indexDirectory = new File("./jef-work-dir");

        // 3. Create the JobSuite
        JobSuite suite = new JobSuite(workflowGroup, indexDirectory);

        // 4. Run the workflow
        System.out.println("Starting JEF JobSuite...");
        boolean success = suite.execute();

        if (success) {
            System.out.println("Workflow completed successfully!");
        } else {
            System.err.println("Workflow failed. Check logs for details.");
        }
    }
}
```

Step 3: Handling Failures and Stop/Resume Capabilities

One of JEF’s most powerful features is its state persistence. If a job fails during execution, JEF keeps a record of each job's status, including the failed one, in the index directory passed to the JobSuite.

Resuming Failed Workflows
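The idea behind this state persistence can be illustrated with a minimal plain-Java sketch of job-level checkpointing. It is an assumption-based illustration, not JEF's implementation or storage format: the `ResumeSketch` class, the `completed.txt` checkpoint file, and its one-id-per-line format are all hypothetical. Each job's id is written to the checkpoint as soon as that job succeeds, and a later run skips every id already recorded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ResumeSketch {

    // Runs jobs in insertion order, skipping any job id already listed in the checkpoint file
    // (hypothetical format: one completed job id per line). Returns the ids executed in this call.
    static List<String> runWithCheckpoint(Map<String, Runnable> jobs, Path checkpoint) throws IOException {
        Set<String> done = Files.exists(checkpoint)
                ? new HashSet<>(Files.readAllLines(checkpoint))
                : new HashSet<>();
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, Runnable> job : jobs.entrySet()) {
            if (done.contains(job.getKey())) {
                continue; // completed during an earlier run
            }
            job.getValue().run(); // if this throws, the job is not recorded and runs again next time
            executed.add(job.getKey());
            // Record each success immediately, so a later failure does not force this job to rerun
            Files.writeString(checkpoint, job.getKey() + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        return executed;
    }

    public static void main(String[] args) throws IOException {
        Path checkpoint = Files.createTempDirectory("jef-sketch").resolve("completed.txt");
        boolean[] failProcessing = {true}; // simulate a failure on the first run only

        Map<String, Runnable> jobs = new LinkedHashMap<>();
        jobs.put("fetch-data-job", () -> System.out.println("fetching"));
        jobs.put("process-data-job", () -> {
            if (failProcessing[0]) {
                throw new IllegalStateException("simulated failure");
            }
            System.out.println("processing");
        });

        try {
            runWithCheckpoint(jobs, checkpoint);
        } catch (IllegalStateException e) {
            System.out.println("First run failed: " + e.getMessage());
        }

        failProcessing[0] = false;
        System.out.println("Second run executed: " + runWithCheckpoint(jobs, checkpoint));
    }
}
```

On the first run, `fetch-data-job` completes and is recorded, then `process-data-job` throws. On the second run only `process-data-job` executes, so the program ends by printing `Second run executed: [process-data-job]`. Recording after every job, rather than once at the end, is what makes it possible to resume at the failed job.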

When you run suite.execute() again on a previously failed suite, JEF automatically detects the serialized status files. Instead of starting over from scratch, it skips the successfully completed jobs (like FetchDataJob) and resumes exactly at the failed job (like ProcessDataJob).

Adding Error Handlers

You can attach error handlers to your suites to catch exceptions globally, clean up unmanaged connections, or send webhooks to your DevOps monitoring tools:

```java
suite.setErrorHandler((job, suiteInstance, exception) -> {
    System.err.println("Critical error on job " + job.getId() + ": " + exception.getMessage());
    // Insert notification or rollback logic here
});
```

Advanced Tip: Parallel Execution

If your pipeline handles multiple tasks that do not depend on each other (e.g., downloading images while parsing standalone log files), you can replace SyncJobGroup with AsyncJobGroup:

```java
import com.norconex.jef4.job.group.AsyncJobGroup;

// DownloadTask stands for your own IJob implementation (it is not defined in this tutorial)
AsyncJobGroup parallelGroup = new AsyncJobGroup(
        "parallel-downloads",
        new DownloadTask("ImageClusterA"),
        new DownloadTask("ImageClusterB")
);
```
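The waiting behavior of a parallel group can be approximated with the JDK's `ExecutorService`. The sketch below is an assumption-based illustration, not AsyncJobGroup's implementation: the `ParallelGroupSketch` class and its `runParallel` method are hypothetical, the task ids reuse the cluster names from the snippet above, and the task bodies are invented. It submits every task to a fixed-size pool, blocks on each `Future`, and treats the group as failed if any task threw.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelGroupSketch {

    // Runs all tasks concurrently on a fixed-size thread pool and blocks until every task
    // has finished. Returns the ids of tasks that threw; an empty list means the group succeeded.
    static List<String> runParallel(Map<String, Runnable> tasks, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<?>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Runnable> task : tasks.entrySet()) {
                futures.put(task.getKey(), pool.submit(task.getValue()));
            }
            List<String> failed = new ArrayList<>();
            for (Map.Entry<String, Future<?>> f : futures.entrySet()) {
                try {
                    f.getValue().get(); // waits for this task; the others keep running meanwhile
                } catch (ExecutionException e) {
                    failed.add(f.getKey()); // the task threw; remember which one
                }
            }
            return failed;
        } finally {
            pool.shutdown(); // release the pool's threads once the group is done
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Runnable> tasks = new LinkedHashMap<>();
        tasks.put("ImageClusterA",
                () -> System.out.println("downloading A on " + Thread.currentThread().getName()));
        tasks.put("ImageClusterB", () -> {
            throw new IllegalStateException("simulated network error");
        });
        List<String> failed = runParallel(tasks, 2);
        System.out.println(failed.isEmpty() ? "Group succeeded" : "Group failed: " + failed);
    }
}
```

Because `ImageClusterB` fails on purpose, the program ends by printing `Group failed: [ImageClusterB]`; the thread name printed by the other task depends on the pool.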

JEF handles the underlying thread allocation and monitors the status of every parallel task, blocking the suite from advancing until all of them have finished.

Conclusion

Norconex JEF bridges the gap between chaotic multi-tier scripts and heavyweight enterprise service buses. By structuring your development pipelines into discrete IJob components and governing them with a JobSuite, you gain resumable execution after failures, centralized error handling, and per-job progress tracking.
