General Text Processing

This section focuses on text processing with the Infinity Service Bus (ISB). It is assumed that the reader is familiar with the general principles of Apache Camel.

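Text processing consists of the following steps:

  • File Entry Detection
  • File Filtering
  • File and Page Splitting
  • Page Filtering
  • File and Page Aggregation
  • Data Extraction
  • Further Processing

These steps are indicated in the following diagram.

[Diagram: Text Processing]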

The following sections describe how the corresponding tasks can be performed with Camel Routes and Spring Bean Definitions, which can either be deployed via a standalone Camel Configuration or embedded in a Camel Trigger in a Stardust Process Model.

File Entry Detection

Incoming files can be detected with an Apache Camel File or FTP endpoint, e.g.

<from uri="file://c:/data?filter=#fileFilter1"/>

or

<from uri="ftp://.."/>

File Filtering

Additional file filtering can be performed via org.apache.camel.component.file.AntPathMatcherGenericFileFilter, which specifies the files to be included in and/or excluded from further processing. Excludes take precedence over includes: if a file matches both an exclude and an include pattern, it is regarded as excluded.

<bean id="fileFilter1" class="org.apache.camel.component.file.AntPathMatcherGenericFileFilter">
   <property name="includes" value="M05_CMPCN_PBK*.TXT"/>
</bean>
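
As a sketch of how includes and excludes interact, the following filter sets both properties. The bean id and the patterns are illustrative assumptions and are not taken from an existing configuration.

```xml
<!-- Hypothetical filter: accept M05 text files, but skip backup copies -->
<bean id="fileFilter2" class="org.apache.camel.component.file.AntPathMatcherGenericFileFilter">
   <!-- Ant-style patterns of files to accept (comma-separated if several) -->
   <property name="includes" value="M05_*.TXT"/>
   <!-- A file matching both lists is skipped, because excludes take precedence -->
   <property name="excludes" value="*_BAK.TXT"/>
</bean>
```

With these placeholder patterns, a file named M05_CMPCN_PBK1.TXT would be accepted, whereas M05_CMPCN_BAK.TXT matches both patterns and would therefore be excluded.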

File and Page Splitting

Once a file is received, its content can be split into text blocks using the Apache Camel Splitter directive. Each text block obtained from the split is processed separately. For convenience, the Infinity Service Bus provides splitters for line breaks and page breaks. Other splitting options can be obtained from Apache Camel.

Line Splitter

Splits a text into lines, using the line break (CR) as the separator.

<bean id="lineSplitter" class="com.infinity.integration.textprocessing.splitter.LineSplitter"/>

Pagebreak Splitter

Splits a text into pages, using the page break as the separator.

<bean id="pageBreakSplitter" class="com.infinity.integration.textprocessing.splitter.PageBreakSplitter"/>
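
A splitter bean is typically referenced from the split element of a route. The sketch below assumes that PageBreakSplitter exposes a splitBody method, as used in the End-to-End Example of this section. The bean id pageSplitterSketch, the directory and the log endpoint are placeholders.

```xml
<!-- Hypothetical bean id; class name as listed above -->
<bean id="pageSplitterSketch" class="com.infinity.integration.textprocessing.splitter.PageBreakSplitter"/>

<route>
   <from uri="file://c:/data"/>
   <split>
      <!-- Delegate splitting to the bean; each returned page is processed as a separate exchange -->
      <method ref="pageSplitterSketch" method="splitBody"/>
      <!-- Placeholder for the further processing of a single page -->
      <to uri="log:page"/>
   </split>
</route>
```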

Page Filtering

Page filtering is based on the content of a particular field in the page. The field is defined by its location (column, row, size) and the filtering criterion (actual value).

 

| Property | Description | Type |
| --- | --- | --- |
| column | Offset within the line (column) where the text to be used for filtering starts. | int |
| row | Line where the text to be used for filtering starts. | int |
| actualValue | Value to be used for filtering. | String |
| size | Number of characters to consider. | int |
| match | y for match, n otherwise. | char |
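
A page filter built from these properties might look as follows. The class name and the accept method are taken from the End-to-End Example at the end of this section. The bean id, the log endpoint and the concrete values are illustrative assumptions.

```xml
<!-- Hypothetical filter: inspect the first 6 characters of line 1 of each page -->
<bean id="pageFilterSketch" class="com.infinity.integration.textprocessing.filter.PageFilter">
   <property name="row" value="1"/>
   <property name="column" value="1"/>
   <property name="size" value="6"/>
   <!-- Text the field is compared against (placeholder value) -->
   <property name="actualValue" value="HEADER"/>
   <property name="match" value="y"/>
</bean>

<!-- Inside a route: only pages accepted by the filter continue -->
<filter>
   <method ref="pageFilterSketch" method="accept"/>
   <to uri="log:acceptedPage"/>
</filter>
```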

File and Page Aggregation

Pages can be aggregated, and the resulting page groups processed further, with the Apache Camel Aggregator directive. The PageAssembler bean groups pageSize pages together.

<bean id="pageAssembler1" class="com.infinity.integration.textprocessing.assembler.PageAssembler">
   <property name="pageSize" value="1"/>
</bean>
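
The assembler can then serve as the aggregation strategy of an aggregate element. The following sketch assumes that PageAssembler also provides an isCompleted method that signals when a group is complete, as in the End-to-End Example of this section. The bean id and the log endpoint are placeholders.

```xml
<!-- Hypothetical bean id; groups two pages at a time -->
<bean id="pageAssemblerSketch" class="com.infinity.integration.textprocessing.assembler.PageAssembler">
   <property name="pageSize" value="2"/>
</bean>

<!-- Inside a route, after the split -->
<aggregate strategyRef="pageAssemblerSketch">
   <!-- A constant correlation key places all pages in the same group -->
   <correlationExpression>
      <constant>true</constant>
   </correlationExpression>
   <!-- Assumed: isCompleted returns true once pageSize pages have been collected -->
   <completionPredicate>
      <method ref="pageAssemblerSketch" method="isCompleted"/>
   </completionPredicate>
   <!-- Placeholder for the processing of a completed page group -->
   <to uri="log:pageGroup"/>
</aggregate>
```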

Data Extraction

Beans of type com.infinity.integration.textprocessing.extractor.DataExtractor extract specific data from a text block and store the extracted data in a hash map attached to the exchange object of the route.

The start point of the extraction can be addressed in one of two ways: either by providing (row, column) in the text block, or by searching for a pattern targetString and starting the extraction at (offsetRow, offsetColumn) relative to the first occurrence of targetString.

In both cases, up to maxCharacters characters are added to the extracted string. The extracted string is added to the map under the key dataId, so that it can be passed as data to the process being started. dataType indicates the data type of the extracted metadata.

Routes may use chains of DataExtractor filters to extract multiple metadata items.

| Property | Description | Type |
| --- | --- | --- |
| maxCharacters | Maximum number of characters to retrieve | int |
| dataId | The data identifier under which the extracted value will be stored | String |
| column | The column value in the text | int |
| row | The row (line) value in the text | int |
| searchType | The search type: FIRST_PAGE_FIRST_IDENTIFIER = 's'; EACH_PAGE_FIRST_IDENTIFIER = 'S'; FIRST_PAGE_MULTIPLE_IDENTIFIERS = 'm'; EACH_PAGE_MULTIPLE_IDENTIFIERS = 'M'; IDENTIFIER_IN_FILENAME = 'F' | char |
| defaultValue | Default value to be used in case the data is not found | String |
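
The pattern-based addressing described above could be configured roughly as follows. This is only a sketch: it assumes that targetString, offsetRow and offsetColumn are bean properties with exactly these names, and the bean id, the label text and all values are placeholders. How the offsets are counted relative to the occurrence of targetString is not specified here and should be verified.

```xml
<!-- Hypothetical extractor: locate the label "ACCOUNT:" and read text near it -->
<bean id="extractAccount" class="com.infinity.integration.textprocessing.extractor.DataExtractor">
   <!-- Pattern to search for (placeholder) -->
   <property name="targetString" value="ACCOUNT:"/>
   <!-- Offsets relative to the first occurrence of targetString (assumed property names) -->
   <property name="offsetRow" value="0"/>
   <property name="offsetColumn" value="9"/>
   <property name="maxCharacters" value="10"/>
   <!-- Key under which the extracted value is stored -->
   <property name="dataId" value="ACCOUNT"/>
   <property name="searchType" value="S"/>
   <property name="defaultValue" value=""/>
</bean>
```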

Example

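The following example extracts up to three characters starting at row 2, column 11 of the text block and stores them under the key QTY. An empty string is used as the default value. The operation is performed for every page.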

<bean id="extractQTY" class="com.infinity.integration.textprocessing.extractor.DataExtractor">
   <property name="maxCharacters" value="3"/>
   <property name="dataId" value="QTY"/>
   <property name="row" value="2"/>
   <property name="column" value="11"/>
   <property name="searchType" value="S"/>
   <property name="defaultValue" value=""/>
</bean>

Input

1STSABC123XYZ12
2BC123XYZ1200STST66

Output

QTY=200

Data Extraction Strategy

The data extraction strategy defines a list of data extractors plus other properties (such as the reason code and department in the following example).

| Property | Description | Type |
| --- | --- | --- |
| status | Strategy status (enum?) | String |
| extractors | Data extractor object list | List(com.infinity.integration.ems.extractor.DataExtractor) |
| reasonCode | Reason code | String |
| department | Department name | String |

Example

In the following example, the Instrument and Qty are extracted.

<bean id="dataextractorexample2" class="com.infinity.integration.ems.extractor.DataExtractionStrategy">
   <property name="status" value="PROCESS"/>
   <property name="extractors">
      <list>
         <ref bean="extractorInst"/>
         <ref bean="extractorQty"/>
      </list>
   </property>
   <property name="reasonCode" value="RESAON"/>
   <property name="department" value="CMPCN"/>
</bean>
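
The strategy refers to two extractor beans, extractorInst and extractorQty, that are not shown. A sketch of how they might be defined is given below. The class name follows the type listed for the extractors property, the dataId values are assumptions based on the description, and all positions and lengths are placeholders.

```xml
<!-- Hypothetical definition: position and length are placeholders -->
<bean id="extractorInst" class="com.infinity.integration.ems.extractor.DataExtractor">
   <property name="dataId" value="Instrument"/>
   <property name="row" value="1"/>
   <property name="column" value="5"/>
   <property name="maxCharacters" value="12"/>
   <property name="searchType" value="S"/>
   <property name="defaultValue" value=""/>
</bean>

<!-- Hypothetical definition: position and length are placeholders -->
<bean id="extractorQty" class="com.infinity.integration.ems.extractor.DataExtractor">
   <property name="dataId" value="Qty"/>
   <property name="row" value="2"/>
   <property name="column" value="11"/>
   <property name="maxCharacters" value="3"/>
   <property name="searchType" value="S"/>
   <property name="defaultValue" value=""/>
</bean>
```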

Further Processing

Workflow Processing

The workflow directives provide the authentication details and the ID of the process to start, together with its required input data.

<to uri="ipp:authenticate:setCurrent?user=motu&amp;password=motu"/>
<to uri="ipp:process:start?processId=DataExtraction&amp;dataMap=${body}"/>

End-to-End Example

The following Camel Configuration combines the steps described above into a single route:

<route>
   <!-- File entry -->
   <from uri="file://c:/data?filter=#fileFilter1"/>
   <!-- Split the file content via page breaks -->
   <split>
      <method bean="pageBreakSplitter1" method="splitBody"/>
      <aggregate strategyRef="pageAssembler1" aggregationRepositoryRef="pagesrepociprep">
         <correlationExpression>
            <constant>true</constant>
         </correlationExpression>
         <completionPredicate>
            <method ref="pageAssembler1" method="isCompleted"/>
         </completionPredicate>
         <filter>
            <method ref="pageFilter1" method="accept"/>
            <setHeader headerName="ems-it-type">
               <method ref="correlationexpressionagreportname" method="evaluate"/>
            </setHeader>
            <aggregate strategyRef="pageAssembler1" aggregationRepositoryRef="memoryRepository1" completionTimeout="10000">
               <correlationExpression>
                  <header>ems-it-type</header>
               </correlationExpression>
               <to uri="ipp:authenticate:setCurrent?user=motu&amp;password=motu"/>
               <to uri="ipp:process:start?processId=EMSProcessing&amp;Message=${body}"/>
            </aggregate>
         </filter>
      </aggregate>
   </split>
</route>

<!-- Beans -->
<bean id="fileFilter1" class="org.apache.camel.component.file.AntPathMatcherGenericFileFilter">
   <property name="includes" value="**/NAFRPT.ASH*"/>
</bean>

<bean id="pageBreakSplitter1" class="com.infinity.integration.ems.splitter.PageBreakSplitter"/>

<bean id="pageAssembler1" class="com.infinity.integration.ems.converter.assembler.PageAssembler">
   <property name="pageSize" value="2"/>
</bean>

<bean id="pageFilter1" class="com.infinity.integration.textprocessing.filter.PageFilter">
   <property name="column" value="1"/>
   <property name="actualValue" value="EWDETL"/>
   <property name="match" value="n"/>
   <property name="row" value="1"/>
   <property name="size" value="6"/>
</bean>

<bean id="memoryRepository1" class="org.apache.camel.processor.aggregate.MemoryAggregationRepository"/>