Embeddable model support

Open fivesmallq opened this issue 8 years ago • 8 comments

fivesmallq avatar Aug 04 '16 06:08 fivesmallq

This will work only when Config is known at compile time; if the Config type is chosen at run time, this solution may not work. For example, suppose the field Config in class Activity is an interface or an abstract superclass that is implemented or extended by two different classes, say Config1 and Config2, and the implementing class (Config1 or Config2) is chosen at run time based on some condition. In that case this solution will fail. Also, Config1 and Config2 can have additional properties that are not available in the superclass Config, and we will not be able to populate those additional properties because the actual class is not known at compile time.

There should be some way of specifying the actual implementing class (in this case Config1 or Config2), something like:

    List<Activity> activities = Extractors.on(base5Xml)
            .split(xpath("//ProcessDefinition/activity").removeNamespace())
            .extract("name", xpath("//activity/@name"))
            .extract("type", xpath("//activity/type/text()"))
            .extract("resourceType", xpath("//activity/resourceType/text()"))
            // EntityExtractor is the API proposed here, not something the library provides yet
            .extract("config", new EntityExtractor<Config>() {
                @Override
                public Config extract(String data) {
                    return Extractors.on(data)
                            .extract("encoding", xpath("//activity/config/encoding/text()"))
                            .extract("pollInterval", xpath("//activity/config/pollInterval/text()"))
                            .asBean(Config1.class);
                }
            })
            .asBeanList(Activity.class);

Here Config is an abstract class, and Config1 and Config2 extend Config as below:

    public abstract class Config {
        // common options
        protected String encoding;
    }

    public class Config1 extends Config {
        // consumer options
        private String pollInterval;
        private String createEvent;
        private String modifyEvent;
        private String deleteEvent;
        private String mode;
        private String sortby;
        private String sortorder;
    }

    public class Config2 extends Config {
        // producer options
        private String compressFile;
    }

XML

The XML contains two activities (activity 1 and activity 2). Based on pd:resourceType, activity 1 will be assigned Config2 and activity 2 will be assigned Config1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<pd:ProcessDefinition xmlns:pd="http://xmlns.tibco.com/bw/process/2003" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                      xmlns:pfx="http://www.tibco.com/namespaces/tnt/plugins/file">
    <pd:name>Processes/Simple Process.process</pd:name>
    <pd:startName>File Poller</pd:startName>
    <pd:startX>0</pd:startX>
    <pd:startY>0</pd:startY>
    <pd:returnBindings/>
    <pd:starter name="File Poller">
        <pd:type>com.tibco.plugin.file.FileEventSource</pd:type>
        <pd:resourceType>ae.activities.FileEventSourceResource</pd:resourceType>
        <pd:x>245</pd:x>
        <pd:y>96</pd:y>
        <config>
            <pollInterval>5</pollInterval>
            <createEvent>true</createEvent>
            <modifyEvent>true</modifyEvent>
            <deleteEvent>true</deleteEvent>
            <mode>files-and-directories</mode>
            <encoding>text</encoding>
            <sortby>File Name</sortby>
            <sortorder>descending</sortorder>
            <fileName>C:\Projects\SampleProject\Input\inputData.xml</fileName>
        </config>
        <pd:inputBindings/>
    </pd:starter>
    <pd:endName>End</pd:endName>
    <pd:endX>540</pd:endX>
    <pd:endY>97</pd:endY>
    <pd:errorSchemas/>
    <pd:processVariables/>
    <pd:targetNamespace>http://xmlns.example.com/1465977202414</pd:targetNamespace>
    <pd:activity name="activity 1">
        <pd:type>com.tibco.plugin.file.FileWriteActivity</pd:type>
        <pd:resourceType>ae.activities.FileWriteActivity</pd:resourceType>
        <pd:x>387</pd:x>
        <pd:y>104</pd:y>
        <config>
            <encoding>text</encoding>
            <compressFile>None</compressFile>
        </config>
        <pd:inputBindings>
            <pfx:WriteActivityInputTextClass>
                <fileName>
                    <xsl:value-of select="$_globalVariables/ns:GlobalVariables/GlobalVariables/OutputLocation"/>
                </fileName>
                <textContent>
                    <xsl:value-of select="$File-Poller/pfx:EventSourceOuputTextClass/fileContent/textContent"/>
                </textContent>
            </pfx:WriteActivityInputTextClass>
        </pd:inputBindings>
    </pd:activity>
    <pd:activity name="activity 2">
        <pd:type>com.tibco.plugin.file.FileEventSource</pd:type>
        <pd:resourceType>ae.activities.FileEventSourceResource</pd:resourceType>
        <pd:x>240</pd:x>
        <pd:y>90</pd:y>
        <config>
            <pollInterval>50</pollInterval>
            <createEvent>false</createEvent>
            <modifyEvent>true</modifyEvent>
            <deleteEvent>true</deleteEvent>
            <mode>files-and-directories</mode>
            <encoding>text</encoding>
            <sortby>File Name</sortby>
            <sortorder>descending</sortorder>
            <fileName>C:\Projects\SampleProject\output\outputData.xml</fileName>
        </config>
        <pd:inputBindings/>
    </pd:activity>
    <pd:transition>
        <pd:from>Output</pd:from>
        <pd:to>End</pd:to>
        <pd:lineType>Default</pd:lineType>
        <pd:lineColor>-16777216</pd:lineColor>
        <pd:conditionType>always</pd:conditionType>
    </pd:transition>
    <pd:transition>
        <pd:from>File Poller</pd:from>
        <pd:to>Output</pd:to>
        <pd:lineType>Default</pd:lineType>
        <pd:lineColor>-16777216</pd:lineColor>
        <pd:conditionType>always</pd:conditionType>
    </pd:transition>
</pd:ProcessDefinition>
```


Suppose I have the following requirements:
(1) Config2 is assigned to activity 1 if resourceType is ae.activities.FileWriteActivity.
(2) Config1 is assigned to activity 2 if resourceType is ae.activities.FileEventSourceResource.

I would like to approach it something like this:
```java
// Pseudocode: the bare extract(...) conditions and EntityExtractor show the desired usage, not existing API.

// Config2 is assigned to activity 1 because its resourceType is ae.activities.FileWriteActivity
if (extract("resourceType", xpath("//activity/resourceType/text()")).asString().equals("ae.activities.FileWriteActivity")) {
    List<Activity> activities = Extractors.on(base5Xml)
            .split(xpath("//ProcessDefinition/activity").removeNamespace())
            .extract("name", xpath("//activity/@name"))
            .extract("type", xpath("//activity/type/text()"))
            .extract("resourceType", xpath("//activity/resourceType/text()"))
            .extract("config", new EntityExtractor<Config>() {
                @Override
                public Config extract(String data) {
                    return Extractors.on(data)
                            .extract("encoding", xpath("//activity/config/encoding/text()"))
                            .extract("compressFile", xpath("//activity/config/compressFile/text()"))
                            .asBean(Config2.class);
                }
            })
            .asBeanList(Activity.class);
}
// Config1 is assigned to activity 2 because its resourceType is ae.activities.FileEventSourceResource
else if (extract("resourceType", xpath("//activity/resourceType/text()")).asString().equals("ae.activities.FileEventSourceResource")) {
    List<Activity> activities = Extractors.on(base5Xml)
            .split(xpath("//ProcessDefinition/activity").removeNamespace())
            .extract("name", xpath("//activity/@name"))
            .extract("type", xpath("//activity/type/text()"))
            .extract("resourceType", xpath("//activity/resourceType/text()"))
            .extract("config", new EntityExtractor<Config>() {
                @Override
                public Config extract(String data) {
                    return Extractors.on(data)
                            .extract("encoding", xpath("//activity/config/encoding/text()"))
                            .extract("pollInterval", xpath("//activity/config/pollInterval/text()"))
                            .asBean(Config1.class);
                }
            })
            .asBeanList(Activity.class);
}
```
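
The runtime choice described above can be pictured independently of the library as a lookup from resourceType to a concrete Config subclass. The following is only a sketch under assumptions: `ConfigRegistry` and its `create` method are hypothetical names that do not exist in web-data-extractor, and each class is trimmed to a single field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

abstract class Config {
    protected String encoding;      // common option
}

class Config1 extends Config {
    String pollInterval;            // consumer option (others omitted)
}

class Config2 extends Config {
    String compressFile;            // producer option
}

/** Hypothetical helper: picks the Config subclass from the resourceType value. */
class ConfigRegistry {
    private static final Map<String, Supplier<Config>> FACTORIES = new HashMap<>();

    static {
        FACTORIES.put("ae.activities.FileEventSourceResource", Config1::new);
        FACTORIES.put("ae.activities.FileWriteActivity", Config2::new);
    }

    /** Creates an empty Config of the subclass registered for the given resourceType. */
    static Config create(String resourceType) {
        Supplier<Config> factory = FACTORIES.get(resourceType);
        if (factory == null) {
            throw new IllegalArgumentException("No Config registered for " + resourceType);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        Config config = create("ae.activities.FileWriteActivity");
        System.out.println(config.getClass().getSimpleName()); // prints "Config2"
    }
}
```

With a lookup like this, the per-activity extractor would only need the resourceType string to decide which bean to populate, instead of one extraction chain per branch.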

ptyagi108 avatar Aug 05 '16 07:08 ptyagi108

@ptyagi108 Maybe you can use a filter to handle this.

                .extract("config.pollInterval", xpath("//activity/config/pollInterval/text()"))
                       // if pollInterval is null, set it to the default '5'
                      .filter(value -> value == null ? "5" : value)
                .extract("config.compressFile", xpath("//activity/config/compressFile/text()"))

https://github.com/fivesmallq/web-data-extractor/blob/master/src/test/java/im/nll/data/extractor/ExtractorsTest.java#L513

Or you could set a default value on the config field?
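
The intent of the filter comment (fall back to '5' when nothing was extracted) can be checked in isolation. This sketch uses `java.util.function.UnaryOperator` as a stand-in for the library's filter type, which is an assumption; `DefaultValueFilter` and `defaultTo` are hypothetical names.

```java
import java.util.function.UnaryOperator;

class DefaultValueFilter {
    /** Returns a filter that substitutes the fallback when the extracted value is null. */
    static UnaryOperator<String> defaultTo(String fallback) {
        return value -> value == null ? fallback : value;
    }

    public static void main(String[] args) {
        UnaryOperator<String> pollIntervalFilter = defaultTo("5");
        System.out.println(pollIntervalFilter.apply(null)); // prints "5"
        System.out.println(pollIntervalFilter.apply("50")); // prints "50"
    }
}
```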

fivesmallq avatar Aug 05 '16 07:08 fivesmallq

Please check: I have updated my comment. A filter may not work here.

ptyagi108 avatar Aug 05 '16 08:08 ptyagi108

Please see my updated comment. The basic issue is how to handle polymorphism with this library.

ptyagi108 avatar Aug 05 '16 08:08 ptyagi108

@ptyagi108 OK, I will think about it.

fivesmallq avatar Aug 05 '16 09:08 fivesmallq

For now I have made some changes to the library locally to get my case working. Please suggest how I can make this implementation better.

    public interface Extractor<T> {
        T extract(String data);
    }

im.nll.data.extractor.Extractors#extractBean

    private <T> T extractBean(String html, Class<T> clazz) {
        // only support String type
        if (clazz.equals(String.class)) {
            return (T) html;
        }
        T entity = Reflect.on(clazz).create().get();
        for (Map.Entry<String, List<Extractor>> one : extractorsMap.entrySet()) {
            String name = one.getKey();
            List<Extractor> extractors = one.getValue();
            Object value = html;
            for (Extractor extractor : extractors) {
                // run each extractor only once
                value = extractor.extract((String) value);
                if (!(value instanceof String)) {
                    // nested bean: stop chaining, but keep processing the remaining fields
                    break;
                }
            }
            if (value instanceof String) {
                // filters only apply to String results
                String result = filterBefore((String) value);
                result = filter(name, result);
                value = filterAfter(result);
            }
            try {
                Reflect.on(entity).set(name, value);
            } catch (Exception e) {
                LOGGER.error("convert to bean error! can't set '{}' with '{}'", name, value, e);
            }
        }
        return entity;
    }

How to use it:

    @Test
    public void testToBeanListByXPath() throws Exception {
        List<Language> languages = Extractors.on(listHtml).split(xpath("//tr[@class='item']"))
                .extract("type", xpath("//td[1]/text()"))
                .extract("name", xpath("//td[2]/text()"))
                .extract("url", xpath("//td[3]/text()"))
                .extract("book", new Extractor<Book>() {
                    @Override
                    public Book extract(String data) {
                        return Extractors.on(data)
                                .extract("category", xpath("//td[2]/text()"))
                                .extract("author", xpath("//td[3]/text()"))
                                .asBean(Book.class);
                    }
                })
                .asBeanList(Language.class);
        Assert.assertNotNull(languages);
        Language second = languages.get(1);
        Assert.assertEquals(languages.size(), 3);
        Assert.assertEquals(second.getType(), "dynamic");
        Assert.assertEquals(second.getName(), "Ruby");
        Assert.assertEquals(second.getUrl(), "https://www.ruby-lang.org");
    }

    public class Language {
        private String type;
        private String name;
        private String url;
        private Book book;
    }
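
The idea behind the extractBean change (a generic extractor whose non-String result is stored on the field as a nested bean) can be tried without the library. This is a minimal sketch under assumptions: `BeanFiller` and `fill` are hypothetical names, plain `java.lang.reflect` stands in for the library's `Reflect` helper, and a comma-separated string stands in for HTML or XML input.

```java
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface Extractor<T> {
    T extract(String data);
}

class Book {
    String author;
}

class Language {
    String name;
    Book book;
}

class BeanFiller {
    /** Runs each field's extractor chain and stores the final value on the bean. */
    static <T> T fill(T bean, String data, Map<String, List<Extractor<?>>> chains) {
        for (Map.Entry<String, List<Extractor<?>>> entry : chains.entrySet()) {
            Object value = data;
            for (Extractor<?> extractor : entry.getValue()) {
                value = extractor.extract((String) value);
                if (!(value instanceof String)) {
                    break; // nested bean: stop chaining, later fields are still processed
                }
            }
            try {
                Field field = bean.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(bean, value);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot set field " + entry.getKey(), e);
            }
        }
        return bean;
    }

    public static void main(String[] args) {
        Extractor<Book> bookExtractor = data -> {
            Book book = new Book();
            book.author = data.split(",")[1];
            return book;
        };
        Extractor<String> nameExtractor = data -> data.split(",")[0];

        Map<String, List<Extractor<?>>> chains = new LinkedHashMap<>();
        // the nested bean is registered first to show that later fields are still filled
        chains.put("book", Collections.<Extractor<?>>singletonList(bookExtractor));
        chains.put("name", Collections.<Extractor<?>>singletonList(nameExtractor));

        Language language = fill(new Language(), "Ruby,Matz", chains);
        System.out.println(language.name + " / " + language.book.author); // prints "Ruby / Matz"
    }
}
```

Because the extractor decides what object to return, it could also return different subclasses for different inputs, which is the polymorphic case raised earlier in the thread.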

ptyagi108 avatar Aug 06 '16 11:08 ptyagi108

@ptyagi108 Maybe I should add a method called extractBean to Extractors that sets the extracted bean on the field?

fivesmallq avatar Aug 06 '16 11:08 fivesmallq

Yes, this would be a better solution.

ptyagi108 avatar Aug 07 '16 08:08 ptyagi108