??- returns only the last tuple of a sequence

Open ghost opened this issue 10 years ago • 19 comments

The following input on cascalog.playground:

```clojure
(??-
   (<- [?p ?age]
       (age ?p ?age)))
```

returns

```clojure
[["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36]]
```

However, running

```clojure
(?- (stdout)
   (<- [?p ?age]
       (p/age ?p ?age)))
```

gives the correct result (10 unique names and ages).

ghost avatar Nov 01 '15 02:11 ghost

What cascading version and cascalog versions are you using? This reminds me of an iterator bug we fixed a while ago.

sritchie avatar Nov 01 '15 02:11 sritchie

I'm using cascalog 2.1.1.

I haven't explicitly declared anything wrt cascading; I've just been following the project's readme to get started. Relevant portion of project.clj below:

  :dependencies [[org.clojure/clojure "1.7.0"]
                 [cascalog "2.1.1"]]
  :profiles { :dev {:dependencies [[org.apache.hadoop/hadoop-core "1.2.1"]]}}
  :jvm-opts ["-Xms768m" "-Xmx768m"])

ghost avatar Nov 01 '15 02:11 ghost

Yeah, this is fixed in 3.0.0-SNAPSHOT, which I think is the latest version off of master. Want to give that a shot? We're due for a new release for sure.

sritchie avatar Nov 01 '15 17:11 sritchie

Same issue occurs with these dependencies:

  :dependencies [[org.clojure/clojure "1.7.0"]
                 [cascalog/cascalog-core "3.0.0-SNAPSHOT"]]

Is there a working project.clj I could take a look at? Once this gets resolved I'm guessing it will come down to a documentation issue, and I'm happy to submit a pull request for that. I also had some initial frustrations because the documentation doesn't mention needing to run (bootstrap-emacs) in cider, so that should probably be fixed as well.

ghost avatar Nov 01 '15 18:11 ghost

For some reason my internet connection's preventing me from launching a repl (by blocking dependency downloads in leiningen), but I THINK, based on a different bug, I have a guess about what's causing this. Can you give this branch a try?

https://github.com/nathanmarz/cascalog/pull/295

Check out the discussion here: https://github.com/nathanmarz/cascalog/issues/251

Along with this fix: https://github.com/nathanmarz/cascalog/pull/280

for some more background on the issue. Also, any updates on documentation you want to send over would be huge.

sritchie avatar Nov 01 '15 18:11 sritchie

Trying that branch now, trying to build it and put in the local repo, but running into the issue that the sub-modules (cascalog-checkpoint, midje, etc) depend on cascalog-core, so I'm not able to compile them initially. I don't usually structure projects like this--how do you compile this structure?

ghost avatar Nov 01 '15 21:11 ghost

Ah, sorry- first, run "lein sub install" in the base directory. Thanks for trying this out!

sritchie avatar Nov 01 '15 22:11 sritchie

OK, that works for compilation/local repo installation. Unfortunately the bug still persists. If it's helpful, the log output in the repl says that Cascading 2.5.3 is being used currently.

Thanks for the help so far! Have a project I'm transitioning over to hadoop as it's grown a lot, and I'd really like to go with cascalog on it, so hopefully can sort this out.

ghost avatar Nov 01 '15 23:11 ghost

This looks very related to #292. The folks over at that ticket figured out that this issue only shows up with Clojure 1.7.0.

sritchie avatar Nov 06 '15 21:11 sritchie

OK, I'll try going back to 1.6, thanks!

ghost avatar Nov 06 '15 22:11 ghost

That fixed it. I'm going to submit a pull request for docs that are a bit more current in a bit.
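
For anyone landing here later, the workaround amounts to pinning Clojure 1.6.0 in project.clj. A minimal sketch, assuming the same Cascalog 2.1.1 setup posted earlier in this thread (the thread does not say which Cascalog version was in use when the downgrade fixed it, and the project name `cascalog-demo` is hypothetical):

```clojure
;; Sketch only: same dependencies as posted earlier in the thread,
;; with Clojure pinned to 1.6.0 instead of 1.7.0.
(defproject cascalog-demo "0.1.0-SNAPSHOT"   ; hypothetical project name
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [cascalog "2.1.1"]]
  :profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-core "1.2.1"]]}}
  :jvm-opts ["-Xms768m" "-Xmx768m"])
```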

ghost avatar Nov 07 '15 21:11 ghost

This just bit me as well. I can confirm that switching to 1.6 fixes the issue, but it would be nice to have a 1.7-compatible fix.

metasoarous avatar Nov 23 '15 22:11 metasoarous

@metasoarous totally hear you. I'm happy to review any pull requests from folks who want to take this on! I'm not using Cascalog for my work these days, so I don't have time to fix bugs like this myself, but I am available on a consulting basis to fix bugs or add features.

sritchie avatar Nov 24 '15 15:11 sritchie

Hi @sritchie: I appreciate the offer. Right now, 1.7 isn't critical for us, but if it becomes necessary we'll keep that in mind. I mostly just wanted to add a second data point for posterity's sake :-)

metasoarous avatar Nov 24 '15 17:11 metasoarous

http://dev.clojure.org/jira/browse/CLJ-1738

The 1.7 compatibility notes mention an iterator-seq change. Could that help?

Direction of this ticket changed at Rich's request.

Prior description captured here:

Clojure code that uses iterator-seq to wrap Java iterators that return the same mutable object on every call is broken by the chunked iterator-seq changes from CLJ-1669.

Some examples where this occurs:

- Hadoop ReduceContextImpl$ValueIterator
- Mahout DenseVector$AllIterator/NonDefaultIterator
- LensKit FastIterators

Cause: In 1.6, the iterator-seq wrapper could be used with these to consume a sequence over these iterators element-by-element. In 1.7 RC1, iterator-seq produces a chunked sequence. Because next() is called 32 times on the iterator before the first value can be retrieved from the seq, and the same mutable object is returned every time, code doing this now receives different (incorrect) results.

Approach: Switch iterator-seq back to non-chunked and change eduction to use the chunking iterator-seq strategy as that was the original target. Retain the use of the chunked iterator seq in sequence over the TransformerIterator.
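
To make the mechanism concrete, here is a minimal sketch that does not depend on the Clojure version or on Cascalog. It assumes a hypothetical iterator (`reusing-iterator`) that hands back the same mutable `ArrayList` on every `next()` call, and compares reading ahead with copying each element as it arrives. The sample rows and the helper names `read-ahead` and `copy-each` are made up for illustration.

```clojure
(import '(java.util ArrayList Iterator))

(defn reusing-iterator
  "Hypothetical iterator that returns the SAME mutable ArrayList on every
  call to next(), overwriting its contents each time."
  [rows]
  (let [remaining (atom (seq rows))
        holder    (ArrayList.)]
    (reify Iterator
      (hasNext [_] (boolean (seq @remaining)))
      (next [_]
        (let [row (first @remaining)]
          (swap! remaining next)   ; advance to the following row
          (.clear holder)          ; reuse the one mutable object
          (.addAll holder row)
          holder)))))

;; Made-up sample rows.
(def rows [["alice" 28] ["bob" 33] ["luanne" 36]])

(defn read-ahead
  "Calls next() for every element first and only looks at the values
  afterwards, which is what a read-ahead (chunking) consumer does."
  [^Iterator it]
  (loop [acc []]
    (if (.hasNext it)
      (recur (conj acc (.next it)))
      (mapv vec acc))))

(defn copy-each
  "Copies each element into an immutable vector as soon as it is produced."
  [^Iterator it]
  (loop [acc []]
    (if (.hasNext it)
      (recur (conj acc (vec (.next it))))
      acc)))

(read-ahead (reusing-iterator rows)) ; every entry shows the last row
(copy-each (reusing-iterator rows))  ; the rows come back intact
```

Reading ahead collects several references to one object, so every entry ends up showing whatever the iterator wrote last, which matches the repeated `["luanne" 36]` in the original report.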

jiyouyou125 avatar Nov 30 '15 06:11 jiyouyou125

Only ??- and ??<- use iterator-seq.

jiyouyou125 avatar Nov 30 '15 08:11 jiyouyou125

@nightlord this is really interesting, and probably the reason for the bug. Looks like a change like this may work:

(defn iter-seq [iter f]
  (if (.hasNext iter)
    (lazy-seq
      (cons (f (.next iter))
            (iter-seq iter f)))))
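
One way to sanity-check that this version stays element-by-element is to count the calls to `next()`. A rough sketch, where `counting-iterator` is a hypothetical helper written only for this check and `iter-seq` is copied from the snippet above:

```clojure
(defn iter-seq [iter f]
  (if (.hasNext iter)
    (lazy-seq
      (cons (f (.next iter))
            (iter-seq iter f)))))

(defn counting-iterator
  "Hypothetical helper: wraps coll's iterator and counts next() calls
  in the atom `calls`."
  [coll calls]
  (let [it (.iterator ^Iterable coll)]
    (reify java.util.Iterator
      (hasNext [_] (.hasNext it))
      (next [_]
        (swap! calls inc)
        (.next it)))))

(def calls (atom 0))
(def s (iter-seq (counting-iterator (vec (range 100)) calls) identity))

(first s) ; realizes only the first element
@calls    ; number of next() calls made so far
```

Because each cell is built with `lazy-seq` and `cons`, `f` runs on an element right after `next()` returns it and before the iterator is advanced again, so passing a copying function as `f` should be enough for iterators that reuse a mutable object.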

sritchie avatar Dec 02 '15 03:12 sritchie

@sritchie this fixes ??-. It may not be the best fix, but that is definitely the problem. https://github.com/nathanmarz/cascalog/pull/296

jiyouyou125 avatar Jan 11 '16 11:01 jiyouyou125

@sritchie I fixed ??- and the CI build problem, and added profiles for 1.6 and 1.7.

The build succeeds.

jiyouyou125 avatar Jan 11 '16 15:01 jiyouyou125