[oae-dev] Solr and NRT searching/indexing

Carl Hall carl at hallwaytech.com
Mon Sep 26 07:58:01 PDT 2011


Using soft commits shouldn't affect the ability to replicate/cluster. Soft
commits do the work in memory, while hard commits persist things to disk.
If we go with a scheduled hard-commit strategy, we could tune the interval
and have replication happen every minute or so.
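
To make the soft/hard distinction concrete, here is a toy model in plain
Python. It is only a sketch: ToyIndex and its methods are made-up names for
illustration, not Solr's API, and it assumes that replicas only ever see
what a hard commit has written to disk.

```python
class ToyIndex:
    """Toy model of soft vs. hard commits (hypothetical, not Solr's API)."""

    def __init__(self):
        self.pending = []     # added but not yet searchable
        self.searchable = []  # visible to searches, held in memory
        self.on_disk = []     # durable state; assumed to be what replicas pull

    def add(self, doc):
        self.pending.append(doc)

    def soft_commit(self):
        # Make pending docs searchable without writing anything to disk.
        self.searchable.extend(self.pending)
        self.pending.clear()

    def hard_commit(self):
        # Simplification: a hard commit also exposes pending docs,
        # then persists everything that is searchable.
        self.soft_commit()
        self.on_disk = list(self.searchable)


index = ToyIndex()
index.add("message-1")
index.soft_commit()
visible_before = list(index.searchable)  # searchable right away
durable_before = list(index.on_disk)     # nothing persisted yet
index.hard_commit()                      # e.g. triggered on a schedule
durable_after = list(index.on_disk)
```

In this model the scheduled hard commit is the only step that changes what a
replica could pick up, which is why the replication lag would follow the
hard-commit interval rather than the soft-commit rate.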

I'm just digging into it now but the SolrCloud[1] stuff is interesting from
a distributed/clustering perspective.

1 http://wiki.apache.org/solr/SolrCloud


On Sun, Sep 25, 2011 at 6:44 PM, N. Matthijs <
nicolaas.matthijs at caret.cam.ac.uk> wrote:

> Hi Carl,
>
> Given the risks that this would be introducing for the v1.0.1 release,
> I would be happy postponing KERN-2111 and KERN-2219 to a next
> release. Even though it would be nice to have a fix for KERN-2219
> from a user point of view, it's probably not worth it.
>
> Regarding the NRT searching/indexing solution you describe, do you
> know whether there are any consequences for running Nakamura in a
> cluster, or for the ability to have multiple Solr servers?
>
> Thanks,
> Nicolaas
>
>
> > To clarify, these changes are not in the source yet. This fix (and the
> > alternative fix in sparse) requires pretty significant modifications in
> > the source. There has already been some concern expressed about the
> > magnitude of this change in relation to the v1.0.1 release, so please
> > bring up any concerns, comments or questions.
> >
> >
> > On Fri, Sep 23, 2011 at 4:00 PM, Carl Hall <carl at hallwaytech.com> wrote:
> >
> >> With the help of Mark Triggs, I've been doing some research into
> >> near-realtime (NRT) searching and indexing with Solr. Since we bind to
> >> some pretty recent snapshot builds of Solr, we are privy to the NRT
> >> work[1][2][3] that is going on to expose the functionality from Lucene.
> >> This is in response primarily to KERN-2111 and KERN-2219 but also to our
> >> long-standing need to index some things faster so we can depend less on
> >> searching via sparse.
> >>
> >> My research has been in selectively adding some indexers that run
> >> outside of the batch (immediate indexers that respond per event). This
> >> uses the NRT features that Solr is exposing now and lets us index
> >> certain things without waiting for the batch to run.
> >>
> >> In short, the results look good. I put together some tests[4] to flex
> >> the converted indexing handlers I'm looking at and a bit of a lab
> >> report[5]. Memory usage is a bit up, but subsequent runs showed that
> >> average memory usage is roughly the same as the current 1.0 code. CPU
> >> usage is up but overall processing time is down, so it appears that the
> >> use of soft commits[6] for the immediate indexers is working quite well.
> >>
> >> For v1.0.1, I have prepared 3 immediate indexers (run before batching
> >> occurs) for connections, messages and a small bit of information for
> >> content. These were chosen by finding the sparse searches that happen in
> >> the system and indexing what those searches need. My recommendation
> >> going forward is to sparingly add immediate indexers as needed until we
> >> fine tune Solr. This should remove our need for any sparse searching.
> >>
> >> For 1.1, I would like to review some Solr functionality for better
> >> tuning. Namely, only do soft commits (in memory) in OAE and have hard
> >> commits (written to disk) performed by schedule (autoCommit). This will
> >> reduce IO and improve indexing & searching performance. We're currently
> >> hard committing about every 1.3s when under load.
> >>
> >> 1 https://issues.apache.org/jira/browse/SOLR-2566
> >> 2 https://issues.apache.org/jira/browse/SOLR-2565
> >> 3 https://issues.apache.org/jira/browse/SOLR-2656
> >> 4 https://gist.github.com/1236019
> >> 5 http://home.hallwaytech.com/solr-test
> >> 6 http://wiki.apache.org/solr/NearRealtimeSearch
> >>
> > _______________________________________________
> > oae-dev mailing list
> > oae-dev at collab.sakaiproject.org
> > http://collab.sakaiproject.org/mailman/listinfo/oae-dev
> >
>
>
>