Description

Improve performance of Ferret engine ngram extraction, particularly for large input strings

Summary:
See PHI87. Ref T12974. Splitting the string apart with array_slice() can perform poorly for large input strings. I think the cost comes mostly from the large number of calls, plus the nontrivial work of building and returning an array on each one.

We can just use substr() instead, as long as we're careful to track the byte offsets where we slice the string when it contains multibyte UTF-8 characters.
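The idea can be sketched roughly as follows. This is a Python illustration, not the PHP code from the revision: the function names (`char_offsets`, `ngrams_by_slice`, `ngrams_by_offset`) and the default ngram length of 3 are assumptions made for the example. It records the byte offset where each UTF-8 character starts, then cuts each ngram out of the byte string with one slice instead of slicing and joining a character array.

```python
def char_offsets(data: bytes) -> list:
    """Byte offsets where each UTF-8 character starts, plus the end offset."""
    # UTF-8 continuation bytes look like 0b10xxxxxx; any other byte
    # begins a new character.
    offsets = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    offsets.append(len(data))
    return offsets


def ngrams_by_slice(text: str, n: int = 3) -> list:
    """Baseline: split into a character list, then slice and join per ngram."""
    chars = list(text)
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngrams_by_offset(text: str, n: int = 3) -> list:
    """Cut each ngram directly out of the byte string (hypothetical helper)."""
    data = text.encode("utf-8")
    offsets = char_offsets(data)
    count = len(offsets) - 1  # number of characters in the input
    # Each ngram spans from the start of character i to the start of
    # character i + n, so a multibyte character is never split.
    return [
        data[offsets[i]:offsets[i + n]].decode("utf-8")
        for i in range(count - n + 1)
    ]


if __name__ == "__main__":
    print(ngrams_by_offset("naïve"))
```

Both functions should return the same ngrams for any valid input. The offset version does one slice per ngram and only needs a flat list of integers as bookkeeping, which is the property the substr() approach relies on.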

Test Plan:

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12974

Differential Revision: https://secure.phabricator.com/D18649

Details

Provenance
epriestley Authored on Sep 26 2017, 9:16 AM
themackabu Pushed on Mar 25 2025, 8:07 PM
Parents
rPa1d9a2389db4: Improve Ferret engine indexing performance for large blocks of text