PhabricatorNgramEngine.php
<?php

final class PhabricatorNgramEngine extends Phobject {

  public function tokenizeString($value) {
    $value = trim($value, ' ');
    $value = preg_split('/ +/', $value);
    return $value;
  }

  public function getNgramsFromString($value, $mode) {
    $tokens = $this->tokenizeString($value);

    $ngrams = array();
    foreach ($tokens as $token) {
      $token = phutil_utf8_strtolower($token);

      switch ($mode) {
        case 'query':
          break;
        case 'index':
          $token = ' '.$token.' ';
          break;
        case 'prefix':
          $token = ' '.$token;
          break;
      }

      $token_v = phutil_utf8v($token);
      $len = (count($token_v) - 2);
      for ($ii = 0; $ii < $len; $ii++) {
        $ngram = array_slice($token_v, $ii, 3);
        $ngram = implode('', $ngram);
        $ngrams[$ngram] = $ngram;
      }
    }

    ksort($ngrams);

    return array_keys($ngrams);
  }

  public function newTermsCorpus($raw_corpus) {
    $term_corpus = strtr(
      $raw_corpus,
      array(
        '!' => ' ',
        '"' => ' ',
        '#' => ' ',
        '$' => ' ',
        '%' => ' ',
        '&' => ' ',
        '(' => ' ',
        ')' => ' ',
        '*' => ' ',
        '+' => ' ',
        ',' => ' ',
        '-' => ' ',
        '/' => ' ',
        ':' => ' ',
        ';' => ' ',
        '<' => ' ',
        '=' => ' ',
        '>' => ' ',
        '?' => ' ',
        '@' => ' ',
        '[' => ' ',
        '\\' => ' ',
        ']' => ' ',
        '^' => ' ',
        '`' => ' ',
        '{' => ' ',
        '|' => ' ',
        '}' => ' ',
        '~' => ' ',
        '.' => ' ',
        '_' => ' ',
        "\n" => ' ',
        "\r" => ' ',
        "\t" => ' ',
      ));

    // NOTE: Single quotes divide terms only if they're at a word boundary.
    // In contractions, like "whom'st've", the entire word is a single term.
    $term_corpus = preg_replace('/(^| )[\']+/', ' ', $term_corpus);
    $term_corpus = preg_replace('/[\']+( |$)/', ' ', $term_corpus);

    $term_corpus = preg_replace('/\s+/u', ' ', $term_corpus);
    $term_corpus = trim($term_corpus, ' ');

    return $term_corpus;
  }

}
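
The three modes of `getNgramsFromString()` differ only in how each token is padded with spaces before it is cut into trigrams. The sketch below shows that behavior outside of Phorge. It is an illustration under stated assumptions, not the engine itself: `sketch_trigrams()` is a hypothetical name, and `mb_strtolower()` and `preg_split('//u', ...)` stand in for `phutil_utf8_strtolower()` and `phutil_utf8v()`, whose exact behavior may differ on malformed input.

```php
<?php

// Hypothetical standalone helper, not part of Phorge. It mirrors the
// padding and slicing steps of getNgramsFromString() using only PHP
// built-ins in place of the phutil_* helpers.
function sketch_trigrams($value, $mode) {
  // Same tokenization as tokenizeString(): trim spaces, split on runs of
  // spaces.
  $tokens = preg_split('/ +/', trim($value, ' '));

  $ngrams = array();
  foreach ($tokens as $token) {
    // Assumption: mbstring approximates phutil_utf8_strtolower(). Fall back
    // to strtolower() (ASCII only) if the extension is not loaded.
    $token = function_exists('mb_strtolower')
      ? mb_strtolower($token, 'UTF-8')
      : strtolower($token);

    // 'index' pads both ends, 'prefix' pads the start, 'query' pads nothing.
    if ($mode === 'index') {
      $token = ' '.$token.' ';
    } else if ($mode === 'prefix') {
      $token = ' '.$token;
    }

    // Split into UTF-8 characters so multibyte text is never cut mid-byte.
    $chars = preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY);
    $len = count($chars) - 2;
    for ($ii = 0; $ii < $len; $ii++) {
      $ngram = implode('', array_slice($chars, $ii, 3));
      // Keying by the trigram removes duplicates.
      $ngrams[$ngram] = $ngram;
    }
  }

  ksort($ngrams);
  return array_keys($ngrams);
}

print_r(sketch_trigrams('apple', 'index'));
print_r(sketch_trigrams('apple', 'prefix'));
print_r(sketch_trigrams('ab', 'query'));
```

For the token `apple`, index mode yields `' ap'`, `'app'`, `'le '`, `'ple'`, `'ppl'`, while prefix mode drops the trailing `'le '`. A query-mode token shorter than three characters, such as `ab`, produces no trigrams at all, because nothing is padded in that mode.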