GitHub - abitdodgy/gibran: Gibran is an Elixir natural language processor, and a port of WordsCounted.
Gibran

Yesterday is but today's memory, and tomorrow is today's dream.

Gibran is an Elixir natural language processor. Lofty goals for Gibran include:

  • Metaphone phonetic coding system
  • Soundex algorithm
  • Porter Stemming algorithm
  • String similarity as described by Simon White

Currently, Gibran ships with the following features:

  • Token count, unique token count, and character count
  • Average characters per token
  • HashDicts of tokens and their frequencies, lengths, and densities
  • The longest token(s) and their length
  • The most frequent token(s) and their frequency
  • Unique tokens
  • Levenshtein distance algorithm
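
To make the statistics in the list above concrete, here is a minimal sketch that computes a few of them with Elixir's standard library only. It does not use Gibran's Counter module. The token list is typed in by hand, density is assumed to mean a token's frequency divided by the total token count, and plain maps stand in for HashDicts. `Enum.frequencies/1` needs Elixir 1.10 or later.

```elixir
# Hand-written token list; in practice this would come from a tokeniser.
tokens = ~w(yesterday is but today's memory and tomorrow is today's dream)

total = length(tokens)

# Token => number of occurrences.
frequencies = Enum.frequencies(tokens)

# Token => share of all tokens (assumed definition of "density").
densities = Map.new(frequencies, fn {token, count} -> {token, count / total} end)

# Token => length in graphemes.
lengths = Map.new(tokens, fn token -> {token, String.length(token)} end)

# All tokens that share the highest frequency, sorted for a stable result.
max_frequency = frequencies |> Map.values() |> Enum.max()

most_frequent =
  frequencies
  |> Enum.filter(fn {_token, count} -> count == max_frequency end)
  |> Enum.map(fn {token, _count} -> token end)
  |> Enum.sort()

IO.inspect(most_frequent)
IO.inspect(map_size(frequencies))
```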

Usage

Let's start with something simple.

alias Gibran.Tokeniser
alias Gibran.Counter

str = "Yesterday is but today's memory, and tomorrow is today's dream."
Tokeniser.tokenise(str)
# => ["yesterday", "is", "but", "today's", "memory", "and", "tomorrow", "is", "today's", "dream"]

Tokeniser.tokenise(str) |> Counter.uniq_token_count
# => 8

By default, Gibran uses the following regular expression to tokenise strings: ~r/[^\p{L}'-]/u. You can provide your own regular expression through the pattern option, and you can combine pattern with exclude to create more sophisticated tokenisation strategies.

Tokeniser.tokenise(str, exclude: &String.length(&1) < 4) |> Counter.token_count
# => 6
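
The default pattern quoted above can be tried on its own with the standard library. The sketch below assumes the tokeniser lowercases the input and splits on every match of the pattern. That assumption is consistent with the output shown earlier, but it is not taken from Gibran's source. The second pattern is a hypothetical custom one that also splits on hyphens.

```elixir
# Default pattern from the text: matches any character that is not
# a letter, an apostrophe, or a hyphen.
default_pattern = ~r/[^\p{L}'-]/u

# Hypothetical custom pattern: same idea, but hyphens also separate tokens.
no_hyphen_pattern = ~r/[^\p{L}']/u

str = "Yesterday is but today's memory, and tomorrow is today's dream."

# Assumed behaviour: lowercase, split on the pattern, drop empty strings.
tokens =
  str
  |> String.downcase()
  |> String.split(default_pattern, trim: true)

hyphen_kept = String.split("well-known co-op", default_pattern, trim: true)
hyphen_split = String.split("well-known co-op", no_hyphen_pattern, trim: true)

IO.inspect(tokens)
IO.inspect(hyphen_kept)
IO.inspect(hyphen_split)
```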

The exclude option accepts a string, a function, a regular expression, or a list combining any one or more of those types.

# Using `exclude` with a function.
Tokeniser.tokenise("Kingdom of the Imagination", exclude: &(String.length(&1) < 10))
# => ["imagination"]

# Using `exclude` with a regular expression.
Tokeniser.tokenise("Sand and Foam", exclude: ~r/and/)
# => ["foam"]

# Using `exclude` with a string.
Tokeniser.tokenise("Eye of The Prophet", exclude: "eye of")
# => ["the", "prophet"]

# Using `exclude` with a list of a combination of types.
Tokeniser.tokenise("Eye of The Prophet", exclude: ["eye", &(String.ends_with?(&1, "he")), ~r/of/])
# => ["prophet"]
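
One way to picture how a single option can accept all four filter types is a function with one clause per type. The sketch below is illustrative only: `ExcludeSketch` and `reject/2` are hypothetical names, not part of Gibran. It assumes a string filter is split on whitespace and compared against whole tokens, which matches the "eye of" example above.

```elixir
defmodule ExcludeSketch do
  @moduledoc "Hypothetical helper for illustration; not Gibran's implementation."

  # Drops every token that the filter matches.
  def reject(tokens, filter), do: Enum.reject(tokens, &excluded?(&1, filter))

  # Function filter: the function decides.
  defp excluded?(token, filter) when is_function(filter, 1), do: filter.(token)

  # Regex filter: exclude when the pattern matches anywhere in the token.
  defp excluded?(token, %Regex{} = filter), do: Regex.match?(filter, token)

  # String filter (assumed): a whitespace-separated list of whole tokens.
  defp excluded?(token, filter) when is_binary(filter),
    do: token in String.split(String.downcase(filter))

  # List filter: exclude when any member of the list matches.
  defp excluded?(token, filters) when is_list(filters),
    do: Enum.any?(filters, &excluded?(token, &1))
end

tokens = ["eye", "of", "the", "prophet"]

IO.inspect(ExcludeSketch.reject(tokens, "eye of"))
IO.inspect(ExcludeSketch.reject(tokens, ["eye", &(String.ends_with?(&1, "he")), ~r/of/]))
```

Because the list clause calls back into the same function, lists can mix any of the other filter types without extra code.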

Gibran provides a shortcut for working with strings directly (instead of running them through the tokeniser first).

Gibran.from_string(str, :token_count, opts: [exclude: &String.length(&1) < 4])
# => 6

To avoid inconsistencies that arise from character-casing, Gibran normalises input before applying transformations.

Levenshtein distance

Ordinary use:

iex(1)> Gibran.Levenshtein.distance("kitten", "sitting")
3

The Levenshtein distance for the same string is 0.

iex(2)> Gibran.Levenshtein.distance("snail", "snail")
0

The Levenshtein distance is case-sensitive.

iex(3)> Gibran.Levenshtein.distance("HOUSEBOAT", "houseboat")
9

The function can accept charlists as well as strings.

iex(4)> Gibran.Levenshtein.distance('jogging', 'logger')
4
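
For readers who want to see how such a distance can be computed, here is a minimal sketch of the classic row-by-row dynamic programming approach. It is not Gibran's implementation, and `LevenshteinSketch` is a hypothetical module name. Like the function described above, it is case-sensitive and accepts either two strings or two charlists. Mixing the two is not handled.

```elixir
defmodule LevenshteinSketch do
  @moduledoc "Illustrative only; not Gibran's implementation."

  # Strings are compared grapheme by grapheme.
  def distance(a, b) when is_binary(a) and is_binary(b) do
    distance(String.graphemes(a), String.graphemes(b))
  end

  # Lists (including charlists) are compared element by element.
  def distance(a, b) when is_list(a) and is_list(b) do
    # Row 0: cost of building each prefix of `b` from an empty list.
    first_row = Enum.to_list(0..length(b))

    a
    |> Enum.reduce(first_row, fn x, [top_left | _] = prev ->
      next_row(x, b, prev, top_left + 1)
    end)
    |> List.last()
  end

  # Builds one row of the table from the previous row.
  defp next_row(x, b, [diag | above_cells], first) do
    {cells, _left, _diag} =
      b
      |> Enum.zip(above_cells)
      |> Enum.reduce({[first], first, diag}, fn {y, above}, {acc, left, diag} ->
        cost = if x == y, do: 0, else: 1
        # Deletion, insertion, or substitution (free when elements match).
        cell = Enum.min([above + 1, left + 1, diag + cost])
        {[cell | acc], cell, above}
      end)

    Enum.reverse(cells)
  end
end

IO.inspect(LevenshteinSketch.distance("kitten", "sitting"))
IO.inspect(LevenshteinSketch.distance(~c"jogging", ~c"logger"))
```

Only the previous row is kept in memory, so the sketch uses space proportional to the length of the second argument rather than the full table.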

The doctests contain extensive usage examples. Please take a look there for more details.
