The code powering m.abunchtell.com https://m.abunchtell.com

# frozen_string_literal: true

# Guesses the language of a piece of text with CLD3, falling back to the
# account's locale when the guess is not reliable.
class LanguageDetector
  attr_reader :text, :account

  def initialize(text, account = nil)
    @text = text
    @account = account
    # Arguments are the minimum and maximum number of bytes to consider.
    @identifier = CLD3::NNetLanguageIdentifier.new(1, 2048)
  end

  # Returns the detected language code as a symbol, or the account's
  # locale (possibly nil) when detection is not reliable.
  def to_iso_s
    detected_language_code || default_locale
  end

  def prepared_text
    simplified_text.strip
  end

  private

  def detected_language_code
    result.language.to_sym if detected_language_reliable?
  end

  def result
    @result ||= @identifier.find_language(prepared_text)
  end

  def detected_language_reliable?
    result.reliable?
  end

  # Removes URLs, mentions and hashtags, then collapses whitespace, so that
  # only natural-language text is passed to the identifier.
  def simplified_text
    text.dup.tap do |new_text|
      URI.extract(new_text).each do |url|
        new_text.gsub!(url, '')
      end

      new_text.gsub!(Account::MENTION_RE, '')
      new_text.gsub!(Tag::HASHTAG_RE, '')
      new_text.gsub!(/\s+/, ' ')
    end
  end

  def default_locale
    account&.user_locale&.to_sym
  end
end
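
The two ideas in the class above, cleaning the text before detection and falling back to a default locale when detection is unreliable, can be tried outside the application with a minimal sketch. Everything named here is an assumption for illustration: `URL_RE`, `MENTION_RE` and `HASHTAG_RE` are simplified stand-ins (the real `Account::MENTION_RE` and `Tag::HASHTAG_RE` are defined elsewhere and not shown), and `DetectionResult` is a hypothetical substitute for the object returned by CLD3.

```ruby
# Simplified stand-in patterns (assumptions, not the application's regexes).
URL_RE     = %r{https?://\S+}
MENTION_RE = /(?<![\w@])@\w+(?:@[\w.-]+)?/
HASHTAG_RE = /(?<!\w)#\w+/

# Strips URLs, mentions and hashtags, collapses whitespace, and trims,
# mirroring what simplified_text and prepared_text do together.
def simplify(text)
  text.gsub(URL_RE, '')
      .gsub(MENTION_RE, '')
      .gsub(HASHTAG_RE, '')
      .gsub(/\s+/, ' ')
      .strip
end

# Hypothetical stand-in for a detection result: a language code plus a
# reliability flag.
DetectionResult = Struct.new(:language, :reliable) do
  def reliable?
    reliable
  end
end

# Returns the detected code only when it is reliable; otherwise the
# default locale, which may itself be nil.
def iso_code(result, default_locale = nil)
  code = result.language.to_sym if result.reliable?
  code || default_locale
end

puts simplify('Hello @alice@example.com check https://example.com/page #ruby  now')
p iso_code(DetectionResult.new('en', true), :fr)
p iso_code(DetectionResult.new('en', false), :fr)
```

Stripping these tokens first presumably keeps usernames, tags and link text from skewing the language guess, and the fallback means a caller gets either a symbol or `nil`, never an unreliable guess.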