Table of Contents
- 1. most common structures
- 2. tools 2022 pypi
- 2.1. web frameworks
- 2.2. additional libraries
- 2.3. machine learning frameworks
- 2.4. cloud platforms do you use?
- 2.5. ORM(s) do you use together with Python, if any?
- 2.6. Big Data tool(s) do you use, if any?
- 2.7. Continuous Integration (CI) system(s) do you regularly use?
- 2.8. configuration management tools do you use, if any?
- 2.9. documentation tool do you use?
- 2.10. IDE features
- 2.11. isolate Python environments between projects?
- 2.12. tools related to Python packaging do you use directly?
- 2.13. application dependency management?
- 2.14. automated services to update the versions of application dependencies?
- 2.15. installing packages?
- 2.16. tool(s) do you use to develop Python applications?
- 2.17. job role(s)?
- 3. install
- 4. Python theory
- 5. scripting
- 6. Data model
- 7. typed variables or type hints
- 8. Strings
- 9. Classes
- 10. modules and packages
- 11. folders/files USECASES
- 12. functions
- 13. asterisk(*)
- 14. with
- 15. Operators and control structures
- 16. Traverse or iteration over containers
- 17. The Language Reference
- 17.1. yield and generator expression
- 17.2. yield from
- 17.3. ex
- 17.4. function decorator
- 17.5. class decorator
- 17.6. lines
- 17.7. Indentation
- 17.8. identifier [aɪˈdentɪfaɪər] or names
- 17.9. Keywords Exactly as written here:
- 17.10. Numeric literals
- 17.11. Docstring and comments
- 17.12. Simple statements
- 17.13. open external
- 18. The Python Standard Library
- 19. exceptions handling
- 20. Logging
- 21. Collections
- 22. Conventions
- 23. Concurrency
- 24. Monkey patch (modification at runtime)
- 25. Performance Tips
- 26. decorators
- 27. Assert
- 28. Debugging and Profiling
- 29. inject
- 30. BUILD and PACKAGING
- 30.1. build tools:
- 30.2. toml format for pyproject.toml
- 30.3. pyproject.toml
- 30.4. build
- 30.5. distutils (old)
- 30.6. terms
- 30.7. recommended
- 30.8. Upload to the package distribution service
- 30.9. editable installs PEP660
- 30.10. PyPi project name, name normalization and other specifications
- 30.11. TODO src layout vs flat layout
- 30.12. links
- 31. setuptools - build system
- 32. pip (package manager)
- 33. urllib3 and requests library
- 34. pdf 2 png
- 35. statsmodels
- 36. XGBoost
- 37. Natasha & Yargy
- 38. Stanford NER - Java
- 39. DeepPavlov
- 40. AllenNLP
- 41. spaCy
- 42. fastText
- 43. TODO rusvectores
- 44. Natural Language Toolkit (NLTK)
- 45. pymorphy2
- 46. linux NLP
- 47. fuzzysearch
- 48. Audio - librosa
- 49. Audio
- 50. Whisper
- 50.1. Byte-Pair Encoding (BPE)
- 50.2. model.transcribe(filepath or numpy)
- 50.3. model.decode(mel, options)
- 50.4. no_speech_prob and avg_logprob
- 50.5. decode from whisper_word_level 844
- 50.6. main_loop
- 50.7. words timestamps https://github.com/jianfch/stable-ts
- 50.8. confidence score
- 50.9. TODO main/notebooks
- 50.10. links
- 51. NER USE CASES
- 52. Flax and Jax
- 53. hyperparameter optimization library test-tube
- 54. Keras
- 54.1. install
- 54.2. API types
- 54.3. Sequential model
- 54.4. functional API
- 54.5. Layers
- 54.6. Models
- 54.7. Accuracy:
- 54.8. input shape & text prepare
- 54.9. ValueError: Error when checking input: expected input_1 to have 3 dimensions, but got array with shape
- 54.10. merge inputs
- 54.11. convolution
- 54.12. character CNN
- 54.13. Early stopping
- 54.14. plot history
- 54.15. ImageDataGenerator class
- 54.16. CNN Rotate
- 54.17. LSTM
- 55. Tesseract - Optical Character Recognition
- 56. FEATURE ENGINEERING
- 57. support libraries
- 58. Microsoft nni AutoML framework (stupid shut)
- 59. transformers - provides pretrained models
- 60. help
- 61. IDE
- 61.1. EPL
- 61.2. PyDev is a Python IDE for Eclipse
- 61.3. Emacs
- 61.4. PyCharm
- 61.5. ipython
- 61.6. geany
- 61.7. BlueFish
- 61.8. Eric
- 61.9. Google Colab
- 61.9.1. TODO todo
- 61.9.2. initial config
- 61.9.3. keys (checked):
- 61.9.4. keys in Internet (emacs IPython console)
- 61.9.5. Google Colab Magics
- 61.9.6. install libraries and system commands
- 61.9.7. execute code from google drive
- 61.9.8. shell
- 61.9.9. gcloud
- 61.9.10. gcloud ssh (require billing)
- 61.9.11. api
- 61.9.12. upload and download files
- 61.9.13. connect ssh (restricted)
- 61.9.14. connect ssh (unrestricted)
- 61.9.15. Restrictions
- 61.9.16. cons
- 62. Jupyter Notebook
- 63. USE CASES
- 63.1. NET
- 63.2. LISTS
- 63.2.1. all has one value
- 63.2.2. 2D list to 1D dict or list
- 63.2.3. list to string
- 63.2.4. replace one with two
- 63.2.5. remove elements
- 63.2.6. average
- 63.2.7. [1, -2, 3, -4, 5]
- 63.2.8. ZIP of arrays with different lengths
- 63.2.9. Shuffle two lists
- 63.2.10. list of dictionaries
- 63.2.11. closest in list
- 63.2.12. TIME SEQUENCE
- 63.2.13. split list in chunks
- 63.3. FILES
- 63.4. STRINGS
- 63.5. DICT
- 63.6. argparse: command line arguments
- 63.7. way to terminate
- 63.8. JSON
- 63.9. NN EQUAL QUANTITY FROM SAMPLES
- 63.10. most common element
- 63.11. print numbers
- 63.12. SCALE
- 63.13. smooth
- 63.14. one-hot encoding
- 63.15. binary encoding
- 63.16. map encoding
- 63.17. Accuracy
- 63.18. garbage collect
- 63.19. Class loop for member variables
- 63.20. filter special characters
- 63.21. measure time
- 63.22. primes in interval
- 63.23. unicode characters in interval
- 64. Flask
- 64.1. terms
- 64.2. components
- 64.3. static files and debugging console
- 64.4. start, run
- 64.5. Quart
- 64.6. GET
- 64.7. app.route
- 64.8. gentoo dependencies
- 64.9. blueprints
- 64.10. Hello world
- 64.11. curl
- 64.12. response object
- 64.13. request object
- 64.14. Jinja templates
- 64.15. security
- 64.16. my projects
- 64.17. Flask-2.2.2 hashes
- 64.18. flask-restful
- 64.19. example
- 64.20. swagger
- 64.21. werkzeug
- 64.22. debug
- 64.23. test
- 64.24. production
- 64.25. vulnerabilities
- 64.26. USECASES
- 64.27. async/await and ASGI
- 64.28. use HTTPS
- 64.29. links
- 65. FastAPI
- 66. Databases
- 67. Virtualenv
- 68. ldap
- 69. Containerized development
- 70. security
- 71. serialization
- 72. cython
- 73. headless browsers
- 74. selenium
- 75. plot in terminal
- 76. xml parsing
- 77. pytest
- 78. static analysis tools:
- 79. release as executable - Pyinstaller
- 80. troubleshooting
-*- mode: Org; fill-column: 110; coding: utf-8; -*-
#+TITLE: Python my notes
- built-in functions https://docs.python.org/3/library/functions.html
- pypi https://pypi.org/
- https://www.tutorialspoint.com/python3/python_modules.htm
- doc https://docs.python.org/3/contents.html
- https://docs.python.org/3/index.html
- software https://github.com/vinta/awesome-python
TODO from os import environ as env ; env.get('MYSQL_PASSWORD')
1. most common structures
1.1. sliced windows
from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# or
seq = [0, 1, 2, 3, 4, 5]
window_size = 3
for i in range(len(seq) - window_size + 1):
    print(seq[i: i + window_size])
1.2. compare row to itself
import numpy as np

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
r = np.zeros((len(a), len(a)))
for x in a:
    for y in a:
        if y < x:
            continue  # we skip y < x: fill only the upper triangle
        r[x, y] = x + y
print(r)
[[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
 [ 0.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
 [ 0.  0.  4.  5.  6.  7.  8.  9. 10. 11.]
 [ 0.  0.  0.  6.  7.  8.  9. 10. 11. 12.]
 [ 0.  0.  0.  0.  8.  9. 10. 11. 12. 13.]
 [ 0.  0.  0.  0.  0. 10. 11. 12. 13. 14.]
 [ 0.  0.  0.  0.  0.  0. 12. 13. 14. 15.]
 [ 0.  0.  0.  0.  0.  0.  0. 14. 15. 16.]
 [ 0.  0.  0.  0.  0.  0.  0.  0. 16. 17.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0. 18.]]
2. tools 2022 pypi
2.1. web frameworks
- Bottle
- CherryPy
- Django
- Falcon
- FastAPI
- Flask
- Hug
- Pyramid
- Tornado
- web2py
2.2. additional libraries
- aiohttp
- Asyncio
- httpx
- Pillow
- Pygame
- PyGTK
- PyQT
- Requests
- Six
- Tkinter
- Twisted
- Kivy
- wxPython
- Scrapy
2.3. machine learning frameworks
- Gensim
- MXNet
- NLTK
- Theano
2.4. cloud platforms do you use?
- AWS
- Rackspace
- Linode
- OpenShift
- PythonAnywhere
- Heroku
- Microsoft Azure
- DigitalOcean
- Google Cloud Platform
- OpenStack
2.5. ORM(s) do you use together with Python, if any?
- No database development
- Tortoise ORM
- Dejavu
- Peewee
- SQLAlchemy
- Django ORM
- PonyORM
- Raw SQL
- SQLObject
2.6. Big Data tool(s) do you use, if any?
- None
- Apache Samza
- Apache Kafka
- Dask
- Apache Beam
- Apache Hive
- Apache Hadoop/MapReduce
- Apache Spark
- Apache Tez
- Apache Flink
- ClickHouse
2.7. Continuous Integration (CI) system(s) do you regularly use?
- CruiseControl
- Gitlab CI
- Travis CI
- TeamCity
- Bitbucket Pipelines
- AppVeyor
- GitHub Actions
- Jenkins / Hudson
- CircleCI
- Bamboo
2.8. configuration management tools do you use, if any?
- None
- Chef
- Puppet
- Custom solution
- Ansible
- Salt
2.9. documentation tool do you use?
- I don’t use any documentation tools
- Sphinx
- MKDocs
- Doxygen
2.10. IDE features
(answer options: Often / From time to time / Never or Almost never)
- use Version Control Systems
- use Issue Trackers
- use code coverage
- use code linting (programs that analyze code for potential errors)
- use Continuous Integration tools
- use optional type hinting
- use NoSQL databases
- use autocompletion in your editor
- run / debug or edit code on remote machines (remote hosts, VMs, etc.)
- use SQL databases
- use a Python profiler
- use Python virtual environments for your projects
- use a debugger
- write tests for your code
- refactor your code
2.11. isolate Python environments between projects?
- virtualenv
- venv
- virtualenvwrapper
- hatch
- Poetry
- pipenv
- Conda
2.12. tools related to Python packaging do you use directly?
- pip
- Conda
- pipenv
- Poetry
- venv (standard library)
- virtualenv
- flit
- tox
- PDM
- twine
- Containers (eg: via Docker)
- Virtual machines
- Workplace specific proprietary solution
2.13. application dependency management?
- None
- pipenv
- poetry
- pip-tools
2.14. automated services to update the versions of application dependencies?
- None
- Dependabot
- PyUp
- Custom tools, e.g. a cron job or scheduled CI task
- No, my application dependencies are updated manually
2.15. installing packages?
- None
- pip
- easy_install
- Conda
- Poetry
- pip-sync
- pipx
2.16. tool(s) do you use to develop Python applications?
- None / I'm not sure
- Setuptools
- build
- Wheel
- Enscons
- pex
- Flit
- Poetry
- conda-build
- maturin
- PDM-PEP517
2.17. job role(s)?
- Architect
- QA engineer
- Business analyst
- DBA
- CIO / CEO / CTO
- Technical support
- Technical writer
- Team lead
- Systems analyst
- Data analyst
- Product manager
- Developer / Programmer
3. install
pip3 install --upgrade pip --user
3.1. change Python version Ubuntu & Debian
update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1
echo 1 | update-alternatives --config python
4. Python theory
4.1. Python [ˈpʌɪθ(ə)n]
- interpreted
- code readability
- indentation instead of curly braces
- designed to be highly extensible
- garbage collector
- functions are first class citizens
- multiple inheritance
- all arguments are passed by assignment (call by sharing): the function receives references to the caller's objects, not copies
- nothing in Python makes it possible to enforce data hiding
- all classes inherit from object
Multi-paradigm:
- imperative
- procedural
- object-oriented
- functional (in the Lisp tradition) - (itertools and functools) - borrowed from Haskell and Standard ML
- reflective
- aspect-oriented programming by metaprogramming[42] and metaobjects (magic methods)
- dynamic name resolution (late binding) ?????????
Typing discipline:
- Duck
- dynamic
- gradual (since 3.5) - a name may be annotated with a type (static checking) or left untyped (dynamic).
- strong
Python and CPython are managed by the non-profit Python Software Foundation.
The Python Standard Library 3.6
- string processing (regular expressions, Unicode, calculating differences between files)
- Internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP, CGI programming)
- software engineering (unit testing, logging, profiling, parsing Python code)
- operating system interfaces (system calls, filesystems, TCP/IP sockets)
4.2. philosophy
document Zen of Python (PEP 20)
- Beautiful is better than ugly
- Explicit is better than implicit
- Simple is better than complex
- Complex is better than complicated
- Readability counts
- Errors should never pass silently. Unless explicitly silenced.
- There should be one– and preferably only one –obvious way to do it.
- If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea.
- Namespaces are one honking great idea – let's do more of those!
Other
- "there should be one—and preferably only one—obvious way to do it"
- goal - keeping it fun to use ( spam and eggs instead of the standard foo and bar)
- pythonic - related to style (code is pythonic )
- Pythonists, Pythonistas, and Pythoneers - nicknames for Python enthusiasts
4.3. History
Every revision of Python enjoys performance improvements over the previous version.
- 1989
- 2000 - Python 2.0 - cycle-detecting garbage collector and support for Unicode
- 2008 - Python 3.0 - not completely backward-compatible - include the 2to3 utility, which automates (at least partially) the translation of Python 2 code to Python 3.
- 2009 Python 3.1 ordered dictionaries,
- 2015 Python 3.5 type hints (typing module)
- 2016 Python 3.6 asyncio, Formatted string literals (f-strings), Syntax for variable annotations.
- PEP523 API to make frame evaluation pluggable at the C level.
3.7
- built-in breakpoint() function that calls pdb. before was: import pdb; pdb.set_trace()
- @dataclass - class annotations sugar
- contextvars module - mechanism for managing Thread-local context variables, similar to thread-local storage (TLS), PEP 550
- from dataclasses import dataclass @dataclass - comes with basic functionality already implemented: instantiate, print, and compare data class instances
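The @dataclass decorator above can be sketched with a minimal example (Point is a made-up class name):

```python
from dataclasses import dataclass

@dataclass
class Point:
    # __init__, __repr__ and __eq__ are generated automatically
    x: int
    y: int = 0

p = Point(1, 2)
print(p)                 # repr comes for free: Point(x=1, y=2)
print(p == Point(1, 2))  # value-based comparison: True
```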
3.8
- Positional-Only Parameter: pow(x, y, z=None, /)
- Assignment Expressions: if (match := pattern.search(data)) is not None: - This feature allows developers to assign values to variables within an expression.
- f"{a=}", f"Square has area of {(area := length**2)} perimeter of {(perimeter := length*4)}"
- new SyntaxWarnings: when to choose is over ==, miss a comma in a list
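The 3.8 assignment expression and self-documenting f-string can be combined in a short sketch (the regex and data are illustrative):

```python
import re

data = "order id: 1234"
pattern = re.compile(r"\d+")

# walrus operator: bind and test in a single expression
if (match := pattern.search(data)) is not None:
    print(match.group())   # 1234

a = 5
print(f"{a=}")             # self-documenting f-string: a=5
```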
3.9
- Merge (|) and update (|=) added to dict to complement the dict.update() method and {**d1, **d2}.
- Added str.removeprefix(prefix) and str.removesuffix(suffix) to easily remove unneeded sections of a string.
- More Flexible Decorators: Traditionally, a decorator has had to be a named, callable object, usually a
function or a class. PEP 614 allows decorators to be any callable expression.
- before: decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
- after: decorator: '@' namedexpr_test NEWLINE
- type hints: list[int] no longer requires import typing;
- Annotated[int, ctype("char")] - integer that should be considered as a char type in C.
- Better time zones handling.
- The new parser based on PEG was introduced, making it easier to add new syntax to the language.
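The 3.9 dict merge operators and the new string methods in a minimal sketch (requires Python 3.9+; the dict contents are illustrative):

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

merged = d1 | d2          # right-hand side wins on key conflicts
print(merged)             # {'a': 1, 'b': 3, 'c': 4}

d1 |= d2                  # in-place update, like d1.update(d2)

s = "test_report.txt"
print(s.removeprefix("test_"))   # report.txt
print(s.removesuffix(".txt"))    # test_report
```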
3.10
- Structural pattern matching (PEP 634) was added, providing a way to match against and destructure data structures.
- match command.split():
      case [action, obj]:
          # interpret action, obj
- The new Parenthesized context managers syntax (PEP 618) was introduced, making it easier to write context managers using less boilerplate code.
- Improved error messages and error recovery were added to the parser, making it easier to debug syntax errors.
- Parenthesized Context Managers: This feature improves the readability of with statements by allowing developers to use parentheses. with (open("test_file1.txt", "w") as test, open("test_file2.txt", "w") as test2):
3.11
- The built-in pip package installer was upgraded to version 21.0, providing new features and improvements to package management.
- Improved error messages and error handling were added to the interpreter, making it easier to understand and recover from runtime errors.
- Some of the built-in modules were updated and improved, including the asyncio and typing modules.
- Better hash randomization: This improves the security of Python by making it more difficult for attackers to exploit hash-based algorithms that are used for various operations such as dictionary lookups.
- package has been deprecated
3.12
- distutils removed
- allow perf - linux profiler, new API for profilers, sys.monitoring
- buffer protocol - access to the raw region of memory
- type hints:
- TypedDict as a source of types for typing **kwargs (PEP 692)
- doesn't need to import TypeVar. func[T] syntax to indicate generic type references
- @override decorator can be used to flag methods that override methods in a parent
preparing for concurrency:
- Immortal objects - to implement other optimizations (like avoiding copy-on-write)
- subinterpreters - the ability to have multiple instances of an interpreter, each with its own GIL, no
end-user interface to subinterpreters.
- asyncio is larger and faster
- sqlite3 module: a command-line interface has been added (python -m sqlite3)
- unittest: Add a --durations command line option, showing the N slowest test cases
4.3.1. 3.0
- Old feature removal: old-style classes, string exceptions, and implicit relative imports are no longer supported.
- exceptions now need the as keyword: except E as var instead of except E, var
- with is now built in and no longer needs to be imported from __future__.
- range: xrange() from Python 2 has been replaced by range(). The original range(), which returned a list, is no longer available (range() now returns a lazy range object).
- print changed
- input
- all text content such as strings are Unicode by default
- / -> float, in 2.0 it was integer. // operator added.
- Python 2.7 code cannot always be automatically translated to Python 3.
4.4. Implementations
CPython, the reference implementation of Python
- interpreter and a compiler as it compiles Python code into bytecode before interpreting it
- (GIL) problem - only one thread may be processing Python bytecode at any one time
- One thread may be waiting for a client to reply, and another may be waiting for a database query to execute, while the third thread is actually processing Python code.
- Concurrency can only be achieved with separate CPython interpreter processes managed by a multitasking operating system
implementations that are known to be compatible with a given version of the language are IronPython, Jython and PyPy.
- IronPython - implemented in C#, uses a JIT, targets the .NET Framework and Mono; extensions created for it are known not to work under CPython
- PyPy - just-in-time compiler. written completely in Python.
- Jython - Python in Java for the Java platform
CPython based:
- Cython - translates a Python script into C and makes direct C-level API calls into the Python interpreter
Stackless Python - a significant fork of CPython that implements microthreads; it does not use the C memory stack, thus allowing massively concurrent programs.
Numba - NumPy-aware optimizing runtime compiler for Python
MicroPython - Python for microcontrollers (runs on the pyboard and the BBC Microbit)
Jython and IronPython - do not have a GIL, so multithreaded execution of a CPU-bound Python application works. These platforms are always playing catch-up with new language or library features, so unfortunately they lag behind CPython.
Pythran, a static Python-to-C++ extension compiler for a subset of the language, mostly targeted at numerical computation. Pythran can be (and is probably best) used as an additional backend for NumPy code in Cython.
mypyc, a static Python-to-C extension compiler, based on the mypy static Python analyser. Like Cython's pure Python mode, mypyc can make use of PEP-484 type annotations to optimise code for static types. Cons: no support for low-level optimisations and typing, opinionated Python type interpretation, reduced Python compatibility and introspection after compilation
Nuitka, a static Python-to-C extension compiler.
- Pros: highly language compliant, reasonable performance gains, support for static application linking (similar to cython_freeze but with the ability to bundle library dependencies into a self-contained executable)
- Cons: no support for low-level optimisations and typing
Brython is an implementation of Python 3 for client-side web programming (in JavaScript). It provides a subset of Python 3 standard library combined with access to DOM objects. It is packaged in Gentoo as dev-python/brython.
4.5. Bytecode:
- Java is compiled into bytecode and then executed by the JVM.
- C language is compiled into object code, and then becomes the executable file after the linker
- Python is first converted to bytecode and then executed via ceval.c. The interpreter directly executes the translated instruction set.
Bytecode is a set of instructions for a virtual machine called the Python Virtual Machine (PVM).
The PVM is an interpreter that runs the bytecode.
The bytecode is platform-independent, but the PVM is specific to the target machine. .pyc file.
The bytecode files are stored in a folder named __pycache__. This folder is automatically created when you try to import another file that you created.
manually create it: python -m compileall file_1.py … file_n.py
4.6. terms
binding the name to the object - x = 2 - (generic) name x receives a reference to a separate, dynamically allocated object of numeric (int) type of value 2
4.7. Indentation and blank lines
The amount of indentation does not matter, as long as it is consistent within a block.
The header line ends with a colon (:); the indented statements below it are called the suite:
if True:
    print("True")
else:
    print("False")
Blank Lines - ignored
semicolon ( ; ) allows multiple statements
Internally:
- INDENT - token marking the start of a new block
- DEDENT - token marking the end of a block
4.8. mathematic
- arbitrary-precision integer arithmetic: the size of numbers is limited only by the amount of available memory
- Extensive mathematics library, and the third-party library NumPy that further extends the native capabilities
- a < b < c - support
4.9. WSGI (Web Server Gateway Interface)(whiskey)
- calling convention for web servers to forward requests to web applications or frameworks written in the Python programming language.
- like Java's "servlet" API.
- WSGI middleware components, which implement both sides of the API, typically in Python code.
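The WSGI calling convention can be sketched with a minimal application (the app name and response body are illustrative; the fake start_response is only there to exercise it without a real server):

```python
# a WSGI application is a callable taking (environ, start_response)
def app(environ, start_response):
    body = b"Hello, WSGI"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# exercise it without a server by faking start_response
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

result = app({}, fake_start_response)
print(captured["status"], b"".join(result))
```

With the standard library reference server this would be served as `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.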
5. scripting
5.1. top-level script environment
- https://docs.python.org/3.9/library/inspect.html
- https://docs.python.org/3/library/functions.html?highlight=__file__
- https://docs.python.org/3/reference/import.html
- https://geek-university.com/python/display-module-content/
__name__ - equal to '__main__' when run as a script, with "python -m", or from an interactive prompt. '__main__' is the name of the scope in which top-level code executes.
if __name__ == "__main__": - does not execute when imported
__file__ - full path to module file
5.2. command line arguments parsing
import sys
print('Number of arguments:', len(sys.argv), 'arguments.')
print('Argument List:', str(sys.argv))
see the getopt module for more elaborate parsing
5.3. python executable
- -c cmd : program passed in as string (terminates option list)
- -m mod : run library module as a script (terminates option list)
- -O : remove assert and debug-dependent statements; add .opt-1 before .pyc extension; also PYTHONOPTIMIZE=x
- -OO : do -O changes and also discard docstrings; add .opt-2 before .pyc extension
- -s : don't add user site directory to sys.path; also PYTHONNOUSERSITE. Disable home/u2.local/lib/python3.8/site-packages
- -S : don't imply 'import site' on initialization
- /usr/lib/python38.zip
- /usr/lib/python3.8
- /usr/lib/python3.8/lib-dynload
5.4. current dir
script_dir = os.path.dirname(os.path.abspath(__file__))
5.5. unix logger
def init_logger(level, logfile_path: str = None):
    """
    stderr: WARNING, ERROR and CRITICAL
    stdout: < WARNING
    :param logfile_path:
    :param level: level for stdout
    :return:
    """
    formatter = logging.Formatter('mkbsftp [%(asctime)s] %(levelname)-6s %(message)s')
    logger = logging.getLogger(__name__)
    logger.setLevel(level)  # debug - lowest
    # log file
    if logfile_path is not None:
        h0 = logging.FileHandler(logfile_path)
        h0.setLevel(level)
        h0.setFormatter(formatter)
        logger.addHandler(h0)
    # stdout -- python3 script.py 2>/dev/null | xargs
    h1 = logging.StreamHandler(sys.stdout)
    h1.setLevel(level)  # level may be changed
    h1.addFilter(lambda record: record.levelno < logging.WARNING)
    h1.setFormatter(formatter)
    # stderr -- python3 script.py 2>&1 >/dev/null | xargs
    h2 = logging.StreamHandler(sys.stderr)
    h2.setLevel(logging.WARNING)  # fixed level
    h2.setFormatter(formatter)
    logger.addHandler(h1)
    logger.addHandler(h2)
    return logger
5.6. How does python find packages?
sys.path - Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.
find a module without importing it (imp.find_module is deprecated and was removed in Python 3.12; use importlib.util.find_spec):
- import importlib.util
- importlib.util.find_spec('numpy')
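A minimal sketch of locating a module without importing it, using importlib.util.find_spec (json is used here just because it is always available; the missing-module name is made up):

```python
import importlib.util

# find_spec returns a ModuleSpec (or None) without importing the module
spec = importlib.util.find_spec("json")
if spec is not None:
    print(spec.name)    # json
    print(spec.origin)  # path to the module file

# a module that does not exist yields None
print(importlib.util.find_spec("definitely_missing_module_xyz"))
```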
5.7. dist-packages and site-packages?
- dist-packages is a Debian-specific convention that is also present in its derivatives, like Ubuntu. Modules are installed to dist-packages when they come from the Debian package manager. This is to reduce conflict between the system Python, and any from-source Python build you might install manually.
5.8. file size and modification date
os.stat(pf).st_size
os.stat(pf).st_mtime
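A self-contained sketch of reading size and modification time via os.stat (a throwaway temp file stands in for the path pf above):

```python
import os
import tempfile
from datetime import datetime, timezone

# write a throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello")
    path = f.name

st = os.stat(path)
print(st.st_size)  # 5 bytes
# st_mtime is a POSIX timestamp; convert it to a datetime
print(datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat())
os.remove(path)
```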
5.9. environment
os.environ - dictionary
try … except KeyError: - no variable in dictionary
os.environ.get('FLASK_SOME_STAFF') - None if no key
export BBB ; python → os.environ['BBB'] raises KeyError (BBB was never assigned a value)
DEBUG = os.environ.get('DEBUG', False) # set DEBUG to True or False
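The os.environ patterns above in one sketch (the variable names MY_APP_MODE, NO_SUCH_VAR_XYZ and DEBUG_FLAG_XYZ are made up for illustration):

```python
import os

os.environ["MY_APP_MODE"] = "debug"       # setting also exports to child processes

print(os.environ.get("MY_APP_MODE"))      # debug
print(os.environ.get("NO_SUCH_VAR_XYZ"))  # None - .get() never raises KeyError

# common pattern: fall back to a default, then coerce to bool explicitly
debug = os.environ.get("DEBUG_FLAG_XYZ", "0") == "1"
print(debug)
```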
5.10. -m mod - run library module as a script
https://peps.python.org/pep-0338/
- __name__ is always '__main__'
5.10.1. e.g. mymodule/__main__.py:
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--port", action="store", default="8080")
    parser.add_argument("--host", action="store", default="0.0.0.0")
    args = parser.parse_args()
    port = int(args.port)
    host = str(args.host)
    app.run(host=host, port=port, debug=False)
    return 0

if __name__ == "__main__":
    main()
6. Data model
Standard data types −
- Numbers
- String
- List :list - []
- Tuple :tuple - ()
- Dictionary :dict - {}
- Callable :callable
- :object
6.1. special types
https://docs.python.org/3/reference/datamodel.html
- None - a single value
- NotImplemented - Numeric methods and rich comparison methods should return this value if they do not implement the operation for the operands provided.
- Ellipsis - accessed through the literal ... or the built-in name Ellipsis.
- numbers.Number
- Sequences - represent finite ordered sets indexed by non-negative numbers (len() for sequence)
- mutable: lists, Byte Arrays
- immutable: str, tuple, bytes
- Set types -
- Sets - mutable
- Frozen sets - frozenset()
- Mappings - indexed by arbitrary keys (a[k]); support del a[k] and len()
- Callable
- Instance methods
- Generator functions - function or method which uses the yield statement
- when called, always returns an iterator object
- Coroutine functions - async def - when called, returns a coroutine object
- Asynchronous generator functions
- Built-in functions
- Built-in methods
- Classes - factories for new instances of themselves
- Class Instances - can be made callable by defining a __call__() method in their class.
- Modules - __name__ (the module's name), __doc__, __file__ (the pathname of the file from which the module was loaded), __annotations__; __dict__ is the module's namespace as a dictionary object.
- Custom classes -
- Class instances
6.2. theory
- everything is an object, even classes. (Von Neumann’s model of a “stored program computer”)
- object has identity, a type and a value
- identity - address in memory, never changed once created instance
- id(object) = identity
- x is y - compare identities x is not y
- type or class
- type()
- value of some objects can change - mutable vs immutable - an immutable container may still hold references to mutable objects
- numbers, strings and tuples are immutable
- dictionaries and lists are mutable
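The identity/value distinction above can be sketched with two names bound to one list:

```python
a = [1, 2, 3]
b = a          # b is another name for the same object
c = [1, 2, 3]  # a distinct object with an equal value

print(a is b)  # True  - same identity (same id())
print(a is c)  # False - different identity
print(a == c)  # True  - equal value

b.append(4)    # mutating through one name is visible through the other
print(a)       # [1, 2, 3, 4]
```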
6.3. Types build-in
- None - a name to access a single object - signifies the absence of a value; truth value is false.
- NotImplemented - a name to access a single object - numeric methods and rich comparison methods should return this value if they do not implement the operation for the operands provided; truth value is true.
- Ellipsis - a single object accessed through ... or Ellipsis; truth value is true
- numbers.Number - immutable
- numbers.Integral
- Integers (int) - unlimited range
- Booleans (bool) - the two values False and True; behave like 0 and 1 in most contexts
- numbers.Real (float) - the underlying machine architecture determines the accepted range and handling of overflow
- numbers.Complex (complex) - z.real and z.imag - pair of machine-level double precision floating point numbers
- numbers.Integral
- Sequences - finite ordered sets len() - index a[i]: 0 to n-1; min(s), max(s) ; s * n - n copies of s ;
s + t concatenation; x in s - True if an item of s is equal to x
- Immutable sequences - s.index(obj)
- str - immutable sequence of Unicode code points; s[0] is a string of length 1 (one code point). ord(c) - character to code point (0 to 0x10FFFF); chr(i) - int to character; str.encode() -> bytes, bytes.decode() -> str
- Tuple - immutable - (), (1,), (1,'23') - any type.
- range()
- Bytes - items are 8-bit bytes (0-255) - literal b'ab' ; bytes() - creates
- Mutable (unhashable) - del list[0] - removes the first element
- List - mutable - [1,'3'] - any type.
- Byte Array - bytearray - bytearray()
- memoryview
- Set types - unordered - finite sets of unique - immutable - compare by == ; has len()
- set - mutable - items must be hashable (immutable) - x in set, for x in set - {'h', 'o', 'l', 'e'}
- frozenset - immutable and hashable - it can be used again as an element of another set
- Mappings - finite sets of key/value pairs; operations: del a[k], len()
- Dictionary - mutable - keys are unique within a dictionary - indexed by nearly arbitrary values - _keys must be hashable_ - {2 : 'Zara', 'Age' : 7, 'Class' : 'First'} dict[3] = "my" # add a new entry
- Callable types - objects the call operation can be applied to - code that can be called
- User-defined functions
- Instance methods: read-only attributes:
- Generator functions - a function which returns a generator iterator. It looks like a normal function except that it contains yield expressions
- Coroutine functions - async def - returns a coroutine object
- Asynchronous generator functions
- Built-in functions - len() and math.sin() (math is a standard built-in module)
- Built-in methods alist.append()
- Classes - act as factories for new instances of themselves. arguments of the call are passed to __new__()
- Class Instances - may be callable by defining a __call__() method
- Modules
- Custom classes
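A quick sketch combining the built-in container types above (the values are made up):

```python
t = (1, "a")                  # tuple: immutable
l = [1, "a"]
l.append(2)                   # list: mutable
s = {"h", "o", "l", "e"}      # set: unordered, unique, mutable
fs = frozenset(s)             # frozenset: immutable and hashable
d = {2: "Zara", "Age": 7}     # dict: keys must be hashable
d[3] = "my"                   # add a new entry

assert "o" in s and fs == s   # set/frozenset compare by elements
assert d[3] == "my" and len(t) == 2
```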
6.4. Truth Value Testing
false:
- None and False.
- zero of any numeric type: 0, 0.0, 0j, Decimal(0), Fraction(0, 1)
- empty sequences and collections: '', (), [], {}, set(), range(0)
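The falsy values listed above can be checked directly with bool():

```python
from decimal import Decimal
from fractions import Fraction

falsy = [None, False, 0, 0.0, 0j, Decimal(0), Fraction(0, 1),
         "", (), [], {}, set(), range(0)]
assert not any(bool(x) for x in falsy)   # every one of them is false
assert bool([0]) and bool(" ")           # non-empty containers/strings are truthy
```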
6.5. Shallow and deep copy operations
- import copy
- copy.copy(x) Return a shallow copy of x.
- copy.deepcopy(x[, memo]) Return a deep copy of x.
- a class can define its own copy hooks: __copy__() and __deepcopy__()
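A minimal sketch of the shallow/deep difference: a shallow copy shares the inner objects, a deep copy duplicates them.

```python
import copy

a = [[1, 2], [3, 4]]
sh = copy.copy(a)        # shallow: inner lists are shared with a
dp = copy.deepcopy(a)    # deep: inner lists are duplicated

a[0].append(99)
assert sh[0] == [1, 2, 99]   # the shallow copy sees the change
assert dp[0] == [1, 2]       # the deep copy does not
```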
6.6. links
- https://docs.python.org/3/reference/datamodel.html
- https://docs.python.org/3/library/stdtypes.html
- object by name or by reference: mutable vs immutable, 2019 https://realpython.com/pointers-in-python/
7. typed variables or type hints
- https://docs.python.org/3/library/typing.html
- from typing import Dict, Tuple, Sequence, Any, Union, Callable, TypeVar, Generic
variable_name: type
7.1. typing.Annotated and PEP-593
data models, validation, serialization, UI
v: Annotated[T, *x]
- v: a “name” (variable, function parameter, . . . )
- T: a valid type
- x: at least one metadata (or annotation), passed in a variadic way. The metadata can be used for either static analysis or at runtime.
Ignorable: When a tool or a library does not support annotations or encounters an unknown annotation it should just ignore it and treat annotated type as the underlying type.
stored in obj.__annotations__
7.1.1. from typing import get_type_hints
@dataclass
class Point:
    x: int
    y: Annotated[int, Label("ordinate")]

get_type_hints(Point, include_extras=True)
# {'x': <class 'int'>, 'y': typing.Annotated[int, Label('ordinate')]}
7.1.2. Use case: A calendar Event model, using pydantic https://github.com/pydantic/pydantic
from pydantic import BaseModel

class Event(BaseModel):
    summary: str
    description: str | None = None
    start_at: datetime | None = None
    end_at: datetime | None = None

# -- Validation on datetime fields (using Pydantic)
from pydantic import AfterValidator

class Event(BaseModel):
    summary: str
    description: str | None = None
    start_at: Annotated[datetime | None, AfterValidator(tz_aware)] = None
    end_at: Annotated[datetime | None, AfterValidator(tz_aware)] = None

def tz_aware(d: datetime) -> datetime:
    if d.tzinfo is None or d.tzinfo.utcoffset(d) is None:
        raise ValueError("expecting a TZ-aware datetime")
    return d

# -- iCalendar serialization support
TZDatetime = Annotated[datetime, AfterValidator(tz_aware)]

from . import ical

class Event(BaseModel):
    summary: Annotated[str, ical.Serializer(label="summary")]
    description: Annotated[str | None, ical.Serializer(label="description")] = None
    start_at: Annotated[TZDatetime | None, ical.Serializer(label="dtstart")] = None
    end_at: Annotated[TZDatetime | None, ical.Serializer(label="dtend")] = None

# module: ical
@dataclass
class Serializer:
    label: str

    def serialize(self, value: Any) -> str:
        if isinstance(value, datetime):
            value = value.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        return f"{self.label.upper()}:{value}"

def serialize_event(obj: Event) -> str:
    lines = []
    for name, a, _ in get_annotations(obj, Serializer):
        if (value := getattr(obj, name, None)) is not None:
            lines.append(a.serialize(value))
    return "\n".join(["BEGIN:VEVENT"] + lines + ["END:VEVENT"])

# console rendering
# >>> evt = Event(
# ...     summary="FOSDEM",
# ...     start_at=datetime(2024, 2, 3, 9, 0, 0, tzinfo=ZoneInfo("Europe/Brussels")),
# ...     end_at=datetime(2024, 2, 4, 17, 0, 0, tzinfo=ZoneInfo("Europe/Brussels")),
# ... )
# >>> print(ical.serialize_event(evt))
# BEGIN:VEVENT
# SUMMARY:FOSDEM
# DTSTART:20240203T080000Z
# DTEND:20240204T160000Z
# END:VEVENT
7.2. function annotation
def function_name(parameter1: type) -> return_type:
from typing import Dict

def get_first_name(full_name: str) -> str:
    return full_name.split(" ")[0]

fallback_name: Dict[str, str] = {
    "first_name": "UserFirstName",
    "last_name": "UserLastName"
}

raw_name: str = input("Please enter your name: ")
first_name: str = get_first_name(raw_name)

# If the user didn't type anything in, use the fallback name
if not first_name:
    first_name = fallback_name["first_name"]

print(f"Hi, {first_name}!")
8. Strings
Quotation [kwəʊˈteɪʃn] for strings: single ('), double (") and triple (''' or """) quotes denote string literals
8.1. basics
S = 'str'; S = "str"; S = '''str'''
para_str = """this is a long string that is made up of
several lines and non-printable characters such as
TAB ( \t ) and they will show up that way when displayed.
NEWLINEs within the string, whether explicitly given like
this within the brackets [ \n ], or just a NEWLINE within
the variable assignment will also show up."""
8.1.1. multiline
- s = """My Name is Pankajin Developers community."""
- s = ('asd' 'asd')   # adjacent literals concatenate -> 'asdasd'
- backslash
s = "My Name is Pankaj. " \
    "website in Developers community."
- s = ' '.join(("My Name is Pankaj. I am the owner of", "JournalDev.com and"))
8.2. A formatted string literal or f-string
equivalent to format()
- '!s' calls str() on the expression
- '!r' calls repr() on the expression
- '!a' calls ascii() on the expression.
>>> name = "Fred"
>>> f"He said his name is {name!r}."   # !r is equivalent to repr()
"He said his name is 'Fred'."
Digits after the decimal point:
>>> width = 10
>>> precision = 4
>>> value = decimal.Decimal("12.34567")
>>> f"result: {value:{width}.{precision}}"   # nested fields
'result:      12.35'
Date formatting:
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}"   # using date format specifier
'January 27, 2017'
>>> number = 1024
>>> f"{number:#0x}"        # using integer format specifier
'0x400'
format:
>>> '{:,}'.format(1234567890)
'1,234,567,890'
>>> 'Correct answers: {:.2%}'.format(19/22)
'Correct answers: 86.36%'
8.3. String Formatting Operator
- print ("My name is %s and weight is %d kg!" % ('Zara', 21))
8.4. string literal prefixes
str or strings - immutable sequences of Unicode code points.
- r' R' raw strings
- Raw strings do not treat the backslash as a special character at all. print (r'C:\\nowhere')
- b' B' bytes (NOT str)
- may only contain ASCII characters
8.5. raw strings, Unicode, formatted
- r'string' - treat backslashes as literal characters
- f'string' or F'string' - f"He said his name is {name!r}." - formatted
8.6. Efficient String Concatenation
- concatenation at runtime
# Fastest: join a list of strings
s = ''.join([str(num) for num in range(loop_count)])

def g():
    sb = []
    for i in range(30):
        sb.append("abcdefg"[i % 7])
    return ''.join(sb)

print(g())  # abcdefgabcdefgabcdefgabcdefgab
8.7. byte string
b''
- byte string to unicode: bytes.decode()
- unicode to byte string: str.encode('utf-8')
Your string is already encoded with some encoding. Before encoding it to ascii, you must decode it first. Python implicitly tries to decode it (that's why you get a UnicodeDecodeError, not a UnicodeEncodeError).
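The round-trip above in one sketch (the sample string is made up):

```python
u = "héllo"                       # str: Unicode code points
b = u.encode("utf-8")             # str -> bytes
assert isinstance(b, bytes)
assert b.decode("utf-8") == u     # bytes -> str round-trip
```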
9. Classes
- Class object - support two kinds of operations: attribute references and instantiation.
- Instance object - attribute references - data and methods
data attributes correspond to “instance variables” in Smalltalk, and to “data members” in C++. - static (class) variables are shared by all instances.
- instance variables may be reassigned
- instance methods may be reassigned to any method or function. it is just an alias
object - parent for all classes
- __class__ - class of instance
- __init__
- __new__
- __init_subclass__
- __delattr__, __dir__, __doc__, __eq__, __format__, __ge__, __getattribute__, __gt__, __hash__, __le__, __lt__, __ne__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__
9.1. basic
class MyClass:
    a = None

c = MyClass()
c.a = 3   # instance attribute

class MyClass:
    """MyClass.i and MyClass.f are valid attribute references"""
    i = 12345            # class attribute
    def __init__(self, a):
        self.i = a       # creates an instance attribute shadowing the class one
    def f(self):
        print("f")

x = MyClass(2)
x.a = 3              # data attribute created on the instance
print(x.a)
print(x.i)
print(MyClass.i)
print(x.f)           # x.f is a method object, not a function object
print(MyClass.f)

3
2
12345
<bound method MyClass.f of <__main__.MyClass object at 0x7f37165d4790>>
<function MyClass.f at 0x7f37165c5440>

class Dog:
    kind = 'canine'   # class variable shared by all instances
    tricks = []       # caution: a mutable class variable is shared by all instances!
    def __init__(self, name):
        self.name = name   # instance variable unique to each instance

# -------------- class method
class C:
    @classmethod
    def f(cls, arg1, arg2): ...

# May be called on the class C.f() or on an instance C().f().
# For a derived class, the derived class object is passed as the implied first argument.
9.2. Special Attributes
- instance.__class__ - The class to which a class instance belongs.
- class.__mro__ or mro() - This attribute is a tuple of classes that are considered when looking for base classes during method resolution.
- class.__subclasses__() - Each class keeps a list of weak references to its immediate subclasses.
Class:
- __name__ The class name.
- __module__ The name of the module in which the class was defined.
- __dict__ The dictionary containing the class’s namespace.
- __bases__ A tuple containing the base classes, in the order of their occurrence in the base class list.
- __doc__ The class’s documentation string, or None if undefined.
- __annotations__ A dictionary containing variable annotations collected during class body execution. For best practices on working with annotations, please see Annotations Best Practices.
- __new__(cls,…) - static method - special-cased so you need not declare it as such. The return value of
__new__() should be the new object instance (usually an instance of cls).
- typically: super().__new__(cls[, …]) with appropriate arguments and then modifying the newly-created instance as necessary before returning it.
- then the new instance’s __init__() method will be invoked
- __call__(self,…)
Class instances
- super() - Return a proxy object that delegates method calls to a parent or sibling class of type
9.3. inheritance
9.3.1. Constructor
- calling super().__init__() is optional in classes that inherit directly from object (object.__init__ takes no extra arguments)
- classes inherit from object by default
- a class that defines no __init__ simply inherits it from its base class
designed for cooperative inheritance:

class CoopFoo:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)   # forwards all unused arguments
super(type, object-or-type)
- type - get parent or sibling of type
- object-or-type.mro() determines the method resolution order to be searched
inside a method, super() is equivalent to super(__class__, self), where __class__ is the class the method is defined in (not self.__class__, which would recurse in subclasses)
9.3.2. Subclassing:
- direct - a - b
- indirect - a - b - c
- virtual - abstract base class
class SubClassName(ParentClass1[, ParentClass2, ...]):
    'Optional class documentation string'
    class_suite
9.3.3. built-in functions that work with inheritance:
- isinstance(obj, int) - True only if obj.__class__ is int or some class derived from int
- issubclass(bool, int) - True since bool is a subclass of int
- type(ins) == ins.__class__
- type(ins) is Class_name
- isinstance(ins, Class_name)
- issubclass(ins.__class__, Class_name)
- class.mro() - get class.__mro__ attribute
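The checks above side by side (class names are made up for illustration):

```python
class Base: pass
class Child(Base): pass

c = Child()
assert isinstance(c, Base)                       # True for any ancestor class
assert issubclass(Child, Base)
assert issubclass(Child, Child)                  # a class is a subclass of itself
assert type(c) is Child and type(c) == c.__class__   # exact class only
assert Child.__mro__ == (Child, Base, object)    # method resolution order
```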
9.3.4. example
class aa():
    def __init__(self, aaa, vv):
        self.aaa = aaa
        self.vv = vv
    def get(self):
        print(self.aaa + self.vv)

class bb(aa):
    def __init__(self, aaa, *args, **kwargs):
        super().__init__(aaa, *args, **kwargs)
        self.aaa = aaa + 'asd'

s = bb('aa', 'vv')
s.get()
>> aaasdvv
9.3.5. Multiple inheritance - left-to-right
- Method Resolution Order (MRO) (which parent's method gets called) changes dynamically to support cooperative calls to super() (class.__mro__) (obj.__class__.__mro__)
__spam is textually replaced with _classname__spam (name mangling) - applied in the class where the name is written, which avoids clashes with subclasses
9.3.6. Abstract class (ABC - abstract base class)
- https://www.python.org/dev/peps/pep-3119/
- Numbers https://www.python.org/dev/peps/pep-3141/
- abc https://docs.python.org/3/library/abc.html
Notes:
- Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are not supported.
from abc import ABCMeta, abstractmethod

class MyABC(metaclass=ABCMeta):
    @abstractmethod
    def foo(self):
        pass

# or
from abc import ABC, abstractmethod

class MyABC(ABC):
    @abstractmethod
    def foo(self):
        pass

class B(A):
    def __init__(self, first_name, last_name, salary):
        super().__init__(first_name, last_name)   # if A has __init__
        self.salary = salary
    def foo(self):
        return True
9.3.7. Virtual subclasses
Virtual subclass - a class (and its descendants) registered on an ABC with the register() method, which makes isinstance() and issubclass() report it as a subclass
class MyABC(metaclass=ABCMeta):
    pass

MyABC.register(tuple)
assert issubclass(tuple, MyABC)   # tuple is a virtual subclass of MyABC now
9.3.8. calling parent class constructor
9.4. Getters and setters
- no private variables
@property - pythonic way
class Celsius:
    def __init__(self, temperature=0):
        self.temperature = temperature
    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32
    def get_temperature(self):
        print("Getting value")
        return self._temperature
    def set_temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value
    temperature = property(get_temperature, set_temperature)

>>> c.temperature
Getting value
0
>>> c.temperature = 37
Setting value

# ----------- OR ------
class Celsius:
    def __init__(self, temperature=0):
        self.temperature = temperature
    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32
    @property
    def temperature(self):
        print("Getting value")
        return self._temperature
    @temperature.setter
    def temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value
9.5. Polymorphism [pɔlɪˈmɔːfɪzm]
inheritance for shared behavior, not for polymorphism
class Square(object):
    def draw(self, canvas):
        pass

class Circle(object):
    def draw(self, canvas):
        pass

shapes = [Square(), Circle()]
for shape in shapes:
    shape.draw('canvas')
9.6. Protocols or emulation
Overriding special ("dunder") methods lets a class participate in language constructs.
Protocol | Methods | Supports syntax |
---|---|---|
Sequence | __getitem__ (incl. slices) etc. | seq[1:2] |
Iterators | __iter__, __next__ | for x in coll: |
Comparison | __eq__, __gt__ etc. | x == y, x > y |
Numeric | __add__, __sub__, __and__, etc. | x+y, x-y, x&y .. |
String like | __str__, __repr__ | print(x) |
Attribute access | __getattr__, __setattr__ | obj.attr |
Context managers | __enter__, __exit__ | with open('a.txt') as f: f.read() |
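A minimal sketch of the sequence protocol (the Deck class is made up): implementing __len__ and __getitem__ is enough to get indexing, slicing, and iteration.

```python
class Deck:
    """Implements the sequence protocol via __len__/__getitem__."""
    def __init__(self, cards):
        self._cards = list(cards)
    def __len__(self):
        return len(self._cards)
    def __getitem__(self, index):     # also enables slicing and for-loops
        return self._cards[index]

d = Deck(["A", "K", "Q"])
assert len(d) == 3 and d[0] == "A"
assert d[1:] == ["K", "Q"]                  # slice support comes for free
assert [c for c in d] == ["A", "K", "Q"]    # iteration via __getitem__
```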
9.7. private and protected
- public - all
- Protected: _property
- Private: __property (name-mangled)
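The three conventions in one sketch (the Account class is made up); note that "private" is only name mangling, not real access control.

```python
class Account:
    def __init__(self):
        self.public = 1
        self._protected = 2     # convention only: "internal use"
        self.__private = 3      # name-mangled to _Account__private

a = Account()
assert a.public == 1
assert a._protected == 2            # still accessible
assert a._Account__private == 3     # mangled name, not real privacy
assert not hasattr(a, "__private")  # the unmangled name does not exist
```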
9.8. object
object() or object - base for all classes
dir(object())
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
- __dict__ − Dictionary containing the class's namespace.
- __doc__ - docstring
- __init__ - constructor
- __str__ - toString() - Return a string version of object
- __name__ - Class name
- __module__ - Module name in which the class is defined. This attribute is "__main__" in interactive mode.
- __bases__ − A possibly empty tuple containing the base classes, in the order of their occurrence in the base class list.
- __hash__ - hashCode()
- __repr__ - string printable representation of an object
9.9. Singleton
- simple
- lazy (deferred) instantiation
- module-level singleton - all modules are singletons by default
9.9.1. example
class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance

# Lazy instantiation of a Singleton
class Singleton:
    __instance = None
    def __init__(self):
        if not Singleton.__instance:
            print(" __init__ method called..")
        else:
            print("Instance already created:", self.getInstance())
    @classmethod
    def getInstance(cls):
        if not cls.__instance:
            cls.__instance = Singleton()
        return cls.__instance
9.9.2. The Monostate pattern
instances share the same state
class Borg:
    __shared_state = {"1": "2"}
    def __init__(self):
        self.x = 1
        self.__dict__ = self.__shared_state

b = Borg()
b1 = Borg()
b.x = 4

print("Borg Object 'b': ", b)            # b and b1 are distinct objects
print("Borg Object 'b1': ", b1)
print("Object State 'b':", b.__dict__)   # b and b1 share the same state
print("Object State 'b1':", b1.__dict__)

>> ("Borg Object 'b': ", <__main__.Borg instance at 0x10baa5a70>)
>> ("Borg Object 'b1': ", <__main__.Borg instance at 0x10baa5638>)
>> ("Object State 'b':", {'1': '2', 'x': 4})
>> ("Object State 'b1':", {'1': '2', 'x': 4})
9.10. anonymous class
9.10.1. 1
class Bunch(dict):
    __getattr__, __setattr__ = dict.get, dict.__setitem__
dict(x=1,y=2) or {'x':1,'y':2}
Bunch(dict())
9.11. replace method
class A():
    def cc(self):
        print("cc")

c = A.cc

def ff(self):
    print("ff")
    c(self)

A.cc = ff
a = A()
a.cc()

ff
cc

class A():
    def cc(self):
        print("cc")

a = A()
c = a.cc

def ff(self):
    print("ff")
    c()

A.cc = ff
a = A()
a.cc()

ff
cc
10. modules and packages
- module - file
- package - folder - must have __init__.py to be able to import the folder as a module.
- __main__.py - allow to execute folder: python -m folder
module can define
- functions
- classes
- variables
- runnable code.
When a module is imported (anyhow) into a script, the code in the top-level portion of a module is executed only once.
Import whole file - access its contents through the module name -
import module1[, module2[, ... moduleN]]
import support            # just a file support.py
support.print_func("Zara")
Import specific thing from file to access without module
from modname import name1[, name2[, ... nameN]]
from modname import *
__name__ - name of this module.
Locating Modules:
- current dir
- PYTHONPATH - shell variable - list of directories
- default path. On UNIX /usr/local/lib/python3
built-in functions
- dir(math) - list of strings containing the names defined by a module or in current
- locals() - within a function, it will return all the names that can be accessed locally from that function (dictionary)
- globals() - returns a dictionary
- importlib.reload(module) - re-executes the top-level code of the module (reload() was a builtin only in Python 2)
To make all of your functions available when you have imported Phone:
from Pots import Pots
from Isdn import Isdn
from G3 import G3
Main
def main(args):
    pass

if __name__ == '__main__':   # name of the module namespace; '__main__' when run as $ python a.py
    import sys
    main(sys.argv)
    quit()
10.1. module special attributes (Module level "dunders") [-ʌndə(ɹ)]
- __name__
- __doc__
- __dict__ - module’s namespace as a dictionary object
- __file__ - is the pathname of the file from which the module was loaded, if it was loaded from a file.
- __annotations__ - optional - dictionary containing variable annotations collected during module body execution
11. folders/files USECASES
- list files and directories at depth=1: os.listdir() -> list
- list only files at depth=1: os.listdir() AND os.path.isfile()
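Both use cases in one sketch; the directory layout (one file, one subfolder) is made up and created in a temp dir so the snippet is self-contained.

```python
import os
import tempfile

root = tempfile.mkdtemp()                     # hypothetical folder
open(os.path.join(root, "a.txt"), "w").close()
os.mkdir(os.path.join(root, "sub"))

entries = os.listdir(root)                    # files AND directories, depth 1
files = [e for e in entries
         if os.path.isfile(os.path.join(root, e))]   # files only

assert sorted(entries) == ["a.txt", "sub"]
assert files == ["a.txt"]
```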
12. functions
- Python does not support method overloading
- functions can be declared inside other functions
- functions see the scope where they are defined, not where they are called
- if a function returns nothing, it returns None
- a function can return several values: return a, b returns the tuple (a, b), which can be unpacked into several variables: a, b = c()
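The last two points sketched (min_max is a made-up helper):

```python
def min_max(values):
    return min(values), max(values)   # returns one tuple

lo, hi = min_max([3, 1, 4])           # unpacked into two names
assert (lo, hi) == (1, 4)

def nothing():
    pass                              # no return statement -> None

assert nothing() is None
```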
12.1. by value or by reference
Python passes references to objects; the practical effect depends on mutability:
by value (in effect):
- immutable:
- strings
- integers
- tuples
- others…
by reference (in-place changes are visible to the caller):
- mutable:
- objects
- lists, sets, dicts
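A minimal sketch of the difference (function names are illustrative): rebinding a parameter never affects the caller, mutating the object does.

```python
def rebind(x):
    x = x + 1          # rebinds the local name only: caller unaffected

def mutate(lst):
    lst.append(4)      # in-place change: caller sees it

n, items = 1, [1, 2, 3]
rebind(n)
mutate(items)
assert n == 1                  # immutable int: looks like "by value"
assert items == [1, 2, 3, 4]   # mutable list: looks like "by reference"
```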
12.2. Types of function arguments
- Positional arguments (first, second, third=None, fourth=None) (first, second) - positional, (third, fourth) - Keyword arguments
- Keyword arguments - printinfo( age = 50, name = "miki" ) - order does not matter
- Default arguments - def printinfo( name, age = 35 ):
- Variable-length or Arbitrary Argument Lists positional arguments
def printinfo(arg1, *vartuple):
    for var in vartuple:
        print(var)

printinfo(1, 'asd', 'd31', 'cv')
- Variable-length or Arbitrary Argument Lists Keyword arguments
def save_ranking(**kwargs):
    print(kwargs)

save_ranking(first='ming', second='alice', fourth='wilson', third='tom', fifth='roy')
>>> {'first': 'ming', 'second': 'alice', 'fourth': 'wilson', 'third': 'tom', 'fifth': 'roy'}
- both
def save_ranking(*args, **kwargs):
    print(args, kwargs)

save_ranking('ming', 'alice', 'tom', fourth='wilson', fifth='roy')
12.3. example
def functionname(parameters: type) -> return_type:
    "function_docstring"
    function_suite
    return [expression]

def readit(file: str, fun: callable) -> list: ...
12.4. arguments, anonymous-lambda, global variables
Anonymous Functions: - one-line version of a function
lambda [arg1 [, arg2, ..... argn]]: expression
(lambda x, y: x + y)(1, 2)
global variables can be read from all functions, including lambdas
# global Money   # Uncomment inside a function to assign to the global Money.
Money = Money + 1   # without the global statement Money is local here (UnboundLocalError)
12.5. attributes
User-defined function
- __doc__
- __name__
- __qualname__
- __module__
- __defaults__
- __code__
- __globals__
- __dict__
- __closure__
- __annotations__
- __kwdefaults__
Instance methods: read-only attributes:
- __self__ - class instance object
- __func__ - function object
- __module__ - name of the module the method was defined in
12.6. function decorators
- https://docs.python.org/3/glossary.html#term-decorator
- https://www.thecodeship.com/patterns/guide-to-python-function-decorators/
a function that takes a function and returns another function
- when you need to extend the functionality of functions that you don't want to modify
- @classmethod
Typically used to catch exceptions in wrapper
def p_decorate(f):
    def inner(name):          # wrapper
        # do something here!
        return f(name)        # call the wrapped function
    return inner

my_get_text = p_decorate(get_text)   # wrap it; now my_get_text("John")
                                     # runs the wrapper, which calls the wrapped function

# syntactic sugar
@p_decorate
def get_text(name):
    return "bla " + name

# -------------
get_text = div_decorate(p_decorate(strong_decorate(get_text)))
# Equal to
@div_decorate
@p_decorate
@strong_decorate

# -------------- Passing arguments to decorators ------
def tags(tag_name):
    def tags_decorator(func):
        def func_wrapper(name):
            return "<{0}>{1}</{0}>".format(tag_name, func(name))
        return func_wrapper
    return tags_decorator

@tags("p")
def get_text(name):
    return "Hello " + name
12.7. built-in
https://docs.python.org/3/library/functions.html
- abs(x)
- absolute value
- all(iterable)
- True if all elements of the iterable are true, or the iterable is empty
- any(iterable)
- True if any element is true; False for an empty iterable
- ascii(object)
- printable representation of an object
- breakpoint(*args, **kws)
- drops you into the debugger at the call site: calls sys.breakpointhook(), which calls pdb.set_trace()
- callable(object)
- if the object - callable type - true. (classes are callable )
- @classmethod
- function decorator. May be called for class C.f() or for instance C().f() For derived class derived class object is passed as the implied first argument.
class C:
    @classmethod
    def f(cls, arg1, arg2): ...
- compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)
- into code or AST object - can be executed by exec() or eval(). Mode - 'exec' if source consists of a sequence of statements. 'eval' if it consists of a single expression
- delattr(object, name)
- counterpart of setattr() - delattr(x, 'foobar') is equivalent to del x.foobar.
- divmod(a, b)
- takes two (non-complex) numbers, returns (quotient, remainder) from integer division
- enumerate(iterable, start=0)
- returns an iterator yielding tuples (0, item0), (1, item1), ...
- eval(expression, globals=None, locals=None)
- the string is parsed and evaluated as a Python expression. The globals() and locals() functions return the current global and local dictionary, respectively, which may be useful to pass around for use by eval() or exec().
- exec(object[, globals[, locals]])
- object must be either a string or a code object. Be aware that the return and yield statements may not be used outside of function definitions even within the context of code passed to the exec() function. The return value is None.
- filter(function, iterable)
- Construct an iterator from those elements of iterable for which function returns true.
- getattr(object, name[, default])
- return the value of the named attribute of object. name must be a string, or AttributeError is raised
- setattr(object, name, value)
- assigns the value to the attribute, provided the object allows it
- globals()
- dictionary representing the current global symbol table (inside a function or method, this is the module where it is defined, not the module from which it is called)
- hasattr(object, name)
- result is True if the string is the name of one of the object’s attributes, False if not
- hash(object)
- Hash values are integers. Object __hash__() method.
- id(object)
- “identity” of an object - integer. Unique and constant during life time. Two objects with non-overlapping lifetimes may have the same id() value.
- isinstance(object, classinfo)
- True if object is an instance of the classinfo argument.
- issubclass(class, classinfo)
- true if class is a subclass of classinfo. class is considered a subclass of itself
- iter(object[, sentinel])
- 1) without sentinel: returns an iterator; object must support __iter__() or __getitem__(). 2) with sentinel: object must be callable and is invoked for each __next__(); when the value returned equals sentinel, StopIteration is raised
- next(iterator[, default])
- __next__() If default is given, it is returned if the iterator is exhausted
- len(s)
- .
- map(function, iterable, …)
- Return an iterator that applies function to every item of iterable. With several iterables, function is applied to their items in parallel.
- max/min(iterable, *[, key, default])
- .
- max/min(arg1, arg2, *args[, key])
- largest item in an iterable or the largest of two or more arguments
- memoryview(obj)
- a "memory view" object
- pow(x, y[, z])
- (x** y) % z
- repr(object)
- __repr__() method - printable representation of an object
- reversed(seq)
- supports __reversed__() or the sequence protocol (the __len__() and __getitem__() methods)
- round(number[, ndigits])
- number rounded to ndigits precision after the decimal point
- sorted(iterable, *, key=None, reverse=False)
- sorted list [] from the items in iterable
- @staticmethod
- transforms a method into a static method.
- sum(iterable[, start])
- returns the total
- super([type[, object-or-type]])
- Return a proxy object that delegates method calls to a parent/parents or sibling class of type
- vars([object])
- __dict__ attribute for a module, class, instance, or any other object
- zip(*iterables)
- Make an iterator of tuples that aggregates elements from each of the iterables.
- list(zip([1, 2, 3],[1, 2, 3])) = [(1, 1), (2, 2), (3, 3)]
- unzip: list(zip(*zip([1, 2, 3],[1, 2, 3]))) = [(1, 2, 3), (1, 2, 3)]
- __import__(name, globals=None, locals=None, fromlist=(), level=0)
- not needed in everyday Python programming
- class bool([x])
- standard truth testing procedure see 6.4
- class bytearray([source[, encoding[, errors]]])
- mutable. If source is a string, you must also give the encoding - it will use str.encode()
- class bytes([source[, encoding[, errors]]])
- -immutable
- class complex([real[, imag]])
- complex('1+2j'). - default - 0j
- class dict(**kwarg)
- dict(one=1, two=2, three=3) = {'one': 1, 'two': 2, 'three': 3}; dict([('two', 2), ('one', 1), ('three', 3)])
- class dict(mapping, **kwarg)
- new dictionary initialized from a mapping object's (key, value) pairs
- class dict(iterable, **kwarg)
- dict(zip(['one', 'two', 'three'], [1, 2, 3]))
- class float([x])
- from a number or string x.
- class frozenset([iterable])
- see 6.3.
- class int([x])
- x.__int__() or x.__trunc__().
- class int(x, base=10)
- .
- class list([iterable])
- .
- class object
- Return a new featureless object.
- class property(fget=None, fset=None, fdel=None, doc=None)
- class range(stop)
- class range(start, stop[, step])
- immutable sequence type
- class set([iterable])
- .
- class slice(stop)
- .
- class str(object='')
- .
- class str(object=b'', encoding='utf-8', errors='strict')
- .
- tuple([iterable])
- .
- class type(object)
- object.__class__
- class type(name, bases, dict)
- .
- input([prompt])
- read a line from stdin and return it as a string (trailing newline stripped).
- open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
- Open file and return a corresponding file object.
- print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
- to file or sys.stdout
- dir([object])
- list of valid attributes for that object. or list of names in the current local scope. __dir__() - method called - dir() - Is supplied primarily as a convenience for use at an interactive prompt
- help([object])
- built-in help system
- locals()
- the current local symbol table
- bin(x)
- bin(3) -> '0b11'
- chr(i)
- Return the string representing a character = i - Unicode code
- hex(x)
- hex(255) = '0xff'
- format(value[, format_spec])
- https://docs.python.org/3/library/string.html#formatspec
- oct(x)
- Convert an integer number to an octal string prefixed with “0o”.
- ord(c)
- c - string representing one Unicode character. Return integer.
12.8. Closure
def compose_greet_func(name):
    def get_message():
        return "Hello there " + name + "!"
    return get_message

greet = compose_greet_func("John")
print(greet())
12.9. overloading
from functools import singledispatch

@singledispatch
def func(arg1, arg2):
    print("default implementation of func - ", arg1, arg2)

@func.register
def func_impl_1(arg1: str, arg2):
    print("Implementation of func with first argument as string - ", arg1, arg2)

@func.register
def func_impl_2(arg1: int, arg2):
    print("Implementation of func with first argument as int - ", arg1, arg2)

func(1, "hello")
func("test", "hello")
func(1.34, "hi")

Implementation of func with first argument as int -  1 hello
Implementation of func with first argument as string -  test hello
default implementation of func -  1.34 hi
13. asterisk(*)
- For multiplication and power operations.
- 2*3 = 6
- 2**3 = 8
- For repeatedly extending the list-type containers.
- (0,) * 100
- For using the variadic arguments. "Packaging" - def save_ranking(*args, **kwargs):
- *args - tuple
- **kwargs - dict
- For unpacking the containers (so-called “unpacking”), e.g. to pass a list as variadic arguments
def product(*numbers): ...
product(*[2, 3, 5, 7, 11, 13])
- for function signatures: parameters after * are keyword-only; parameters before / are positional-only
def another_strange_function(a, b, /, c, *, d):
14. with
with ContextManager() as c1, ContextManager() as c2:
14.1. Context manager class TEMPLATE
class DatabaseConnection(object):
    def __enter__(self):
        # make a database connection and return it
        ...
        return self.dbconn
    def __exit__(self, exc_type, exc_val, exc_tb):
        # make sure the dbconnection gets closed
        self.dbconn.close()
15. Operators and control structures
Ternary operation: a if condition else b
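A one-line sketch of the ternary expression (the values are made up):

```python
age = 20
status = "adult" if age >= 18 else "minor"   # a if condition else b
assert status == "adult"
```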
15.1. basic
Arithmetic
- + - *
- / - 9/2 = 4.5 - Division (true division in Python 3)
- % - 9%2 = 1 - Modulus - returns remainder
- ** - Exponent
- // - Floor Division - 9//2 = 4, -9//2 = -5
- += -= *= /= %= **= //=
Comparison
== != (Python 2 also had <>)
> < >= <=
Bitwise
- &
- |
- ^ - XOR
- ~ - bitwise NOT - ~a = 1100 0011 (for a = 0011 1100)
- << - a<<2 = 1111 0000
- >>
Logical - AND - OR - NOT
Membership - in, not in
Identity Operators ( point to the same object) - is, is not
15.2. Operator Precedence [ˈpresədəns]
https://docs.python.org/3/reference/expressions.html#operator-precedence
- Binding or parenthesized expression, list display, dictionary display, set display
- (expressions…),
- [expressions…], {key: value…}, {expressions…}
- Subscription, slicing, call, attribute reference
- x[index], x[index:index], x(arguments…), x.attribute
- await x - Await expression
- ** - Exponentiation [5]
- +x, -x, ~x - Positive, negative, bitwise NOT
- *, @, /, //, % - Multiplication, matrix multiplication, division, floor division, remainder [6]
- +, - - Addition and subtraction
- <<, >> - Shifts
- & - Bitwise AND
- ^ - Bitwise XOR
- | - Bitwise OR
- in, not in, is, is not, <, <=, >, >=, !=, == - Comparisons, including membership tests and identity tests
- not x - Boolean NOT
- and - Boolean AND
- or - Boolean OR
- if – else - Conditional expression
- lambda - Lambda expression
- := - Assignment expression
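A few precedence pitfalls from the table above, as checkable expressions:

```python
# ** binds tighter than unary minus
assert -2 ** 2 == -4
assert (-2) ** 2 == 4

# 'not' binds looser than comparison operators
assert (not 1 == 2) is True

# 'and' binds tighter than 'or'
assert (True or False and False) is True
```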
old:
- **
- ~ + - unary
- * / % //
- + -
- >> <<
- &
- ^ |
- <= < > >=
- <>
- == != - Equality operators
- = %= /= //= -= += *= **= - Assignment operators
- is is not
- in not in
- not or and - Logical operators
15.3. value unpacking
x = ("v1", "v2")
a, b = x
print(a, b)  # v1 v2
T = (1,)
b, = T  # b = 1
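Starred targets (PEP 3132) and nested targets extend plain unpacking:

```python
# a starred name absorbs the "rest" as a list
first, *middle, last = [1, 2, 3, 4, 5]
# first == 1, middle == [2, 3, 4], last == 5

# targets can be nested
a, (b, c) = 1, (2, 3)
# a == 1, b == 2, c == 3
```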
15.4. if, loops
if expression1:
    statement(s)
elif expression2:
    statement(s)

while expression:
    statement(s)

while count < 5:
    print(count, "is less than 5")
    count = count + 1
else:  # when the condition becomes false (i.e. at the end, not on break)
    print(count, "is not less than 5")

for iterating_var in sequence:
    statement(s)
else:  # when no break encountered
    print(num, 'is a prime number')

break     # terminates the loop
continue  # skips the remainder of the body
pass      # null operation - just an empty placeholder statement, nothing else

# Compact loops, double loop
[print(x, y) for x in range(1000) for y in range(x, len(range(1000)))]
[g for g in [x['whole_word_timestamps'] for x in whisper_stable_result]]

# list appended every loop iteration
for item in array:
    array2.append(item)
15.5. match 3.10
command = input("What are you doing next? ")
match command.split():
    case [action]:
        ...  # interpret single-verb action
    case [action, obj]:
        ...  # interpret action, obj
    case ["quit"]:  # note: placed after [action], this case is unreachable
        print("Goodbye!")
        quit_game()
15.6. Slicing Sequence
- a[i:j] - i to j
- s[i:j:k] - slice i to j with step k;
s = list(range(10)) - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
- s[-2] - = 8
- s[1:] - [1, 2, 3, 4, 5, 6, 7, 8, 9]
- s[1::] - [1, 2, 3, 4, 5, 6, 7, 8, 9]
- s[:2] - [0, 1]
- s[:-2] - [0, 1, 2, 3, 4, 5, 6, 7]
- s[-2:] - [8, 9]
- s[::2] - [0, 2, 4, 6, 8]
- s[::-1] -[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
16. Traverse or iteration over containers
- see 17.1
16.1. iterator object
Behind the scenes for statement calls iter()- iterator object
- __next__() - when nothing is left - raises a StopIteration exception.
# remove in loop: https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
for f in ret[:]:
    ret.remove(f)

for element in [1, 2, 3]:
    print(element)
for element in (1, 2, 3):
    print(element)
for key in {'one': 1, 'two': 2}:
    print(key)
for char in "123":
    print(char)
for line in open("myfile.txt"):
    print(line, end='')

class Reverse:  # add iterator behavior to your classes
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

rev = Reverse('spam')
for char in rev:
    print(char)

# compact form
>>> t = {x: x*x for x in range(0, 4)}
>>> print(t)
{0: 0, 1: 1, 2: 4, 3: 9}
16.2. iterate dictionary
- for key in a_dict:
- for item in a_dict.items(): - tuple
- for key, value in a_dict.items():
- for key in a_dict.keys():
- for value in a_dict.values():
Since Python 3.6, dictionaries are ordered data structures, so if you use Python 3.6 (and beyond), you’ll be able to sort the items of any dictionary by using sorted() and with the help of a dictionary comprehension:
- sorted_income = {k: incomes[k] for k in sorted(incomes)}
- sorted() - sort keys
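Sorting by value instead of by key needs a key function on the items; `incomes` here is illustrative data:

```python
incomes = {"apple": 5600.00, "orange": 3500.00, "banana": 5000.00}

# sort by key (as in the comprehension above)
by_key = {k: incomes[k] for k in sorted(incomes)}

# sort by value, descending
by_value = {k: v for k, v in sorted(incomes.items(),
                                    key=lambda kv: kv[1], reverse=True)}
# list(by_value) == ['apple', 'banana', 'orange']
```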
17. The Language Reference
17.1. yield and generator expression
form of coroutine
- (expression comp_for) - (x*y for x in range(10) for y in range(x, x+10)) = <generator object>
yield is used to create a generator, i.e. to build something to loop over.
- can be used only inside a function.
- like return, but the function pauses after yielding and resumes from the same point on the next iteration / next() call.
- async def with yield - asynchronous generator - not iterable with a plain for - <async_generator object> (related to coroutine objects)
- an async generator does not implement __iter__/__next__; it implements __aiter__/__anext__ and is consumed with async for
17.2. yield from
allows delegating part of a generator's operations to another generator (a subgenerator)
def gen_list1(iterable):
    for i in list(iterable):
        yield i

# equal to:
def gen_list2(iterable):
    yield from list(iterable)
17.3. ex
def agen():
    for n in range(1, 10):
        yield n
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

def a():
    for n in range(1, 3):
        yield n

def agen():
    for n in range(1, 7):
        yield from a()
# [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

# -------------------------
async def ticker(delay, to):
    """Yield numbers from 0 to *to* every *delay* seconds."""
    for i in range(to):
        yield i
        await asyncio.sleep(delay)
17.4. function decorator
#+name example_1
def hello(func):
    def inner():
        print("Hello ")
        func()
    return inner

@hello
def name():
    print("Alice")
#+name example_2
def star(n):
    def decorate(fn):
        def wrapper(*args, **kwargs):
            print(n * '*')
            result = fn(*args, **kwargs)
            print(result)
            print(n * '*')
            return result
        return wrapper
    return decorate

@star(5)
def add(a, b):
    return a + b

add(10, 20)
17.5. class decorator
- print(f.__name__) and print(f.__doc__) of a decorated function show the wrapper's name and docstring unless functools.wraps is applied
#+name ex1
from functools import wraps

class Star:
    def __init__(self, n):
        self.n = n

    def __call__(self, fn):
        @wraps(fn)  # addition to fix f.__name__ and f.__doc__
        def wrapper(*args, **kwargs):
            print(self.n * '*')
            result = fn(*args, **kwargs)
            print(result)
            print(self.n * '*')
            return result
        return wrapper

@Star(5)
def add(a, b):
    return a + b
# or: add = Star(5)(add)

add(10, 20)
17.6. lines
new line
- Line endings - Unix LF, Windows CR LF, Macintosh CR - all of these forms can be used equally, regardless of platform
- In Python - C conventions for newline characters - \n - ASCII LF
Comments
# - line """ comment """ - multiline
Line joining - cannot carry a comment
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24:  # Looks like a valid date
Implicit line joining
month_names = ['Januari', 'Februari', 'Maart',   # you can
               'Oktober', 'November', 'December']  # do it
Blank line - contains only spaces, tabs, formfeeds (FF or \f) and possibly a comment
17.7. Indentation
- Leading whitespace (spaces and tabs)
- determine the grouping of statements
- TabError - if a source file mixes tabs and spaces in a way that makes the meaning dependent on the worth of a tab in spaces
Tabs are replaced (from left to right) by one to eight spaces, so that the number of characters up to and including the replacement is a multiple of eight
17.8. identifier [aɪˈdentɪfaɪər] or names
[A-Za-z_(0-9 except for first char)] - case sensitive
Reserved classes of identifiers
- _*
- __*__
- __*
17.9. Keywords Exactly as written here:
False | await | else | import | pass |
None | break | except | in | raise |
True | class | finally | is | return |
and | continue | for | lambda | try |
as | def | from | nonlocal | while |
assert | del | global | not | with |
async | elif | if | or | yield |
17.10. Numeric literals
- integers
- floating point numbers - 3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93
- imaginary numbers - 3.14j 10.j 10j .001j 1e100j 3.14e-10j 3.14_15_93j
-1 - expression composed of the unary operator ‘-‘ and the literal 1
17.10.1. integers
integer      ::= decinteger | bininteger | octinteger | hexinteger
decinteger   ::= nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger   ::= "0" ("b" | "B") (["_"] bindigit)+
octinteger   ::= "0" ("o" | "O") (["_"] octdigit)+
hexinteger   ::= "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::= "1"…"9"
digit        ::= "0"…"9"
bindigit     ::= "0" | "1"
octdigit     ::= "0"…"7"
hexdigit     ::= digit | "a"…"f" | "A"…"F"
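The grammar above in practice — underscores are legal separators in any numeric literal (Python 3.6+):

```python
assert 1_000_000 == 1000000  # decimal with separators
assert 0b1010 == 10          # binary
assert 0o17 == 15            # octal
assert 0x1F == 31            # hexadecimal
assert 0b1111_0000 == 240    # separators work with any base
```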
17.10.2. float
- floatnumber ::= pointfloat | exponentfloat
- pointfloat ::= [digitpart] fraction | digitpart "."
- exponentfloat ::= (digitpart | pointfloat) exponent
- digitpart ::= digit (["_"] digit)*
- fraction ::= "." digitpart
- exponent ::= ("e" | "E") ["+" | "-"] digitpart
3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93
17.10.3. Imaginary literals
imagnumber ::= (floatnumber | digitpart) ("j" | "J")
3.14j 10.j 10j .001j 1e100j 3.14e-10j 3.14_15_93j
17.11. Docstring and comments
first thing in a class/function/module
''' This is a multiline comment. '''
17.12. Simple statements
- assert
- pass
- del
- return
- yield
- raise - without argument - re-raise the exception in try except
- break
- continue
- import
- global identifiers - tells the parser to treat the identifiers as global. Used when a function assigns to module-level (global) variables.
- nonlocal identifier - used for a function nested inside another function; rebinds variables of the enclosing function, which are neither global nor local.
17.13. open external
With shell=True you cannot pass an array of arguments; pass a single command string instead.
17.13.1. ex
# -- 1
import os
os.system("echo Hello World")  # cannot pass input

# -- 2
import os
pipe = os.popen("dir *.md")
print(pipe.read())

# -- 3
import subprocess
subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE).stdout.read()

# -- 4 old
import subprocess
subprocess.call("echo Hello World", shell=True)

# -- 5
import subprocess
print(subprocess.run("echo Hello World", shell=True))

# -- 6
import subprocess
(ls_status, ls_output) = subprocess.getstatusoutput(ls_command)

# -- 7
# returns output as byte string
returned_output = subprocess.check_output(cmd)
# using decode() to convert the byte string to a string
print('Current date is:', returned_output.decode("utf-8"))

# -- 8 with timeout
import subprocess
DELAY = 10
po = subprocess.Popen(["sleep 1; echo 'asd\nasd'"], shell=True, stdout=subprocess.PIPE)
po.wait(DELAY)
print(po.stdout.read().decode('utf-8'))
print("ok")
17.13.2. links
18. The Python Standard Library
18.1. Major libs:
- os - portable way of using operating system dependent functionality - files, Command Line Arguments,
Environment Variables
- shutil - higher level interface for files
- glob - file lists from directory
- logging
- threading - multi-threading
- collections - !!!
- re - regular expression
- math
- statistics
- datetime
- zlib, gzip, bz2, lzma, zipfile and tarfile.
- timeit - performance test
- profile and pstats - tools for identifying time critical sections in larger blocks of code
- doctest - module provides a tool for scanning a module and validating tests embedded in a program’s docstrings.
- unittest
- json
- sqlite3
- Internationalization supported by: gettext, locale, and the codecs package
18.2. regex - import re
import re
- match
- matches only at the beginning of the string. Returns a MatchObject.
- fullmatch
- whole string match
- search
- finds the first occurrence anywhere in the string
- compile(pattern)
- "compiles" a regular expression given as a string into a pattern object for later use.
- sub
- replace substring
Flags:
- re.DOTALL - '.' in a regex normally matches any character except a newline; with re.DOTALL it matches newline too
- re.IGNORECASE
18.2.1. example
import re
regex = re.compile('[^а-яА-ЯёЁ/-//,. ]')
reg_pu = re.compile('[,]')
reg_pu2 = re.compile(r'\.([а-яА-ЯёЁ])')  # '.a' -> '. a'
s = reg_pu.sub(' ', data['naznach'])
s = reg_pu2.sub(r'. \1', s)
nf = regex.sub(' ', s).lower().split()

# -----------------
import re
s = 'asdds https://alalal.com'
m = re.search('https.*', s)
if m:
    sp = m.span()
    sub = s[sp[0]:sp[1]]
18.2.2. get string between substring
res = re.search("123(.*)789", "123456789")
res.group(1)  # 456
18.3. datetime
18.3.1. datetime to date
d.date()
18.3.2. date to datetime
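One common way is datetime.combine() with a midnight time (the date value here is illustrative):

```python
import datetime

d = datetime.date(2023, 5, 17)
dt = datetime.datetime.combine(d, datetime.time.min)  # time.min == 00:00:00
# dt == datetime.datetime(2023, 5, 17, 0, 0)
```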
18.3.3. current time
datetime.datetime.now()
- .time() or date()
18.4. file object
https://docs.python.org/3/library/filesys.html
- os - lower level than Python "file objects"
- os.path — Common pathname manipulations
- shutil — High-level file operations
- tempfile — Generate temporary files and directories
- Built-in function open() - returns "file object"
18.5. importlib
import importlib
itertools = importlib.import_module('itertools')
g = importlib.import_module('t')
g.v
# from g import v  # ERROR
19. exceptions handling
- syntax errors - repeats the offending line and displays a little ‘arrow’ pointing
- exceptions
- last line indicates what happened: stack traceback and ExceptionType: detail based on the type and what caused it
- exception may have exception’s argument
Words: try, except, else, finally, raise, with
- BaseException - root exception
- Exception - non-system-exiting exceptions are derived from this class
- Warning - warnings.warn("Warning message")
19.1. explanation
try:
    foo = open("foo.txt")
except IOError:
    print("error")
else:  # if no exception in try block
    print(foo.read())
finally:  # always
    print("finished")
19.2. traceback
two ways
import traceback
import sys
try:
    do_stuff()
except Exception:
    print(traceback.format_exc())
    # or
    print(sys.exc_info()[0])
19.3. examples
while True:
    try:
        x = int(input("Please enter a number: "))
        break
    except ValueError:
        print("Oops! That was no valid number. Try again...")
    except (RuntimeError, TypeError, NameError):
        pass
    except OSError as err:
        print("OS error: {0}".format(err))
        print("Unexpected error:", sys.exc_info()[0])
    except:  # any exception - use with extreme caution!
        print("B")
        raise  # re-raise the exception

try:
    raise Exception('spam', 'eggs')
except Exception as inst:
    print(type(inst))  # the exception instance
    print(inst.args)   # arguments stored in .args
    print(inst)        # __str__ allows args to be printed directly
else:
    print(arg, 'has', len(f.readlines()), 'lines')
    f.close()

try:
    result = x / y
except ZeroDivisionError:
    print("division by zero!")
else:  # no exception
    print("result is", result)
finally:  # always, even with an unexpected exception
    print("executing finally clause")

with open("myfile.txt") as f:  # f is always closed, even if a problem was encountered
    for line in f:
        print(line, end="")

try:
    obj = self.method_number_list[method_number](image)
    self.OUTPUT_OBJ = obj.OUTPUT_OBJ
except Exception as e:
    if hasattr(e, 'message'):
        self.OUTPUT_OBJ = {"qc": 3, "exception": e.message}
    else:
        self.OUTPUT_OBJ = {"qc": 3, "exception": str(type(e).__name__) + " : " + str(e.args)}
20. Logging
import logging
20.1. ways to log
- loggers: logger = logging.getLogger(name) ; logger.warning("as")
- root logger: logging.warning('Watch out!')
logging.basicConfig(level=logging.NOTSET)
root_logger = logging.getLogger()
or
logger = logging.getLogger(__name__)
logger.setLevel(logging.NOTSET)
20.2. terms
- handlers
- send the log records (created by loggers) to the appropriate destination.
- records
- log records (created by loggers)
- loggers
- expose the interface that application code directly uses.
- Filters
- provide a finer grained facility for determining which log records to output.
- Formatters
- specify the layout of log records in the final output.
20.3. getLogger()
Multiple calls to getLogger(name) with the same name will always return a reference to the same Logger object.
name - period-separated hierarchical value, like foo.bar.baz
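Both properties can be checked directly; "foo" and "foo.bar" are illustrative names:

```python
import logging

parent = logging.getLogger("foo")
child = logging.getLogger("foo.bar")

assert logging.getLogger("foo") is parent  # same name -> same Logger object
assert child.parent is parent              # the dots build the hierarchy
assert child.propagate                     # records bubble up to ancestors' handlers by default
```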
20.4. stderror
default:
- output to stderr
- level = WARNING
20.5. inspection
get all loggers:
[print(name) for name in logging.root.manager.loggerDict]
logger properties:
- logger.level
- logger.handlers
- logger.filters
- logger.root.handlers[0].formatter._fmt - formatter
- logger.root.handlers[0].formatter.default_time_format
root logger: logging.root or logging.getLogger()
20.6. levels
- CRITICAL 50
- ERROR 40
- WARNING 30
- INFO 20
- DEBUG 10
- NOTSET 0
21. Collections
- Abstract Base Classes https://docs.python.org/3/library/collections.abc.html
21.1. collections.Counter() - dict subclass for counting hashable objects
from collections import Counter
cnt = Counter()
cnt[word] += 1
cnt.most_common(n)
Return a list of the n most common elements and their counts from the most common to the least.
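A small runnable example (the word list is illustrative):

```python
from collections import Counter

words = "a b a c a b".split()
cnt = Counter(words)  # Counter({'a': 3, 'b': 2, 'c': 1})

assert cnt.most_common(2) == [('a', 3), ('b', 2)]
assert cnt['missing'] == 0  # missing keys count as zero - no KeyError
```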
21.2. time complexity
O - provides an upper bound on the growth rate of the function.
x in c (membership):
- list - O(n)
- dict - O(1) average, O(n) worst case
- set - O(1) average, O(n) worst case
set item / append:
- list - O(1)
- collections.deque - O(1) - append
- dict - O(1) average, O(n) worst case
get item:
- list - O(1)
- collections.deque - O(1) - pop
- dict - O(1) average, O(n) worst case
22. Conventions
22.1. code style, indentation, naming
Indentation:
- 4 spaces per indentation level.
- Spaces are the preferred indentation method.
Limit all lines to a maximum of 79 characters.
Surround top-level function and class definitions with two blank lines.
Method definitions inside a class are surrounded by a single blank line.
Inside class:
- capitalizing method names
- prefixing data attribute names with a small unique string (perhaps just an underscore)
- using verbs for methods and nouns for data attributes.
naming conventions
- https://www.python.org/dev/peps/pep-0008/
- Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
- Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
- Class Names - CapWords convention
- function names - lowercase with words separated by underscores as necessary to improve readability
22.2. 1/2 underscore
Single Underscore: PEP-0008: _single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.
Double Underscore: https://docs.python.org/3/tutorial/classes.html#private-variables
- Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, so it can be used to define class-private instance and class variables, methods, variables stored in globals, and even variables stored in instances, private to this class, on instances of other classes.
- Name mangling is intended to give classes an easy way to define “private” instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private. ( as a way to ensure that the name will not overlap with a similar name in another class.)
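The textual replacement is easy to observe (class and attribute names are illustrative):

```python
class Mapping:
    def __init__(self):
        self.__items = []  # stored as _Mapping__items after mangling

m = Mapping()
assert not hasattr(m, "__items")   # the unmangled name does not exist
assert m._Mapping__items == []     # the mangled name is still reachable
```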
22.3. Whitespace in Expressions and Statements
Yes: spam(ham[1], {eggs: 2})
No:  spam ( ham [ 1 ], { eggs: 2 } )

Yes: if x == 4: print(x, y); x, y = y, x
No:  if x == 4 : print(x , y) ; x , y = y , x

Yes:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
def munge(input: AnyStr): ...
def munge() -> AnyStr: ...
def complex(real, imag=0.0):
    return magic(r=real, i=imag)
if foo == 'blah':
    do_blah_thing()
do_one()
do_two()
do_three()
FILES = [
    'setup.cfg',
    'tox.ini',
    ]
initialize(FILES,
           error=True,
           )
No:
FILES = ['setup.cfg', 'tox.ini',]
initialize(FILES, error=True,)
22.4. naming
case sensitive
- Class names start with an uppercase letter. All other identifiers start with a lowercase letter.
- Starting an identifier with a single leading underscore indicates that the identifier is private = _i
- two leading underscores indicates a strongly private identifier = __i
- Never use the characters 'l' (lowercase letter el), 'O' (uppercase letter oh), or 'I' (uppercase letter eye) as single character variable names.
Package and Module Names - all-lowercase names; underscores are discouraged. An accompanying C/C++ module has a leading underscore (e.g. _socket). https://peps.python.org/pep-0423/
Class Names - CapWords, or CamelCase
functions and variables - function and variable names should be lowercase, with words separated by underscores as necessary to improve readability.
- Always use self for the first argument to instance methods.
- Always use cls for the first argument to class methods.
Constants MAX_OVERFLOW
22.5. docstrings
Docstring is the first thing in a module, function, class, or method definition (the __doc__ special attribute).
- Docstring Conventions https://peps.python.org/pep-0257/
Convs.:
- Phrase ending in a period.
- (""" """) are used even though the string fits on one line.
- The closing quotes are on the same line as the opening quotes
- There’s no blank line either before or after the docstring.
- It prescribes the function or method’s effect as a command (“Do this”, “Return that”), not as a description; e.g. don’t write “Returns the pathname …”.
- Multiline: 1. summary 2. blank 3. more elaborate description
22.5.1. ex. simple
def kos_root():
    """Return the pathname of the KOS root directory."""

def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
23. Concurrency
https://docs.python.org/3/library/concurrency.html Notes:
- Preferred approach is to concentrate all access to a resource in a single thread and then use the queue
module to feed that thread with requests from other threads.
- GIL - mutex - preventing multiple threads from executing Python bytecodes at once on multiple cores
coroutine - components that allow execution to be suspended and resumed; their states are saved
23.1. select right API
problems:
- CPU-Bound Program
- I/O-bound problem - spends most of its time waiting for external operations
types:
- multiprocessing - creating a new instance of the Python interpreter to run on each CPU and then farming out part of your program to run on it.
- threading - Pre-emptive multitasking, The operating system decides when to switch tasks.
- hard to code, race conditions
- only one thread executes Python bytecode at a time (GIL)
- Coroutines - Cooperative multitasking - The tasks decide when to give up control.
- asyncio
modules:
- threading - Thread-based parallelism - fast - better for I/O-bound applications due to the Global Interpreter Lock
- multiprocessing — Process-based parallelism - slow - better for CPU-bound applications
- concurrent.futures - high-level interface for asynchronously executing callables ThreadPoolExecutor or ProcessPoolExecutor.
- subprocess - it’s the recommended option when you need to run multiple processes in parallel or call an external program or external command from inside your Python code. spawn new processes, connect to their input/output/error pipes, and obtain their return codes
- sched - event scheduler
- queue - useful in threaded programming when information must be exchanged safely between multiple threads
- asyncio - coroutine-based concurrency(Cooperative multitasking) The tasks decide when to give up control.
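The queue-per-resource pattern from the note at the top of this section, as a minimal sketch (the worker and data are illustrative):

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results = []

def worker():
    # a single thread owns the "resource" (here, the results list)
    while True:
        item = tasks.get()
        if item is None:  # sentinel value: stop the worker
            break
        results.append(item * 2)
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for n in (1, 2, 3):
    tasks.put(n)   # other threads only feed the queue
tasks.put(None)
t.join()
# results == [2, 4, 6]
```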
23.2. Process
from multiprocessing import Process

# daemonic processes are not allowed to have child processes
proc: Process = Process(target=self.perform_job, args=(job, queue), daemon=False)
proc.start()
proc.join(WAIT_FOR_THREAD)  # seconds
if proc.is_alive():
    pass
from multiprocessing.pool import Pool

def callback_result(result):
    print(result)

# Pool
executor = Pool(processes=PAGE_THREADS)  # clears leaked memory with process death
for i, fp in enumerate(filelist):
    executor.apply_async(
        page_processing,
        args=(i, fp, self.id_processing, self.doc_classes, self.barcodes_only),
        callback=callback_result)
executor.close()
executor.join()
23.3. threading
Daemon - daemon thread will shut down immediately when the program exits. default=False
Python (CPython) is not optimized for heavy threading. You can keep allocating more resources and it will keep spawning/queuing new threads and overloading the cores. You need to make a design change here:
Process based design:
- Either use the multiprocessing module
- Make use of rabbitmq and make this task run separately
- Spawn a subprocess
Or if you still want to stick to threads:
- Switch to PyPy (faster compared to CPython)
- Switch to PyPy-STM (totally does away with GIL)
23.3.1. examples
- ThreadPoolExecutor - many function for several workers
def get_degree1(angle):
    return angle

def get_degree2(angle):
    return angle

import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future1 = executor.submit(get_degree1, x)  # started
    future2 = executor.submit(get_degree2, x)  # started
    data1 = future1.result()
    data2 = future2.result()
- ThreadPoolExecutor - one function for several workers
def get_degree(angle):
    return angle

import concurrent.futures
angles: list = []
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(get_degree, x): x for x in degrees}
    for future in concurrent.futures.as_completed(futures):
        # futures[future]  # degree
        data = future.result()
        angles.append(data)
- Custom thread
from threading import Thread

def foo(bar):
    print('hello {0}'.format(bar))
    return "foo"

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs)
        self._return = None

    def run(self):
        print(type(self._target))
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)

    def join(self, *args):
        Thread.join(self, *args)
        return self._return

twrv = ThreadWithReturnValue(target=foo, args=('world!',))
twrv.start()
print(twrv.join())  # prints foo
23.3.2. synchronization
with - acquire() and release()
- Lock, RLock, Condition, Semaphore, and BoundedSemaphore
- Lock and RLock (recurrent version)
threading.Lock
- Condition object - barrier
- cv = threading.Condition()
- cv.wait() - stop
- cv.notify_all() - resume all waiting threads (notifyAll is a deprecated alias)
- Semaphore Objects - protected section
maxconnections = 5
pool_sema = BoundedSemaphore(value=maxconnections)

with pool_sema:
    conn = connectdb()
- Barrier Objects - by number
b = Barrier(2, timeout=5)  # 2 - number of parties
b.wait()
b.wait()
23.4. multiprocessing
from multiprocessing import Process, Manager

def get_degree(angle, angles):
    angles.append(angle)  # write the result into the shared list

manager = Manager()
angles = manager.list()  # result angles!
pool = []
for x in degrees:
    p = Process(target=get_degree, args=(x, angles))
    pool.append(p)
    p.start()
for p2 in pool:
    p2.join()
manager = mp.Manager()
return_dict = manager.dict()
jobs = []
for i in range(len(fileslist)):
    p = mp.Process(target=PageProcessing, args=(i, return_dict, fileslist[i],))
    jobs.append(p)
    p.start()
for proc in jobs:
    proc.join()  # wait for each to finish
23.5. asyncio
IO-bound and high-level structured network code. synchronize concurrent code;
Any function that calls await needs to be marked with async.
async as a flag to Python telling it that the function about to be defined uses await.
async with statement, which creates a context manager from an object you would normally await.
cons:
- all of the advantages of cooperative multitasking get thrown away if one of the tasks doesn’t cooperate.
asyncio.run - ideally only be called once
23.5.1. Core terms:
- Event Loop - low level the core of every asyncio application, high level: asyncio.run()
- Coroutines - (async def statement or generator iterator)
- awaitable object - used for await …
23.6. asynchronous programming (asyncio, async, await)
23.6.1. run - simple
just create new loop and execute one task in it
with Runner(debug=debug) as runner:
    return runner.run(main)
import time
start_time = time.time()
import asyncio

async def main():
    await asyncio.sleep(2)
    print('hello')
    return 2

print(asyncio.run(main()))
print("--- %s seconds ---" % (time.time() - start_time))
print(asyncio.run(main()))
print("--- %s seconds ---" % (time.time() - start_time))
hello
2
--- 2.0038740634918213 seconds ---
hello
2
--- 4.007648944854736 seconds ---
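The two runs above execute sequentially (about 4 s total). asyncio.gather runs awaitables concurrently inside one event loop — a sketch with illustrative delays:

```python
import asyncio
import time

async def job(n):
    await asyncio.sleep(0.2)
    return n * 2

async def main():
    # all three sleeps overlap, so the total is ~0.2 s, not 0.6 s
    return await asyncio.gather(job(1), job(2), job(3))

start = time.monotonic()
out = asyncio.run(main())
elapsed = time.monotonic() - start
# out == [2, 4, 6] - results keep the argument order
```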
23.6.2. run - await
import time
import asyncio

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what, "finished at", time.strftime('%X'))

async def main():
    print("started at", time.strftime('%X'))
    await say_after(1, 'hello')  # awaited one after another: ~3 seconds total
    await say_after(2, 'world')

asyncio.run(main())

started at 07:58:04
hello finished at 07:58:05
world finished at 07:58:07
23.6.3. Runner
creates an event loop and a contextvars.Context, and reuses them across run() calls
import time
start_time = time.time()
import asyncio

async def main():
    await asyncio.sleep(2)
    print('hello')
    return 2

with asyncio.Runner() as runner:
    print(runner.run(main()))
    print("--- %s seconds ---" % (time.time() - start_time))
    print(runner.run(main()))
    print("--- %s seconds ---" % (time.time() - start_time))
hello
2
--- 2.003290891647339 seconds ---
hello
2
--- 4.006376266479492 seconds ---
23.7. example multiprocess, Threads, othe thread
def main_processing(filelist) -> list:
    """ Multithread page processing
    :param filelist: PNG files - pages of the incoming PDF file
    :return: {procnum: (procnum, new_obj.OUTPUT_OBJ), ....}
    """
    # Processes
    # import multiprocessing as mp
    # manager = mp.Manager()
    # return_dict = manager.dict()
    # jobs = []
    # for i in range(len(filelist)):
    #     p = mp.Process(target=page_processing, args=(i, return_dict, filelist[i]))
    #     jobs.append(p)
    #     p.start()
    # for proc in jobs:
    #     proc.join()

    # Threads
    import concurrent.futures
    return_dict: list = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = {executor.submit(page_processing, i, x): x for i, x in enumerate(filelist)}
        for future in concurrent.futures.as_completed(futures):
            data = future.result()
            return_dict.append(data)

    # One Thread Debug
    # from threading import Thread
    # thread: Thread = Thread(target=page_processing, args=(0, filelist[0]))
    # thread.start()
    # thread.join()
    return list(return_dict)
24. Monkey patch (modification at runtime)
- instance.attribute = 23
24.1. replace method of class instance
# -- of a not-running class
A.f = my_f

# -- of a running class
import types

def func_my(self):
    pass

border_collie = Dog()
border_collie.herd = types.MethodType(func_my, border_collie)
24.2. inspect.getmembers() vs dict.items() vs dir()
- dir() and inspect.getmembers() are basically the same.
- __dict__ is the complete namespace including metaclass attributes.
24.3. ex replace function
import werkzeug.serving
import functools

def wrap_function(oldfunction, newfunction):
    @functools.wraps(oldfunction)
    def run(*args):  # , **kwargs
        return newfunction(oldfunction, *args)  # , **kwargs
    return run

def generate_adhoc_ssl_pair2(oldfunc, parameter=None):
    # Do some processing or something to customize the parameters to pass
    c, k = oldfunc(parameter)
    print(c, c.public_key().public_numbers())
    return c, k

werkzeug.serving.generate_adhoc_ssl_pair = wrap_function(
    werkzeug.serving.generate_adhoc_ssl_pair, generate_adhoc_ssl_pair2)
24.4. ex replace method of class
import werkzeug.serving

oldfunc = werkzeug.serving.BaseWSGIServer.__init__

def myinit(*args, **kwargs):
    # Do some processing or something to customize the parameters to pass
    oldfunc(*args, **kwargs)
    print(dir(args[0].ssl_context))

werkzeug.serving.BaseWSGIServer.__init__ = myinit
25. Performance Tips
25.1. string
- Avoid:
- out = "<html>" + head + prologue + query + tail + "</html>"
- Instead, use
- out = "<html>%s%s%s%s</html>" % (head, prologue, query, tail)
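On modern Python, f-strings and str.join are the usual alternatives to %-formatting; the values here are illustrative:

```python
head, prologue, query, tail = "h", "p", "q", "t"

# f-string: readable for a handful of pieces
out_f = f"<html>{head}{prologue}{query}{tail}</html>"

# str.join: preferred when concatenating many pieces in a loop
out_join = "".join(["<html>", head, prologue, query, tail, "</html>"])

assert out_f == out_join == "<html>hpqt</html>"
```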
25.2. loop
- map(function, list)
- iterator = (s.upper() for s in oldlist)
25.3. Avoiding dots…
25.4. avoid global variables
25.5. dict
wdict = {}
for word in words:
    if word not in wdict:
        wdict[word] = 0
    wdict[word] += 1

# Use:
wdict = {}
for word in words:
    try:
        wdict[word] += 1
    except KeyError:
        wdict[word] = 1

# or:
wdict = {}
get = wdict.get
for word in words:
    wdict[word] = get(word, 0) + 1

# or:
wdict.setdefault(key, []).append(new_element)

# or:
from collections import defaultdict
wdict = defaultdict(int)
for word in words:
    wdict[word] += 1
26. decorators
- @property - 9.4 - function became read-only variable (getter)
- @staticmethod - makes a static method; doesn't use self
- @classmethod - it receives the class object as the first parameter instead of an instance of the class. May be called for class C.f() or for instance C().f(), self.f(). Used for singleton.
Class Method                                     | Static Method
Mutable via inheritance (can be overridden)      | Immutable via inheritance
Takes cls as the first parameter                 | No implicit first parameter needed
Can access or modify class state                 | Cannot access class state
Bound to the class and knows about it            | Knows nothing about the class
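A minimal sketch of the difference (class and method names are invented): the classmethod receives `cls` and can hold shared class state (the singleton pattern mentioned above), while the staticmethod is just a plain function namespaced in the class.

```python
class Config:
    _instance = None  # shared class state

    def __init__(self, name):
        self.name = name

    @classmethod
    def instance(cls):
        # cls is the class object itself, not an instance
        if cls._instance is None:
            cls._instance = cls("default")
        return cls._instance

    @staticmethod
    def is_valid(name):
        # no self, no cls: does not touch class or instance state
        return bool(name)

assert Config.instance() is Config.instance()  # same object every call
assert Config.is_valid("x") is True
```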
26.1. ex
def d(c):
    print('d', c)

def dec_2(a):
    print('dec_2', a)
    return d

def dec_1():
    print('dec_1')
    return dec_2

@dec_1()
def f(v):
    print('f')

print('s')
f(2)
27. Assert
assert Expression[, Arguments]
If the expression is false, Python raises an AssertionError exception, using the optional Arguments expression as the argument to AssertionError.
assert False, "Error here"
python.exe - The ``-O`` switch removes assert statements, the ``-OO`` switch removes both assert statements and doc strings.
28. Debugging and Profiling
https://habr.com/en/company/mailru/blog/201594/ Profiling - collecting runtime characteristics of a program
- Manual
- the "careful staring" method - effort and result are hard to estimate
- Manual - confirm or refute a hypothesis about a bottleneck
- time - Unix tool
- statistical profiler - at small time intervals, samples a pointer to the currently
executing function
- gprof - Unix tool for C, Pascal, or Fortran77
- there are few of them
- deterministic (event-based) profiler - tracks all function calls, returns, and exceptions and
measures the intervals between these events - may slow the program down by a factor of two or more
- Python standard library provides profilers:
- profile - pure Python, used if cProfile is not available
- cProfile
- Python standard library also provides a debugger: pdb
28.1. cProfile
primitive calls - without recursion
- ncalls
- for the number of calls
- tottime
- time spent inside without subfunctions
- percall
- tottime/ncalls
- cumtime
- time spent in this and all subfunctions and in recursion
- percall
- cumtime/ncalls
import cProfile
import re
cProfile.run('re.compile("foo|bar")', filename='restats')
# pstats.Stats class reads profile results from a file and formats them in various ways.
# python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)
28.2. small code measure 1
python3 -m timeit '"-".join(str(n) for n in range(100))'
def test():
    """Stupid test function"""
    L = [i for i in range(100)]

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("test()", setup="from __main__ import test"))
28.3. small code measure 2
import time
start_time = time.time()
main()
print("--- %s seconds ---" % (time.time() - start_time))
28.4. breakpoint and code investigation
29. inject
https://github.com/ivankorobkov/python-inject Dependency injection
29.1. Callable
import inject

# configuration
inject.configure(lambda binder: binder.bind_to_provider('predict', lambda: predict))
# or
def my_config(binder):
    binder.bind_to_provider('predict', lambda: predict)
inject.configure(my_config)

# usage
@inject.params(predict='predict')  # maps a param name to a binder key
def detect_advanced(self, predict=None) -> (int, any):
29.2. links
30. BUILD and PACKAGING
setup.py - distutils and setuptools (based on it) were the most widely used approach. Since PEP 517 and PEP 518, pyproject.toml is the recommended format for packaging.
30.1. build tools:
frontend - read pyproject.toml
- pip
- build
- gpep517 - gentoo tool https://github.com/projg2/gpep517
- hatch
backend - defined in [build-system]->build-backend; creates the build artifacts and dictates what additional information is required in the pyproject.toml file
- Hatch or Hatchling
- setuptools
- Flit
- PDM
30.1.1. hatchling
backend and frontend
hatch build /path/to/project
30.1.2. setuptools
build backend
collection of enhancements to the Python distutils that allow you to more easily build and distribute Python distributions, especially ones that have dependencies on other packages.
install_requires defines the dependencies for a single project; Requirements Files are often used to define the requirements for a complete Python environment.
It is not considered best practice to use install_requires to pin dependencies to specific versions, or to specify sub-dependencies (i.e. dependencies of your dependencies).
30.1.3. gpep517
a minimal tool to aid building wheels for Python packages
gpep517 build-wheel --backend setuptools.build_meta --output-fd 3 \
    --wheel-dir /var/tmp/portage/dev-python/flask-2.3.2/work/Flask-2.3.2-python3_11/wheel
gpep517 install-wheel --destdir=/var/tmp/portage/dev-python/flask-2.3.2/work/Flask-2.3.2-python3_11/install \
    --interpreter=/usr/bin/python3.11 --prefix=/usr --optimize=all \
    /var/tmp/portage/dev-python/flask-2.3.2/work/Flask-2.3.2-python3_11/wheel/Flask-2.3.2-py3-none-any.whl
commands:
- get-backend
- to read build-backend from pyproject.toml (auxiliary command).
- build-wheel
- to call the respective PEP 517 backend in order to produce a wheel.
- install-wheel
- to install a wheel into the specified directory,
- install-from-source
- that combines building a wheel and installing it (without leaving the artifacts),
- verify-pyc
- to verify that the .pyc files in the specified install tree are correct and up-to-date.
30.2. toml format for pyproject.toml
Tom's Obvious Minimal Language
30.2.1. basic
- \b - backspace (U+0008)
- \t - tab (U+0009)
- \n - linefeed (U+000A)
- \f - form feed (U+000C)
- \r - carriage return (U+000D)
- \" - quote (U+0022)
- \\ - backslash (U+005C)
- \uXXXX - unicode (U+XXXX)
- \UXXXXXXXX - unicode (U+XXXXXXXX)
# This is a TOML comment
str1 = "I'm a string."
str2 = "You can \"quote\" me."
str3 = "Name\tJos\u00E9\nLoc\tSF."
# multi-line basic strings (renamed str4/str5 - TOML forbids duplicate keys)
str4 = """
Roses are red
Violets are blue"""
str5 = """\
The quick brown \
fox jumps over \
the lazy dog.\
"""

# Literal strings - no escaping is performed, so what you see is what you get
path = 'C:\Users\nodejs\templates'
path2 = '\\User\admin$\system32'
quoted = 'Tom "Dubs" Preston-Werner'
regex = '<\i\c*\s*>'

# multi-line literal strings
re = '''I [dw]on't need \d{2} apples'''
lines = '''
The first newline is
trimmed in raw strings.
All other whitespace
is preserved.
'''
30.2.2. integers
# integers
int1 = +99
int2 = 42
int3 = 0
int4 = -17
# hexadecimal with prefix `0x`
hex1 = 0xDEADBEEF
hex2 = 0xdeadbeef
hex3 = 0xdead_beef
# octal with prefix `0o`
oct1 = 0o01234567
oct2 = 0o755
# binary with prefix `0b`
bin1 = 0b11010110

# fractional
float1 = +1.0
float2 = 3.1415
float3 = -0.01
# exponent
float4 = 5e+22
float5 = 1e06
float6 = -2E-2
# both
float7 = 6.626e-34
# separators
float8 = 224_617.445_991_228

# infinity
infinite1 = inf  # positive infinity
infinite2 = +inf # positive infinity
infinite3 = -inf # negative infinity
# not a number
not1 = nan
not2 = +nan
not3 = -nan
30.2.3. Dates and Times
# offset datetime
odt1 = 1979-05-27T07:32:00Z
odt2 = 1979-05-27T00:32:00-07:00
odt3 = 1979-05-27T00:32:00.999999-07:00
# local datetime
ldt1 = 1979-05-27T07:32:00
ldt2 = 1979-05-27T00:32:00.999999
# local date
ld1 = 1979-05-27
# local time
lt1 = 07:32:00
lt2 = 00:32:00.999999
30.2.4. array and table
- Key/value pairs within tables are not guaranteed to be in any specific order.
- bare keys may only contain ASCII letters, ASCII digits, underscores, and dashes (A-Za-z0-9_-). Note that bare keys are
allowed to be composed of only ASCII digits, e.g. 1234, but are always interpreted as strings.
- Quoted keys
key = # INVALID
first = "Tom"
last = "Preston-Werner"
1234 = "value"           # bare digits - interpreted as the string "1234"
"127.0.0.1" = "value"
= "no key name" # INVALID
"" = "blank"    # VALID but discouraged
'' = 'blank'    # VALID but discouraged

fruit.name = "banana"     # this is best practice
fruit. color = "yellow"   # same as fruit.color
fruit . flavor = "banana" # same as fruit.flavor

# DO NOT DO THIS - defining a key multiple times is invalid
name = "Tom"
name = "Pradyun"
# THIS WILL NOT WORK
spelling = "favorite"
"spelling" = "favourite"

# This makes the key "fruit" into a table.
fruit.apple.smooth = true
# So then you can add to the table "fruit" like so:
fruit.orange = 2
# THE FOLLOWING IS INVALID
fruit.apple = 1
fruit.apple.smooth = true

integers = [ 1, 2, 3 ]
colors = [ "red", "yellow", "green" ]
nested_arrays_of_ints = [ [ 1, 2 ], [3, 4, 5] ]
nested_mixed_array = [ [ 1, 2 ], ["a", "b", "c"] ]
string_array = [ "all", 'strings', """are the same""", '''type''' ]
# Mixed-type arrays are allowed
numbers = [ 0.1, 0.2, 0.5, 1, 2, 5 ]
contributors = [
  "Foo Bar <foo@example.com>",
  { name = "Baz Qux", email = "bazqux@example.com", url = "https://example.com/bazqux" }
]
integers2 = [ 1, 2, 3 ]
integers3 = [
  1,
  2, # this is ok
]

[table-1]
key1 = "some string"
key2 = 123

[table-2]
key1 = "another string"
key2 = 456

[a.b.c]           # this is best practice
[ d.e.f ]         # same as [d.e.f]
[ g . h . i ]     # same as [g.h.i]
[ j . "ʞ" . 'l' ] # same as [j."ʞ".'l']
30.3. pyproject.toml
consists of
- [build-system] - pep-0517
- [project] - pep 621 https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#declaring-project-metadata
- dependencies - pep 0631
- [project.urls]
- [project.scripts], [project.gui-scripts], and [project.entry-points] - entry points
- [project.optional-dependencies]
- [tool] - pep 518 https://packaging.python.org/en/latest/specifications/declaring-build-dependencies/#declaring-build-dependencies
folder structure https://packaging.python.org/en/latest/tutorials/packaging-projects/
30.3.1. [build-system]
Hatch
requires = ["hatchling"]
build-backend = "hatchling.build"
setuptools
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
Flit
requires = ["flit_core>=3.4"]
build-backend = "flit_core.buildapi"
PDM
requires = ["pdm-backend"]
build-backend = "pdm.backend"
30.3.2. metadata [project] and [project.urls]
pep 621 - [project] and https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#declaring-project-metadata
[project]
name = "example_package_YOUR_USERNAME_HERE"
version = "0.0.1"
authors = [
  { name="Example Author", email="author@example.com" },
]
# optional?
description = "A small example package"
readme = "README.md"
license = {file = "LICENSE.txt"}
# optional
keywords = ["egg", "bacon", "sausage", "tomatoes", "Lobster Thermidor"]
# optional
requires-python = ">=3.7"
classifiers = [
  "Programming Language :: Python :: 3",
  "License :: OSI Approved :: MIT License",
  "Operating System :: OS Independent",
]
dependencies = [
  "httpx",
  "gidgethub[httpx]>4.0.0",
  "django>2.1; os_name != 'nt'",
  "django>2.0; os_name == 'nt'",
]

# optional
[project.optional-dependencies]
gui = ["PyQt5"]
cli = [
  "rich",
  "click",
]

[project.urls]
"Homepage" = "https://github.com/pypa/sampleproject"
"Bug Tracker" = "https://github.com/pypa/sampleproject/issues"

[project.scripts]
spam-cli = "spam:main_cli"
30.3.3. [project.scripts]
mycmd = mymod:main
would create a command mycmd launching a script like this:
import sys
from mymod import main
sys.exit(main())
main should return 0
30.3.4. dependencies
30.4. build
python3 -m build
create: dist/
- ├── example_package_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl - built distribution with binaries
- └── example_package_YOUR_USERNAME_HERE-0.0.1.tar.gz - source distribution
30.5. distutils (old)
The distutils package has been deprecated in 3.10 and will be removed in Python 3.12. Its functionality for specifying package builds has already been completely replaced by the third-party packages setuptools and packaging, and most other commonly used APIs are available elsewhere in the standard library (such as platform, shutil, subprocess or sysconfig).
30.6. terms
- Source Distribution (or “sdist”) - generated using python setup.py sdist.
- Wheel - A Built Distribution format
- build - is a PEP 517 compatible Python package builder.
- pep517 - new style of source tree based around the pep518 pyproject.toml + [build-backend]
- setup.py-style - de facto specification for "source tree"
- src-layout - as opposed to flat layout; chosen for the package folder structure. PEP 660
types of artifacts:
- The source distribution (sdist): python3 -m build --sdist source-tree-directory
- The built distributions (wheels): python3 -m build --wheel source-tree-directory
- no compilation required during install:
30.7. recommended
dependency management:
- pip with --require-hashes and --only-binary :all:
- virtualenv or venv
- pip-tools, Pipenv, or poetry
- wheel project - offers the bdist_wheel setuptools extension
- buildout: primarily focused on the web development community
- Spack, Hashdist, or conda: primarily focused on the scientific community.
package tools
- setuptools
- build to create Source Distributions and wheels.
- cibuildwheel - If you have binary extensions and want to distribute wheels for multiple platforms
- twine - for uploading distributions to PyPI.
30.8. Upload to the package distribution service
30.8.1. TODO twine
twine upload dist/package-name-version.tar.gz dist/package-name-version-py3-none-any.whl
30.8.2. TODO Github actions
30.9. editable installs PEP660
pip install --editable
editable installation mode - installation of projects in such a way that the Python code being imported remains in the source directory
Python programmers want to be able to develop packages without having to install (i.e. copy) them into site-packages, for example, by working in a checkout of the source repository.
Actually it just adds directories to PYTHONPATH.
There are two types of wheel now: normal and "editable".
30.10. PyPi project name, name normalization and other specifications
names should use ASCII letters and ASCII digits; ., -, and _ are allowed, but normalized to -.
- normalized to
- lowercase
Valid non-normalized names: ^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$
Normalization: re.sub(r"[-_.]+", "-", name).lower()
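The normalization rule above can be checked with a few lines (example names invented):

```python
import re

def normalize(name: str) -> str:
    # collapse runs of '.', '-', '_' into a single dash, then lowercase
    return re.sub(r"[-_.]+", "-", name).lower()

assert normalize("Friendly_Bard") == "friendly-bard"
assert normalize("FRIENDLY-.-bard") == "friendly-bard"
```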
Source distribution format - pep-0517 PEP 518
- Source distribution file name: {name}-{version}.tar.gz
- contains a single top-level directory called {name}-{version} (e.g. foo-1.0), containing the source files of the package.
- directory must also contain
- a pyproject.toml
- PKG-INFO file containing metadata - PEP 566
30.11. TODO src layout vs flat layout
src layout helps
- prevent accidental usage of the in-development copy of the code
- https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/
- https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure%3E
30.12. links
- main https://packaging.python.org/en/latest/
- python key projects https://packaging.python.org/en/latest/key_projects/
- build systems recommended (official) https://packaging.python.org/en/latest/guides/tool-recommendations/
- gentoo https://blogs.gentoo.org/mgorny/2021/11/07/the-future-of-python-build-systems-and-gentoo/
- PEP 517 – A build-system independent format for source trees https://peps.python.org/pep-0517/
- PEP 518 Specifying Minimum Build System Requirements for Python Projects https://peps.python.org/pep-0518/
- PEP 621 Storing project metadata in pyproject.toml - https://peps.python.org/pep-0621/
- specifications https://packaging.python.org/en/latest/specifications/
- pip default installer https://peps.python.org/pep-0453/
31. setuptools - build system
32. pip (package manager)
Installed together with Python
- (pip3 for Python 3) by default - MIT
- pip.pypa.io
Some package managers, including pip, use PyPI as the default source for packages and their dependencies.
Python Package Index - official third-party software repository for Python
- PyPI (ˌpaɪpiˈaɪ)
32.1. release steps
- register at pypi.org
- https://pypi.org/manage/account/#api-tokens
- github->project->Secrets and variables->actions
- New repository secret
- PYPI_API_TOKEN
- token from 2)
- github->project->Actions->add->Publish Python Package
32.2. wheels
“Wheel” (.whl) is a built archive format that can greatly speed up installation.
to disable wheels:
- --no-cache-dir
- --no-binary=:all:
32.3. virtualenv
It may happen that project A requires version 1.0.0 while project B requires the newer version 2.0.0, for example.
- pip cannot distinguish versions in the «site-packages» directory
pip install virtualenv
32.4. venv
create:
python -m venv /path/to/new/virtual/environment
- pyvenv.cfg - created
- bin (or Scripts on Windows) containing a copy/symlink of the Python binary/binaries
- a file named pyvenv.cfg is searched for in the interpreter's directory or one level above;
- if the file is found, its home key gives the base directory;
- the system library is then searched for in the base directory (via the special marker os.py);
Use:
- source bin/activate
- ./bin/python main.py
32.5. update
pip3 install --upgrade pip --user
- outdated packages: pip3 list --outdated
- upgrade: pip3 install --upgrade SomePackage
32.6. requirements.txt
How to install
- pip install -r requirements.txt
How to create
- pip freeze > requirements.txt - create from all installed packages
- pipreqs . - based on imports - requires: pip3 install pipreqs --user
Watch out for cross-platform compatibility! Not all libraries are portable!
docopt == 0.6.1      # Version Matching. Must be version 0.6.1
keyring >= 4.1.1     # Minimum version 4.1.1
coverage != 3.5      # Version Exclusion. Anything except version 3.5
Mopidy-Dirble ~= 1.1 # Compatible release. Same as >= 1.1, == 1.*
# without version:
nose
nose-cov
beautifulsoup4
32.7. errors
Traceback (most recent call last):
  File "/usr/bin/pip3", line 9, in <module>
    from pip import main
ImportError: cannot import name 'main'
SOLUTION: alias pip3="home/u2.local/bin/pip3"
32.8. cache dir
to reduce the amount of time spent on duplicate downloads and builds.
- cached:
- http responses
- Locally built wheels
- pip cache dir
32.8.1. links
32.9. hashes
- pip install package --require-hashes
- Requirements must be pinned with ==
- weak hashes: md5, sha1, and sha224
- python -m pip download --no-binary=:all: SomePackage
- python -m pip hash --algorithm sha512 ./pip_downloads/SomePackage-2.2.tar.gz
- pip install --force-reinstall --no-cache-dir --no-binary=:all: --require-hashes --user -r requirements.txt
FooProject == 1.2 --hash=sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 \
    --hash=sha256:486ea46224d1bb4fb680f34f7c9ad96a8f24ec88be73ea8e5a6c65260e9cb8a7
32.10. add SSL certificate
export PIP_CERT=/etc/ssl/certs/rnb.pem
Dockerfile:
- COPY /etc/ssl/certs/rnb.pem /rnb.pem
- ENV PIP_CERT=/rnb.pem
32.10.1. crt(not working)
- pip config set global.cert path/to/ca-bundle.crt
- pip config list
- conda config --set ssl_verify path/to/ca-bundle.crt
- conda config --show ssl_verify
- git config --global http.sslVerify true
- git config --global http.sslCAInfo path/to/ca-bundle.crt
32.10.2. pem(not working)
pip config set global.cert /home/RootCA3.pem - point it to the self-signed cert if Python module installation errors occur.
- python -c "import ssl; print(ssl.get_default_verify_paths())"
- add pem to path
32.11. ignore SSL certificates
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org <package_name>
33. urllib3 and requests library
requests->urllib3->http.client
request parameters:
- data - goes into the body, with header Content-Type: application/x-www-form-urlencoded
- params - appended to the URL as ?param=value - urllib.quote(string)
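The distinction above can be illustrated with the stdlib `urllib.parse` helpers that do the underlying encoding (the URL and payload here are invented): `params` ends up in the query string, `data` ends up as the same encoding in the request body.

```python
from urllib.parse import urlencode, quote

payload = {"user": "a b", "lang": "py"}

# params= -> appended to the URL as a query string
url = "http://example.com/api?" + urlencode(payload)
assert url == "http://example.com/api?user=a+b&lang=py"

# data= -> same form encoding, but placed in the request body
body = urlencode(payload)
assert body == "user=a+b&lang=py"

# urllib.parse.quote percent-encodes a single string ('/' is safe by default)
assert quote("a b/c") == "a%20b/c"
```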
33.1. difference
speed - the time taken to send data from the client to the server was the same for both modules (urllib, requests), but returning data from the server to the client was more than twice as fast with urllib compared to requests.
33.2. see raw request
33.2.1. requests
- 1) after request:
p = requests.post(f'http://127.0.0.1:8081/transcribe/{rid}/find_sentence',
                  params={'sentences': sentences})
print("----request:")
[print(x) for x in p.request.__dict__.items()]
- 2) before request
s = Session()
req = Request('GET', url, data=data, headers=headers)
prepped = s.prepare_request(req)
[print(x) for x in prepped.__dict__.items()]
- 3) after request from logs:
import requests
import logging

# These two lines enable debugging at httplib level (requests->urllib3->http.client)
# You will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
# The only thing missing will be the response.body which is not logged.
try:
    import http.client as http_client
except ImportError:
    # Python 2
    import httplib as http_client
http_client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

requests.get('https://httpbin.org/headers')
33.3. problems:
34. pdf 2 png
34.1. pdf2image
require poppler-utils
- wraps pdftoppm and pdftocairo
- to PIL image
34.2. Wand
pip3 install Wand
ImageMagick binding
34.3. PyMuPDF
pip3 install PyMuPDF
35. statsmodels
35.1. ACF, PACF
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from matplotlib import pyplot
from pandas import read_csv

series = read_csv('seasonally_adjusted.csv', header=None)
plot_acf(series, lags=150)  # lag values along the x-axis, correlation between -1 and 1 on the y-axis
plot_pacf(series)  # same idea, but shorter correlations do not interfere
pyplot.show()
35.2. bar plot
loan_type_count = data['Loan Type'].value_counts()
sns.set(style="darkgrid")
sns.barplot(loan_type_count.index, loan_type_count.values, alpha=0.9)
36. XGBoost
- https://github.com/dmlc/xgboost
- doc https://xgboost.readthedocs.io/en/latest/
- parameters tunning https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html
One natural regularization parameter is the number of gradient boosting iterations M (i.e. the number of trees in the model when the base learner is a decision tree).
36.1. usage
import xgboost as xgb
or
from xgboost import XGBClassifier - multi:softprob if classes > 2
for multiclass classification:
- from sklearn.preprocessing import LabelBinarizer
- y = np.array(['apple', 'pear', 'apple', 'orange'])
- y_dense = LabelBinarizer().fit_transform(y) - [ [1 0 0],[0 0 1],[1 0 0],[0 0 1] ]
36.2. categorical columns
The policy of XGBoost is to have no special support for categorical variables; it is up to you to handle them before providing the features to the algorithm.
If booster=='gbtree' (the default), then XGBoost can handle categorical variables encoded as numeric directly, without needing dummifying/one-hotting. But if the label is a string (not an integer), then yes, we need to convert it.
36.2.1. Feature importance between numerical and categorical features
https://discuss.xgboost.ai/t/feature-importance-between-numerical-and-categorical-features/245
one-hot encoding. Consequently, each categorical feature transforms into N sub-categorical features, where N is the number of possible outcomes for this categorical feature.
Then each sub-categorical feature would compete with the rest of sub-categorical features and all numerical features. It is much easier for a numerical feature to get higher importance ranking.
What we can do is to set importance_type to weight and then add up the frequencies of sub-categorical features to obtain the frequency of each categorical feature.
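The aggregation step above can be sketched in a few lines. Everything here is hypothetical: the importance dict mimics what `booster.get_score(importance_type='weight')` returns, and the `"<feature>_<outcome>"` naming convention is an assumption about how the one-hot columns were named, not something XGBoost enforces.

```python
# hypothetical weight-type importances for one-hot encoded columns
importance = {
    "age": 40,
    "color_red": 10,
    "color_blue": 5,
    "color_green": 3,
    "size_small": 7,
    "size_large": 2,
}

def aggregate(importance, categorical=("color", "size")):
    """Sum the frequencies of sub-categorical features back into their parent feature."""
    total = {}
    for feat, freq in importance.items():
        # map "color_red" -> "color"; leave plain numerical features as-is
        base = next((c for c in categorical if feat.startswith(c + "_")), feat)
        total[base] = total.get(base, 0) + freq
    return total

assert aggregate(importance) == {"age": 40, "color": 18, "size": 9}
```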
36.3. gpu support
tree_method = 'gpu_hist'
gpu_id = 0  # optional
36.4. result value from leaf value
The final probability prediction is obtained by taking sum of leaf values (raw scores) in all the trees and then transforming it between 0 and 1 using a sigmoid function. (1 / (1 + math.exp(-x)))
leaf = 0.1111119  # raw score
result = 1/(1 + np.exp(-leaf))  # = 0.5394 - probability score, logistic function
xgb.plot_tree(bst, num_trees=num_round-1)  # default: tree 0
print(bst.predict(t, ntree_limit=1))  # first tree only; default - all trees
36.5. terms
- instance or entity - line
- feature - column
- data - list of instances - 2D
- labels - 1D list of labels for instances
36.6. xgb.DMatrix
- LibSVM text format file
- Comma-separated values (CSV) file
- NumPy 2D array
- SciPy 2D sparse array
- cuDF DataFrame
- Pandas data frame, and
- XGBoost binary buffer file.
data = np.random.rand(5, 10)  # 5 entities, each contains 10 features
label = np.random.randint(2, size=5)  # binary target, e.g. array([1, 0, 1, 0, 0])
dtrain = xgb.DMatrix(data, label=label)
# weights
w = np.random.rand(5, 1)
dtrain = xgb.DMatrix(data, label=label, missing=-999.0, weight=w)
36.6.1. LibSVM file format
1 101:1.2 102:0.03
0 1:2.1 10001:300 10002:400
- Each line represent a single instance
- 1,0 - labels - probability values in [0,1]
- 101, 102 - feature indices
- 1.2, 0.03 - feature values
xgb.DMatrix('/home/u2/Downloads/agaricus.txt.train')
xgb.DMatrix('train.csv?format=csv&label_column=0')
36.7. parameters
https://xgboost.readthedocs.io/en/latest/parameter.html
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
objective:
- 'binary:logistic' - labels [0,1] - output probability, binary
- 'reg:squarederror' - regression with squared loss
- multi:softmax multiclass classification using the softmax objective
'booster': 'gbtree' - gbtree and dart use tree based models while gblinear uses linear functions
eval_metric - rmse for regression, and error for classification, mean average precision for ranking
- error - Binary classification #(wrong cases)/#(all cases)
'seed': 0 - random seed
gbtree
- 'eta': 0.3 - learning_rate
- 'max_depth': 6 - Maximum depth of a tree - more = more complex and more likely to overfit
- 'gamma': 0 - Minimum loss reduction required to make a further partition on a leaf node of the tree - increase to make the model more conservative
36.8. print important features
import matplotlib
matplotlib.use('TkAgg')  # must be set before importing pyplot
import matplotlib.pyplot as plt
xgb.plot_importance(bst)
plt.show()
36.9. TODO prune (tree pruning)
36.10. permutation importance
for XGBClassifier (binary) - sklearn.inspection.permutation_importance
other - shap values
36.11. model to if-else
36.12. Errors
36.12.1. ValueError: setting an array element with a sequence.
36.12.2. label must be in [0,1] for logistic regression
37. Natasha & Yargy
- pip install jupyter
- pip install yargy ipymarkup - markup highlighting
- jupyter.exe notebook
- graphviz, with PATH manually pointed at its bin
37.1. yargy
- yarky tokenizer https://yargy.readthedocs.io/ru/latest/reference.html
- yargy https://yargy.readthedocs.io/ru/latest/index.html
- MIT License
Drawbacks:
- slow
- not flexible
- rules with conditions cannot be built
37.1.1. yargy.tokenizer
from yargy.tokenizer import MorphTokenizer  # used by default
t = MorphTokenizer()
list(t('asds'))[0].value
list(t('asds'))[0].normalized
Its rules:
- TokenRule('RU', '[а-яё]+'),
- TokenRule('LATIN', '[a-z]+'),
- TokenRule('INT', '\d+'),
- TokenRule('PUNCT','[-\\/!#$%&()\[\]\*\+,\.:;<=>?@^_`{|}~№…"\'«»„“ʼʻ”]'),
- TokenRule('EOL', '[\n\r]+'),
- TokenRule('OTHER', '§')]
remove some of the rules: tokenizer = Tokenizer().remove_types('EOL')
37.1.2. rules
- yargy.predicates - type('INT'), eq('г'), _or(normalized('ложка'), caseless('вилка'))
- yargy.rule - rule(predicates, …), or_
- yargy.pipelines - gazetteer - a word list used as a rule constructor
- morph_pipeline(['л','г']) - normalizes words before matching
- caseless_pipeline(['Абд Аль','и']) - lowercases words before matching
- yargy.interpretation.fact('name', ['attribute', …]) - used by predicates for their interpretation.
Interpretation is folding the parse tree bottom-up.
- attribute - default value for an attribute, plus operations on the result

f = fact('name', [attribute('year', 2017)])
a = eq('100').interpretation(f.year.custom(any_function_of_one_argument))
r = rule(a).interpretation(f)
match.fact or match.tree.as_dot
37.1.4. predicates
- eq(value) a == b
- caseless(value) a.lower() == b.lower()
- in_(value) a in b
- in_caseless(value) a.lower() in b
- gte(value) a >= b
- lte(value) a <= b
- length_eq(value) len(a) == b
- normalized(value) - normal form of the word == value
- dictionary(value) - normal form of the word in value
- gram(value) - value is among the word's grammemes
- type(value) - token type equals value
- tag(value) - token tag equals value
- custom(function[, types]) - function used as a predicate
- true - always returns True
- is_lower str.islower
- is_upper str.isupper
- is_title str.istitle
- is_capitalized - word starts with a capital letter
- is_single - word is in the singular
Modifiers:
- optional()
- repeatable(min=None, max=None, reverse=False)
- interpretation(a.a) - attaches the predicate to an interpretation element
37.1.5. non-standard word forms (e.g. diminutives like "рулетики")
- T library?
- reduce diminutives to the standard form, dictionaries?
37.1.6. ex
#------- rule as a context-free grammar ----
from yargy import rule
R = rule('a', 'b')
R.normalized.as_bnf
>> R -> 'a' 'b'

#------- FLOAT -------
from yargy import rule, or_
from yargy.predicates import eq, type as _type, in_
INT = _type('INT')
FLOAT = rule(INT, in_(',.'), INT)
FRACTION = rule(INT, eq('/'), INT)
RANGE = rule(INT, eq('-'), INT)
AMOUNT = or_(
    rule(INT),
    FLOAT,
    FRACTION,
    RANGE)

#------- MorphTokenizer -----------
from yargy.tokenizer import MorphTokenizer
TOKE = MorphTokenizer()
l = list(TOKE(text))
for i in l:
    print('\n'.join(map(str, i)))

#--------- findall ----------
from yargy import rule, Parser
from yargy.predicates import eq
line = '100 г'
MEASURE = rule(eq(100))
parser = Parser(MEASURE.optional())
matches = list(parser.findall(line))

#--------- Simples ------
from yargy import rule, Parser
r = rule('a', 'b')
parser = Parser(r)
line = 'abc'
match = parser.match(line)

#----------- spans show --------
from ipymarkup import markup, AsciiMarkup
spans = [_.span for _ in matches]
for line in markup(text, spans, AsciiMarkup).as_ascii:
    print(line)
37.1.7. natasha
Extractors:
- NamesExtractor - NAME,tagger=tagger
- SimpleNamesExtractor - SIMPLE_NAME
- PersonExtractor - PERSON, tagger=tagger
- DatesExtractor - DATE
- MoneyExtractor - MONEY
- MoneyRateExtractor - MONEY_RATE
- MoneyRangeExtractor - MONEY_RANGE
- AddressExtractor - ADDRESS, tagger=tagger
- LocationExtractor - LOCATION
- OrganisationExtractor - ORGANISATION
37.1.8. console
37.1.9. QT console
- https://qtconsole.readthedocs.io/en/stable/
- https://www.tutorialspoint.com/jupyter/ipython_introduction.htm
- inline figures
- proper multi-line editing with syntax highlighting
- graphical calltips
- emacs-style bindings for text navigation
- HTML or XHTML
- PNG(outer or inline) in HTML, or inlined as SVG in XHTML
- Run: jupyter qtconsole --style monokai
- ! - system command (!dir)
- ? - a? - information about a variable, plt?? - source definition, exit - q
- In[2] - input string, Out[2] - output
- display(object) - display anything supported
- "*"*100500; - a trailing ; suppresses the output
- Switch to SVG inline XHTML: In [10]: %config InlineBackend.figure_format = 'svg'
- keys
- Tab - autocompletion - press several times to cycle through candidates
- ``Enter``: insert new line (may cause execution, see above).
- ``Ctrl-Enter``: force new line, never causes execution.
- ``Shift-Enter``: force execution regardless of where cursor is, no newline added.
- ``Up``: step backwards through the history.
- ``Down``: step forwards through the history.
- ``Shift-Up``: search backwards through the history (like ``Control-r`` in bash).
- ``Shift-Down``: search forwards through the history.
- ``Control-c``: copy highlighted text to clipboard (prompts are automatically stripped).
- ``Control-Shift-c``: copy highlighted text to clipboard (prompts are not stripped).
- ``Control-v``: paste text from clipboard.
- ``Control-z``: undo (retrieves lost text if you move out of a cell with the arrows).
- ``Control-Shift-z``: redo.
- ``Control-o``: move to 'other' area, between pager and terminal.
- ``Control-l``: clear terminal.
- ``Control-a``: go to beginning of line.
- ``Control-e``: go to end of line.
- ``Control-u``: kill from cursor to the begining of the line.
- ``Control-k``: kill from cursor to the end of the line.
- ``Control-y``: yank (paste)
- ``Control-p``: previous line (like up arrow)
- ``Control-n``: next line (like down arrow)
- ``Control-f``: forward (like right arrow)
- ``Control-b``: back (like left arrow)
- ``Control-d``: delete next character, or exits if input is empty
- ``Alt-<``: move to the beginning of the input region.
- ``alt->``: move to the end of the input region.
- ``Alt-d``: delete next word.
- ``Alt-Backspace``: delete previous word.
- ``Control-.``: force a kernel restart (a confirmation dialog appears).
- ``Control-+``: increase font size.
- ``Control--``: decrease font size.
- ``Control-Alt-Space``: toggle full screen. (Command-Control-Space on Mac OS X)
- magic
- %lsmagic - Displays all magic functions currently available
- %cd
- %pwd
- %dhist - directories you have visited in current session
- %notebook - export history into an IPython notebook file with ipynb extension
- %precision n - n digits after the decimal point
- %recall n - execute the previous command or command n
- %run a.py - run file; measure execution time (-t), run with debugger (-d) or profiler (-p)
- %run -n main.py - import
- %time command - displays time required by IPython environment to execute a Python expression
- %who type - which variables have the given type
- %whos - all imported and created objects
- %hist - the whole history as text
- %rep n - recall input n
Python object introspection
- %pdoc - docstring
- %pdef - function definition (signature)
- %psource - source code of a function or class
- %pfile - full source of the corresponding file
- TEMPLATE
#------ TEMPLATE ---------------
# QTconsole ----
In [1]: run -n main.py
In [2]: main()
In [3]: from yargy import rule, Parser
from yargy.predicates import eq, type as _type, normalized
MEASURE = rule(eq('НДС'))
parser = Parser(MEASURE)
for line in words:
    matches = list(parser.findall(line))
    spans = [_.span for _ in matches]
    mup(line, spans)

# main.py ------
import read_json  # my import
# -- test
words: list = []  # words from file
index: int = 0
# test --

def mup(s: str, spans: list):
    """print what matched on the line"""
    from ipymarkup import markup, AsciiMarkup
    for line in markup(s, spans, AsciiMarkup).as_ascii:
        print(line)

def work(prov: dict):
    """called for every line"""
    text = prov['naznach']
    # -- test
    global words, index
    words.append(text)
    index += 1
    if index > 5:
        quit()
    # test --

def main():  # args):
    read_json.readit('a.txt', work)  # aml_provodki.txt

#################### MAIN ##########################
if __name__ == '__main__':  # name of module-namespace; '__main__' when run as $python a.py
    # import sys
    main()  # sys.argv)
    quit()
- Other
#--------- yargy to graphviz ------------
from ipymarkup import markup, show_markup
spans = [_.span for _ in matches]
show_markup(line, spans)
r = rule(...)
r.normalized.as_bnf
match.tree.as_dot

# ----------- random sample of lines for testing ----
from random import seed, sample
seed(1)
sample(lines, 20)
# OR
from random import sample
for a in sample(range(0, 20), 2):
    print(a)

#-------- matplotlib --------
from matplotlib import pyplot as plt
plt.plot(range(10), range(10))
37.1.10. graphviz
- graphviz - https://graphviz.gitlab.io/download/ - graph visualization https://ru.wikipedia.org/wiki/DOT_(%D1%8F%D0%B7%D1%8B%D0%BA)
- add the bin directory to PATH manually
- intended for use inside Jupyter Notebook
- pip3 install PyQt5
https://www.youtube.com/watch?time_continue=1027&v=NQxzx0qYgK8
m.tree.as_dot._repr_svg_() - produces SVG output for graphviz
37.1.11. IPython
38. Stanford NER - Java
- Conditional Random Field (CRF)
- Stanford NER https://nlp.stanford.edu/software/CRF-NER.shtml#Starting
- FAQ https://nlp.stanford.edu/software/crf-faq.html
- article https://towardsdatascience.com/a-review-of-named-entity-recognition-ner-using-automatic-summarization-of-resumes-5248a75de175
- article https://medium.com/@mohangupta13/stanford-corenlp-training-your-own-custom-ner-tagger-348195f54d97
- coreNLP https://stanfordnlp.github.io/CoreNLP/index.html
38.1. train
You give the data file, the meaning of the columns, and what features to generate via a properties file.
38.2. Training data
- Dataturks NER tagger
39. DeepPavlov
- https://deeppavlov.ai/
- http://docs.deeppavlov.ai/en/latest/components/ner.html
- SpaCy и DeepPavlov https://www.youtube.com/watch?v=WVhA3YpIek4
- simple-intent-recognition https://medium.com/deeppavlov/simple-intent-recognition-and-question-answering-with-deeppavlov-c54ccf5339a9
- NLP course from DeepPavlov https://github.com/hse-aml/natural-language-processing
- built on TensorFlow and Keras
Валентин Малых, Алексей Лымарь, MIPT
- agents conduct a dialogue with the user
- agents have skills that get selected; a skill is a set of components: spellchecker, morphoanalyzer, intent classifier
- skill - their input and output should both be strings
- components can be chained, similarly to a spaCy pipeline
Components can be nested:
- no syntactic parser
- Question Answering system
- NER и Slot filling
- Classification
- Goal-oriented bot
- Spellchecker
- Morphotagger
39.1. Command line
python .\deeppavlov\deep.py interact ner_rus [-d]
- interaction, testing
- ner_rus - C:\Users\Chepilev_VS\AppData\Local\Programs\Python\Python36\lib\site-packages\deeppavlov\configs\ner\ner_rus.json
39.2. helper classes
- simple_vocab
- self._t2i[token] = self.count - token-to-index mapping
- self._i2t.append(token) - index-to-token mapping
39.3. in code
#------------ build model and interact ---------
from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model

faq = build_model(configs.faq.tfidf_logreg_en_faq, download=True)
a = faq(["I need help"])
39.4. installation
- apt install libssl-dev libncurses5-dev libsqlite3-dev libreadline-dev libtk8.5 libgdm-dev libdb4o-cil-dev libpcap-dev
wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8rc1.tgz
- tar -xvzf
- cd Python-3.6.8
- ./configure --enable-optimizations --with-ensurepip=install
- make -j8
- sudo make altinstall
- python3.6
- update-alternatives --install /usr/bin/python python /usr/bin/python3.6 1
- update-alternatives --config python
- python -m pip install --upgrade pip
- git config --global http.proxy http://srv-proxy:8080
- git clone https://github.com/deepmipt/DeepPavlov.git
variant 1
- pip3.6 install virtualenv --user
- ~/.local/bin/virtualenv ENV
- source ENV/bin/activate
variant 2
- python -m venv .
- source bin/activate
- pip install deeppavlov
- ENV/bin/python
fastText
pip install git+https://github.com/facebookresearch/fastText.git#egg=fastText==0.8.22
install everything required by a specific DeepPavlov config by running:
python -m deeppavlov install <config_name>
MY FIXES https://github.com/vitalij23/DeepPavlov/commits/master
- JSON with comments:
- pip3.6 install jstyleson
- deeppavlov\core\common\file.py json ->jstyleson
39.5. training
- IOB format https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)
- https://lingpipe-blog.com/2009/10/14/coding-chunkers-as-taggers-io-bio-bmewo-and-bmewo/
we use BIO or IOB (inside-outside-beginning) tagging - it subdivides the inside tags into begin-of-entity (B-X) and continuation-of-entity (I-X).
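The BIO convention can be sketched in a few lines of plain Python (to_bio and the example sentence are illustrative, not DeepPavlov API):

```python
def to_bio(tokens, entities):
    """Tag tokens with BIO labels.
    entities: list of (start, end_exclusive, label) token spans."""
    tags = ['O'] * len(tokens)
    for start, end, label in entities:
        tags[start] = 'B-' + label          # begin-of-entity
        for i in range(start + 1, end):
            tags[i] = 'I-' + label          # continuation-of-entity
    return tags

tokens = ['John', 'lives', 'in', 'New', 'York']
print(to_bio(tokens, [(0, 1, 'PER'), (3, 5, 'LOC')]))
# ['B-PER', 'O', 'O', 'B-LOC', 'I-LOC']
```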
dataset
- train
- data for training the model;
- validation
- data for evaluation and hyperparameters tuning;
- test
- data for final evaluation of the model.
Training consists of 3 elements: dataset_reader, dataset_iterator and train; or at least two: dataset and train.
dataset_reader - source of x and y
Proto-classes of dataset_iterator:
- Estimator - no early stopping; fitting is safely done at pipeline initialization; works in both supervised and unsupervised settings
- fit()
- NNModel - supervised learning
- in
- in_y
Training:
- rm -r ~/.deeppavlov/models/ner_rus
- cd deep
- source ENV/bin/activate
- python3.6 -m deeppavlov train ~/ner_rus.json
39.5.1. dataset_iterators
39.6. NLP pipeline json config
https://deeppavlov.readthedocs.io/en/0.1.6/intro/config_description.html Uses core/common/registry.json
- If a component is given an id with a name, other components can reference it by that name instead of creating a new instance: "ref": "id_name"
Four main sections:
- dataset_reader
- dataset_iterator
- chainer - one required element
- in
- pipe
- in
- out
- out
- train
"metadata": {"variables" - defines paths such as "DOWNLOADS_PATH", "MODELS_PATH", etc.
39.6.1. configs
ner_conll2003.json | glove |
ner_conll2003_pos.json | glove |
ner_dstc2.json | random_emb_mat |
ner_few_shot_ru.json | elmo_embedder |
ner_few_shot_ru_simulate.json | elmo_embedder |
ner_ontonotes.json | glove |
ner_rus.json | fasttext |
slotfill_dstc2.json | nothing |
slotfill_dstc2_raw.json | nothing |
39.6.2. parsing a config
from deeppavlov import configs
from deeppavlov.core.commands.utils import parse_config

config_dict = parse_config(configs.ner.ner_ontonotes)
print(config_dict['dataset_reader']['data_path'])
39.6.3. json
{
  "deeppavlov_root": ".",
  "dataset_reader": {  // deeppavlov\dataset_readers
    "class_name": "conll2003_reader",  // conll2003_reader.py
    "data_path": "{DOWNLOADS_PATH}/total_rus/",  // folder to take train.txt, valid.txt, test.txt from
    "dataset_name": "collection_rus",  // if the folder is empty, the URL inside conll2003_reader.py is used
    "provide_pos": false  // pos tag?
  },
  "dataset_iterator": {  // deeppavlov\dataset_iterators - for simple batching and shuffling
    "class_name": "data_learning_iterator",  // deeppavlov\core\data\data_learning_iterator.py
    "shuffle": true,  // shuffles List[Tuple[Any, Any]] by default
    "seed": 42  // seed for random shuffle
  },
  "chainer": {  // list of components - core\common\chainer.py
    "in": ["x"],  // names of inputs for pipeline inference mode
    "in_y": ["y"],  // names of additional inputs for pipeline training and evaluation modes
    "out": ["x_tokens", "tags"],  // names of pipeline inference outputs
    "pipe": [
      {
        "class_name": "tokenizer",
        "in": "x",  // in of chainer
        "lemmas": true,  // lemmatizer enabled
        "out": "q_token_lemmas"
      },
39.6.4. examples
- tokenizer
x::As a'd.234 4567 >> ['as', "a'd.234", '4567']
{
  "chainer": {
    "in": ["x"],
    "in_y": ["y"],
    "pipe": [
      {"class_name": "str_lower", "id": "lower", "in": ["x"], "out": ["x_lower"]},
      {"in": ["x_lower"], "class_name": "lazy_tokenizer", "out": ["x_tokens"]},
      {"in": ["x_tokens"], "class_name": "sanitizer", "nums": false, "out": ["x_san"]}
    ],
    "out": ["x_san"]
  }
}
39.7. preprocessors
- sanitizer - models\preprocessors - removes all combining characters like diacritical marks from tokens - deeppavlov\models\preprocessors\sanitizer.py
- nums - replaces [0-9] with 1, that's all
- str_lower - batch.lower()
39.7.1. tokenizers
deeppavlov\models\tokenizers
- lazy_tokenizer - english nltk word_tokenize (no parameters)
- ru_tokenizer - lowercase - swallows the trailing period together with the word
- stopwords - List[str]
- ngram_range - List[int] - size of ngrams to create; only unigrams are returned by default
- lemmas - default=False - whether to perform lemmatizing or not
- nltk_moses_tokenizer - MosesTokenizer().tokenize - like lazy_tokenizer; if the input is tokens, it joins them
- escape = False - if True, escapes the characters | [ ] < > &
39.7.2. Embedder [ɪmˈbede] - deep contextualized word representation
- "Words that occur in similar contexts tend to have similar meaning"
- Consist of embedding matrices.
- Converts every token to a vector of particular dimensionality
- Vocabularies provide conversion from tokens to indices, which is needed to perform lookups in embedding matrices and to compute cross-entropy between predicted probabilities and target values.
- Used for: (e.g. cosine) similarity as a measure of semantic similarity
- unsupervised learning algorithm
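Cosine similarity between two embedding vectors is straightforward to compute by hand (a plain-Python sketch; libraries use vectorized equivalents):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, used as a
    semantic-similarity measure for embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))  # 1.0 - identical direction
print(cosine_similarity([1, 0], [0, 1]))  # 0.0 - orthogonal
```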
Classes
- glove_emb - GloVe (Stanford) - by factorizing the logarithm of the corpus word co-occurrence matrix https://github.com/maciejkula/glove-python
- ELMo - Embeddings from Language Models
- whole sentences as context
- fastText - By default, we use 100 dimensions
- skip-gram - learns to predict using a random close-by word - skipgram models works better with
subword information than cbow.
- designed to predict the context
- works well with small amount of the training data, represents well even rare words or phrases.
- slow
- cbow - according to its context - uses the sum of their vectors to predict the target
- learning to predict the word by the context. Or maximize the probability of the target word by looking at the context
- there is problem for rare words.
- several times faster to train than the skip-gram, slightly better accuracy for the frequent words
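The difference between the two modes can be illustrated by the training pairs they generate (a plain-Python sketch, not the fastText implementation):

```python
def training_pairs(tokens, window=1):
    """Yield (center, context) pairs. Skip-gram predicts each context word
    from the center word; CBOW predicts the center word from the sum of
    the context word vectors."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        yield center, context

print(list(training_pairs(['the', 'cat', 'sat'])))
# [('the', ['cat']), ('cat', ['the', 'sat']), ('sat', ['cat'])]
```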
- GloVe (Stanford)
Global Vectors for Word Representation
- https://en.wikipedia.org/wiki/GloVe_(machine_learning)
- https://nlp.stanford.edu/projects/glove/
- python https://pypi.org/project/glove/
- python https://pypi.org/project/glovepy/
- tutorial https://medium.com/@japneet121/word-vectorization-using-glove-76919685ee0b
Goal: create a glove model X pip3 install https://github.com/JonathanRaiman/glove/archive/master.zip
- git clone https://github.com/umlkhuang/glovepy.git
- cd glovepy
- pip3.6 install numpy --user
- python3.6 setup.py install --user
glovepy
- corpus.py - Cooccurrence matrix construction tools for fitting the GloVe model.
- glovepy.py - Glove(object) - Glove model for obtaining dense embeddings from a co-occurence (sparse) matrix.
- fastText skip-gram model
- https://fasttext.cc/docs/en/unsupervised-tutorial.html
- wget https://github.com/facebookresearch/fastText/archive/v0.2.0.zip
- wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
- unzip v0.2.0.zip
- make -j3
- ./fasttext skipgram -input README.md -output result/re
Without subwords: ./fasttext skipgram -input data/fil9 -output result/fil9-none -maxn 0 -ws 30 -dim 300
"class_name": "fasttext", deeppavlov\models\embedders\fasttext_embedder.py
39.8. components
- simple_vocab - For holding sets of tokens, tags, or characters - \core\data\simple_vocab.py
- id - the name of the vocabulary which will be used in other models
- fit_on - out у предыдущего
- save_path - path to a new file to save the vocabulary
- load_path - path to an existing vocabulary (ignored if there is no files)
- pad_with_zeros: whether to pad the resulting index array with zeros or not
- out - indices
39.9. Models
- Rule-based Models cannot be trained.
- Machine Learning Models can be trained only stand alone.
- Deep Learning Models can be trained independently and in an end-to-end mode being joined in a chain.
Each model has its own architecture - CNN or LSTM+CRF
39.10. spellchecking
- http://docs.deeppavlov.ai/en/latest/components/spelling_correction.html
- https://kheafield.com/code/kenlm/
based on context with the help of a kenlm language model
two pipelines
- Damerau-Levenshtein distance to find correction candidates
- no trainer
- input: x, tokenized and lowercased
- files:
- russian_words_vocab.dict - lines like "слово 1" - without the letter ё
- ru_wiyalen_no_punkt.arpa.binary - kenlm language model?
- simple_vocab - word\tfrequency - file 1)
- main component: deeppavlov.models.spelling_correction.levenshtein.searcher_component:LevenshteinSearcherComponent
- x_tokens -> tokens_candidates
- words - vocabulary - file 1)
- max_distance = 1
- initializes LevenshteinSearcher with the vocabulary - returns nearby words and the distances to them
- (0, word) - for punctuation
- error_probability = 1e-4 = 0.0001
- output for мама: [(-4,'мара'),(-8,'мама')]
- deeppavlov.models.spelling_correction.electors.kenlm_elector:KenlmElector spelling_correction\electors\kenlm_elector.py
- 2)
- picks the best candidate taking file 2) into account, even one with a lower Levenshtein score
- statistic error model
- "dataset_iterator": deeppavlov\dataset_iterators\typos_iterator.py, subclass of DataLearningIterator
- "dataset_reader":
- typos_kartaslov_reader - typos_reader.py - бумажка;бумаша;0.5
- https://raw.githubusercontent.com/dkulagin/kartaslov/master/dataset/orfo_and_typos/orfo_and_typos.L1_5.csv
- has a trainer
- input: x, y - tokenized and lowercased
- files:
- error_model.tar.gz/error_model_ru.tsv
- {DOWNLOADS_PATH}/vocabs
- ru_wiyalen_no_punkt.arpa.binary - kenlm language model?
- main: spelling_error_model, subclass of Estimator 1) - deeppavlov.models.spelling_correction.brillmoore.error_model:ErrorModel
- "fit_on" - x, y
- in - x
- out - tokens_candidates
- error_model_ru.tsv "лицо ло 0.060606060606060615"
- dictionary: class russian_words_vocab DeepPavlov\deeppavlov\vocabs\typos.py - trie tree
- 2)
- deeppavlov.models.spelling_correction.electors.kenlm_elector:KenlmElector
- 3)
First: spelling_error_model
39.10.1. Trie vocabulary
Prefix tree (trie) - words branch letter by letter in the tree. https://ru.wikipedia.org/wiki/%D0%9F%D1%80%D0%B5%D1%84%D0%B8%D0%BA%D1%81%D0%BD%D0%BE%D0%B5_%D0%B4%D0%B5%D1%80%D0%B5%D0%B2%D0%BE
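For intuition, the distance used for candidate search can be sketched as plain Levenshtein (the searcher component uses the Damerau variant, which additionally counts transpositions as one edit):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and
    substitutions turning a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein('мама', 'мара'))  # 1
```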
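A minimal trie sketch in plain Python (illustrative, not the DeepPavlov implementation):

```python
class Trie:
    """Minimal prefix tree: words sharing a prefix share a path of nodes."""
    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})  # descend, creating nodes as needed
        node['$'] = True                    # end-of-word marker

    def __contains__(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return '$' in node

t = Trie()
t.add('мама')
t.add('мара')
print('мама' in t, 'мат' in t)  # True False
```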
39.11. Classification
- keras_classification_model - neural network on Keras with tensorflow - deeppavlov.models.classifiers.KerasClassificationModel
- cnn_model – Shallow-and-wide CNN with max pooling after convolution,
- dcnn_model – Deep CNN with number of layers determined by the given number of kernel sizes and filters,
- cnn_model_max_and_aver_pool – Shallow-and-wide CNN with max and average pooling concatenation after convolution,
- bilstm_model – Bidirectional LSTM,
- bilstm_bilstm_model – 2-layers bidirectional LSTM,
- bilstm_cnn_model – Bidirectional LSTM followed by shallow-and-wide CNN,
- cnn_bilstm_model – Shallow-and-wide CNN followed by bidirectional LSTM,
- bilstm_self_add_attention_model – Bidirectional LSTM followed by self additive attention layer,
- bilstm_self_mult_attention_model – Bidirectional LSTM followed by self multiplicative attention layer,
- bigru_model – Bidirectional GRU model.
Note that each model has its own parameters that should be specified in the config.
- sklearn_component - sklearn classifiers - deeppavlov.models.sklearn.SklearnComponent
configs/classifiers:
JSON | Frame | Embedder | Dataset | Lang | model | comment |
---|---|---|---|---|---|---|
insults_kaggle.json | keras | fasttext | basic | |||
insults_kaggle_bert.json | bert_classifier | ? | basic | new 0.2.0 | ||
intents_dstc2.json | keras | fasttext | dstc2 | |||
intents_dstc2_bert.json | ||||||
intents_dstc2_big.json | keras | fasttext | dstc2 | |||
intents_sample_csv.json | ||||||
intents_sample_json.json | ||||||
intents_snips.json | keras | fasttext | SNIPS | cnn_model | ||
intents_snips_big.json | ||||||
intents_snips_sklearn.json | ||||||
intents_snips_tfidf_weighted.json | ||||||
paraphraser_bert.json | ||||||
rusentiment_bert.json | basic | ru | ||||
rusentiment_cnn.json | keras | fasttext | basic | ru | cnn_model | |
rusentiment_elmo.json | keras | elmo | basic | ru | ||
sentiment_twitter.json | keras | fasttext | basic | ru | ||
sentiment_twitter_preproc.json | keras | fasttext | basic | ru | ||
topic_ag_news.json | ||||||
yahoo_convers_vs_info.json | keras | elmo | en | no reader and iterator |
one_hotter - in(y)out(y) - converts a batch of label lists to one-hot representation
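The one-hot conversion can be sketched in plain Python (an illustrative re-implementation, not the actual one_hotter component):

```python
def one_hot(labels, classes):
    """One-hot encode a batch of label lists (multi-label capable)."""
    index = {c: i for i, c in enumerate(classes)}
    batch = []
    for labs in labels:
        row = [0] * len(classes)
        for lab in labs:
            row[index[lab]] = 1   # set the position of each present label
        batch.append(row)
    return batch

print(one_hot([['pos'], ['neg', 'pos']], ['neg', 'pos']))
# [[0, 1], [1, 1]]
```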
39.11.1. bert
Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.
Pre-trained representations:
- context-free - word2vec or GloVe
- contextual - based on the other words in the sentence
- unidirectional
- bidirectional
json:
- bert_preprocessor in(x)
- one_hotter in(y)
- bert_classifier x y
- proba2labels - probas to id
- classes_vocab - id to labels
39.11.2. iterators
- basic_classification_iterator - for basic_classification_reader
- CSV format: text,label\n word1,
- dstc2_intents_iterator - dstc2_reader - http://camdial.org/~mh521/dstc/downloads/handbook.pdf
39.12. NER - components
- http://docs.deeppavlov.ai/en/latest/components/ner.html
- https://github.com/deepmipt/DeepPavlov/blob/master/examples/tutorials/02_deeppavlov_ner.ipynb
conll2003_reader dataset_reader - BIO
- "data_path": - three files, namely: “train.txt”, “valid.txt”, and “test.txt”
Models:
- "ner": "deeppavlov.models.ner.network:NerNetwork",
- "ner_bio_converter": "deeppavlov.models.ner.bio:BIOMarkupRestorer",
- "ner_few_shot_iterator": "deeppavlov.dataset_iterators.ner_few_shot_iterator:NERFewShotIterator",
- "ner_svm": "deeppavlov.models.ner.svm:SVMTagger",
preprocess
- unclear: random_emb_mat deeppavlov.models.preprocessors.random_embeddings_matrix:RandomEmbeddingsMatrix
- "mask": "deeppavlov.models.preprocessors.mask:Mask"
deeppavlov.models.ner.network - whether the answer comes after all tokens or per token
- use_cudnn_rnn - true TF layouts build on - NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.
- net_type - rnn
- cell_type - lstm
"in": ["x_emb", "mask", "x_char_ind", "cap"],
- x_emb - fastText token embedding
39.13. Custom component
- \deeppavlov\core\common\registry.json
40. AllenNLP
- https://allennlp.org
- https://pytorch.org/get-started/previous-versions/
- conda install pytorch=0.4.1 -c pytorch
- pip install allennlp
41. spaCy
spaCy - convolutional neural network (CNN) https://en.wikipedia.org/wiki/SpaCy
42. fastText
By default, we use 100 dimensions
- skip-gram - learns to predict using a random close-by word - skipgram models works better with
subword information than cbow.
- designed to predict the context
- works well with small amount of the training data, represents well even rare words or phrases.
- slow
- better for rare words
- cbow - according to its context - uses the sum of their vectors to predict the target
- learning to predict the word by the context. Or maximize the probability of the target word by looking at the context
- there is problem for rare words.
- several times faster to train than the skip-gram, slightly better accuracy for the frequent words
./fasttext skipgram -input data/fil9 -output result/fil9 -minn 2 -maxn 5 -dim 300
- dim dimensions - default 100
- -minn 2 -maxn 5 - substrings contained in a word, between the minimum size (minn) and the maximum size (maxn)
- -ws size of the context window [5]
-epoch number of epochs [5]
result
- bin stores the whole fastText model and can be subsequently loaded
- vec contains the word vectors, one per line for each word in the vocabulary. The first line is a header containing the number of words and the dimensionality of the vectors.
Check:
- ./fasttext nn result/fil9.bin
- ./fasttext analogies result/fil9.bin
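The .vec output described above is easy to parse by hand; a sketch using an in-memory sample instead of a real file:

```python
import io

def read_vec(f):
    """Parse fastText .vec content: a header line 'n_words dim',
    then one 'word v1 v2 ...' line per vocabulary word."""
    n_words, dim = map(int, f.readline().split())
    vectors = {}
    for line in f:
        parts = line.rstrip().split(' ')
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    assert len(vectors) == n_words
    return dim, vectors

sample = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
dim, vecs = read_vec(sample)
print(dim, vecs['cat'])  # 3 [0.1, 0.2, 0.3]
```

With a real model, pass `open('result/fil9.vec', encoding='utf-8')` instead of the StringIO sample.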
42.1. install
- wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
- unzip v0.1.0.zip
- make -j3
43. TODO rusvectores
44. Natural Language Toolkit (NLTK)
- http://www.nltk.org/
- API http://www.nltk.org/genindex.html
- nltk.download('averaged_perceptron_tagger_ru') - russian. The NLTK corpus and module downloader.
- corpus - a set of words http://www.nltk.org/howto/corpus.html
- nltk.corpus.abc.words() - sample of the words there; data in C:\Users\Chepilev_VS\AppData\Roaming\nltk_data
- for w in nltk.corpus.genesis.words('english-web.txt'): print(w) - all the words
- Plaintext Corpora
- Tagged Corpora - ex. part-of-speech tags - (word,tag) tuples
- Tagger
- >>> nltk.download('book') - >>> from nltk.book import * - >>> text1
Accessing corpora | corpus | standardized interfaces to corpora and lexicons |
String processing | tokenize, stem | tokenizers, sentence tokenizers, stemmers |
Collocation discovery | collocations | t-test, chi-squared, point-wise mutual information |
Part-of-speech tagging | tag | n-gram, backoff, Brill, HMM, TnT |
Machine learning | classify, cluster, tbl | decision tree, maximum entropy, naive Bayes, EM, k-means |
Chunking | chunk | regular expression, n-gram, named-entity |
Parsing | parse, ccg | chart, feature-based, unification, probabilistic, dependency |
44.1. collocations
- http://www.nltk.org/howto/collocations.html
- http://www.nltk.org/api/nltk.html
- Finders -
- Filtering candidates
- Association measures
nltk.collocations.BigramCollocationFinder
- from_words([sequence of words], bigram_fdm, window_size=2) => finder - '.', ',', ':' act as separators
AbstractCollocationFinder
- nbest(funct, n) => [] - top n ngrams when scored by the given function
- finder.apply_freq_filter(min_freq) - the minimum number of occurrences of bigrams to take into consideration
- finder.apply_word_filter(lambda w: w == '.' or w == ',') - removes candidate ngrams (w1, w2, …) where any of (fn(w1), fn(w2), …) evaluates to True
44.2. Association measures for collocations (measure functions)
- bigram_measures.student_t - Student's t
- bigram_measures.chi_sq - Chi-square
- bigram_measures.likelihood_ratio - likelihood ratios
- bigram_measures.pmi - Pointwise Mutual Information
- raw_freq - scores ngrams by their frequency
Contingency table of counts for a bigram (w1, w2):

          w1     ~w1
   w2    n_ii   n_oi   = n_xi
  ~w2    n_io   n_oo
       = n_ix          TOTAL = n_xx

# measure signature: (n_ii, (n_ix, n_xi), n_xx)
>>> import nltk
>>> from nltk.collocations import *
>>> bigram_measures = nltk.collocations.BigramAssocMeasures()
>>> print('%0.4f' % bigram_measures.student_t(1, (2, 2), 4))
0.0000
>>> print('%0.4f' % bigram_measures.student_t(1, (2, 2), 8))
0.5000
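The same values can be computed by hand from the contingency counts (a sketch of NLTK's approximation, which divides by sqrt(n_ii) rather than the full variance):

```python
import math

def student_t(n_ii, n_ix_xi, n_xx):
    """Student's t for a bigram from counts (n_ii, (n_ix, n_xi), n_xx):
    t = (observed - expected) / sqrt(observed),
    where expected = n_ix * n_xi / n_xx."""
    n_ix, n_xi = n_ix_xi
    expected = n_ix * n_xi / n_xx
    return (n_ii - expected) / math.sqrt(n_ii)

print('%0.4f' % student_t(1, (2, 2), 4))  # 0.0000
print('%0.4f' % student_t(1, (2, 2), 8))  # 0.5000
```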
44.3. Taggers
- averaged_perceptron_tagger_ru http://www.nltk.org/nltk_data/
- example http://www.nltk.org/_modules/nltk/tag
- API http://www.nltk.org/api/nltk.tag.html
44.4. Russian language corpus
- http://www.nltk.org/nltk_data/
- https://github.com/nltk/nltk/wiki/Adding-a-Corpus
- http://www.ruscorpora.ru/index.html
- tag meanings http://www.ruscorpora.ru/en/corpora-morph.html
For some reason it does not show grammatical cases
45. pymorphy2
https://pymorphy2.readthedocs.io/en/latest/user/grammemes.html
- grammeme - one of the elements of a grammatical category; grammemes: tag=OpencorporaTag('NOUN,inan,masc plur,nomn')
- uses the http://opencorpora.org/ dictionary
- hypotheses are built for unknown words
- the letter ё is fully supported
- license - MIT
46. linux NLP
46.1. count max words in line of file
MAX=0; file="/path"; while read -r line; do n=$(echo "$line" | wc -w); if [[ $n -gt $MAX ]]; then MAX=$n; fi; done < "$file"; echo $MAX
47. fuzzysearch
pip install --force-reinstall --no-cache-dir --no-binary=:all: --require-hashes --user -r file.txt
fuzzysearch==0.7.3 --hash=sha256:d5a1b114ceee50a5e181b2fe1ac1b4371ac8db92142770a48fed49ecbc37ca4c attrs==22.2.0 --hash=sha256:c9227bfc2f01993c03f68db37d1d15c9690188323c067c641f1a35ca58185f99
47.1. typesense
47.1.1. pip3 install typesense --user
/usr/lib/python3/dist-packages/secretstorage/dhcrypto.py:15: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
  from cryptography.utils import int_from_bytes
/usr/lib/python3/dist-packages/secretstorage/util.py:19: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
  from cryptography.utils import int_from_bytes
Collecting typesense
  Downloading typesense-0.15.0-py2.py3-none-any.whl (30 kB)
Requirement already satisfied: requests in ./.local/lib/python3.8/site-packages (from typesense) (2.28.1)
Requirement already satisfied: idna<4,>=2.5 in ./.local/lib/python3.8/site-packages (from requests->typesense) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in ./.local/lib/python3.8/site-packages (from requests->typesense) (2022.12.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.local/lib/python3.8/site-packages (from requests->typesense) (1.26.13)
Requirement already satisfied: charset-normalizer<3,>=2 in ./.local/lib/python3.8/site-packages (from requests->typesense) (2.1.1)
Installing collected packages: typesense
Successfully installed typesense-0.15.0
48. Audio - librosa
librosa uses soundfile and audioread for reading audio.
48.1. generic audio characteristics
- Channels: number of channels; 1 for mono, 2 for stereo audio
- Sample width: number of bytes per sample; 1 means 8-bit, 2 means 16-bit
- Frame rate/Sample rate: frequency of samples used (in Hertz)
- Frame width or Bit depth: Number of bytes for each “frame”. One frame contains a sample for each channel.
- Length: audio file length (in milliseconds)
- Frame count: the number of frames from the sample
- Intensity: loudness in dBFS (dB relative to the maximum possible loudness)
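Several of these characteristics can be read with the stdlib wave module; the snippet below first synthesizes a short 16-bit mono tone in memory, so no audio file is assumed:

```python
import io
import math
import struct
import wave

# synthesize a 1-second 440 Hz mono 16-bit tone in memory (no file needed)
sr = 22050
buf = io.BytesIO()
with wave.open(buf, 'wb') as w:
    w.setnchannels(1)     # mono
    w.setsampwidth(2)     # 2 bytes per sample = 16-bit
    w.setframerate(sr)
    w.writeframes(b''.join(
        struct.pack('<h', int(32767 * 0.5 * math.sin(2 * math.pi * 440 * t / sr)))
        for t in range(sr)))

# read the generic characteristics back
buf.seek(0)
with wave.open(buf, 'rb') as w:
    channels, sample_width, frame_rate, frame_count = (
        w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())
print(channels, sample_width, frame_rate, frame_count)  # 1 2 22050 22050
```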
48.2. load
default: librosa.core.load(path, sr=22050, mono=True, offset=0.0, duration=None, dtype=<class 'numpy.float32'>, res_type='kaiser_best')
- sr is the sampling rate (To preserve the native sampling rate of the file, use sr=None.)
- mono is the option (true/ false) to convert it into mono file.
- offset is a floating point number which is the starting time to read the file
- duration is a floating point number which signifies how much of the file to load.
- dtype is the numeric representation of data can be float32, float16, int8 and others.
- res_type is the type of resampling (one option is kaiser_best)
import librosa
import numpy as np

y, sample_rate = librosa.load(filename, sr=None)  # y: np.array - time series, sampling rate as `sr`
print("sample rate of original file:", sample_rate)
# -- Duration
print(librosa.get_duration(y=y, sr=sample_rate))
print("duration in seconds", len(y) / sample_rate)

from IPython.display import Audio
Audio(data=y, rate=sample_rate)  # play audio

# --- for WAV files:
import soundfile as sf
ob = sf.SoundFile('example.wav')
print('Sample rate: {}'.format(ob.samplerate))
print('Channels: {}'.format(ob.channels))
print('Subtype: {}'.format(ob.subtype))

# --- mp3
import audioread
with audioread.audio_open(filename) as f:
    print(f.channels, f.samplerate, f.duration)
48.3. the Fourier transform - spectrum
import numpy as np
import librosa
import matplotlib.pyplot as plt

# filepath = '/home/u2/h4/PycharmProjects/whisper/1670162239-2022-12-04-16_57.mp3'
filepath = '/mnt/hit4/hit4user/gitlabprojects/captcha_fssp/app/929014e341a0457f5a90a909b0a51c40.wav'
y, sr = librosa.load(filepath)
librosa.fft_frequencies()  # frequency bin centers (unused here)
n_fft = 2048
# magnitude spectrum of the first FFT window only
ft = np.abs(librosa.stft(y[:n_fft], hop_length=n_fft + 1))
plt.plot(ft)
plt.title('Spectrum')
plt.xlabel('Frequency Bin')
plt.ylabel('Amplitude')
plt.show()
48.4. spectrogram
import numpy as np
import librosa
import matplotlib.pyplot as plt

# filepath = '/home/u2/h4/PycharmProjects/whisper/1670162239-2022-12-04-16_57.mp3'
filepath = '/mnt/hit4/hit4user/gitlabprojects/captcha_fssp/app/929014e341a0457f5a90a909b0a51c40.wav'
y, sr = librosa.load(filepath)
spec = np.abs(librosa.stft(y, hop_length=512))
spec = librosa.amplitude_to_db(spec, ref=np.max)
# fig, ax = plt.figure()
plt.imshow(spec, origin="lower", cmap=plt.get_cmap("magma"))
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.show()
48.5. log-Mel spectrogram
import numpy as np
import librosa
import matplotlib.pyplot as plt

# filepath = '/home/u2/h4/PycharmProjects/whisper/1670162239-2022-12-04-16_57.mp3'
filepath = '/mnt/hit4/hit4user/gitlabprojects/captcha_fssp/app/929014e341a0457f5a90a909b0a51c40.wav'
y, sr = librosa.load(filepath)
hop_length = 512
n_mels = 128   # linear transformation matrix to project FFT bins onto mel bands
n_fft = 2048   # samples; corresponds to ~93 ms at a 22050 Hz sample rate

# one-line mel spectrogram
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)

# the same thing in 3 lines
fft_windows = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
magnitude = np.abs(fft_windows) ** 2
mel = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
S2 = mel.dot(magnitude)
assert (S2 == S).all()

# log scale (power_to_db already applies 10*log10)
mel_spect = librosa.power_to_db(S, ref=np.max)
plt.imshow(mel_spect, origin="lower", cmap=plt.get_cmap("magma"))
plt.colorbar(format='%+2.0f dB')
plt.title('Mel Spectrogram')
plt.show()
48.6. distinguish emotions
male = librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13)
male = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13), axis=0)
48.7. links
- https://community-app.topcoder.com/thrive/articles/audio-data-analysis-using-python
- https://iq.opengenus.org/introduction-to-librosa/
- https://librosa.org/doc/latest/index.html
- split on silence https://medium.com/@vvk.victory/audio-processing-librosa-split-on-silence-8e1edab07bbb
- distinguish emotions https://www.kaggle.com/code/krishnachary/speech-emotion-recognition-with-librosa
49. Audio
49.1. terms
- down-mixing - The process of combining multiple audio output channels into a single stereo or mono output
- resampling - changing the sample rate (samples per second)
49.2. theory
- waveform - a wave or oscillating curve with an amplitude
- frequency - occurrences of vibrations per unit of time
- sampling frequency or sampling rate - the average number of samples obtained in one second, in hertz; e.g. 48 kHz is 48,000 samples per second, 44.1 kHz is 44,100 samples per second
- bit depth - typically recorded at 8-, 16-, and 24-bit depth
- mp3 does not have a bit depth - it is a compressed format
- wav - uncompressed
- quality: 44.1 kHz / 16-bit - CD; 192 kHz / 24-bit - hi-res audio
- bit rate - bits per second required for encoding without compression
Calc bit rate and size:
- 44.1kHz/16-bit: 44,100 x 16 x 2 = 1,411,200 bits per second (1.4Mbps)
- 44.1kHz/16-bit: 1.4Mbps * 300s = 420Mb (52.5MB)
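The arithmetic above can be checked with a few lines of Python (a minimal sketch of the 44.1kHz/16-bit stereo example; the exact size is 52.92 MB, the 52.5 MB figure uses the rounded 1.4 Mbps):

```python
# 44.1 kHz / 16-bit stereo, 300-second (5-minute) track
sample_rate = 44_100  # samples per second
bit_depth = 16        # bits per sample
channels = 2          # stereo
duration_s = 300

bit_rate = sample_rate * bit_depth * channels    # bits per second
size_mb = bit_rate * duration_s / 8 / 1_000_000  # megabytes

print(bit_rate)           # 1411200 (~1.4 Mbps)
print(round(size_mb, 2))  # 52.92
```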
All wave forms
- periodic
- simple
- complex
- aperiodic
- noise
- pulse
- amplitude - maximum displacement from the equilibrium position (half the distance between max and min)
- wavelength - distance the wave travels in one time period
- Phase - position of the wave relative to the equilibrium point at time t=0
features
- loudness - perceptual (brain) correlate of intensity
- pitch - perceptual (brain) correlate of frequency
- quality or Timbre - perceptual (brain) correlate of the spectral shape
- intensity
- amplitude phase
- angular velocity
49.3. The Fourier Transform (spectrum)
a mathematical transform that converts a signal from the time domain into the frequency domain.
- result - *spectrum
- Fourier’s theorem - signal can be decomposed into a set of sine and cosine waves
- fast Fourier transform (FFT) is an algorithm that can efficiently compute the Fourier transform
- Short-time Fourier transform (STFT) - represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows; used for non-periodic signals such as music and speech
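A minimal numpy sketch of the idea above: the spectrum of a pure 440 Hz sine has its peak at 440 Hz (the sample rate and test frequency here are arbitrary choices):

```python
import numpy as np

sr = 22050                       # sample rate, Hz
t = np.arange(sr) / sr           # 1 second of samples
y = np.sin(2 * np.pi * 440 * t)  # pure 440 Hz tone

spectrum = np.abs(np.fft.rfft(y))               # magnitude spectrum
freqs = np.fft.rfftfreq(len(y), d=1 / sr)       # frequency of each bin
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # 440.0
```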
49.4. log-Mel spectrogram
spectrogram - the horizontal axis represents time, the vertical axis represents frequency, and the color intensity represents the amplitude of a frequency at a certain point.
- y - Decibels
- used to train convolutional neural networks for the classification
A mel spectrogram converts the frequencies to the mel scale - “a perceptual scale of pitches judged by listeners to be equal in distance from one another”
- y - just Hz 0,64,128,256,512,1024
- It uses the Mel Scale instead of Frequency on the y-axis.
- It uses the Decibel Scale instead of Amplitude to indicate colors.
- x - time sequence
- value - mel shaped dB
Mel scale (after the word melody) - frequency(Hz) to mels(mel) conversion by formula
- the pair at 100Hz and 200Hz will sound further apart than the pair at 1000Hz and 1100Hz.
- you will hardly be able to distinguish between the pair at 10000Hz and 10100Hz.
Decibel Scale - logarithmic:
- 10 dB is 10 times louder than 0 dB
- 20 dB is 100 times louder than 10 dB
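A minimal sketch of this power-to-dB relationship (librosa.power_to_db applies the same formula, plus a configurable reference and clipping):

```python
import math

# Power ratio -> decibels; every +10 dB is 10x the power.
def power_to_db(power: float, ref: float = 1.0) -> float:
    return 10 * math.log10(power / ref)

print(power_to_db(10))   # 10.0  -> 10x the reference power
print(power_to_db(100))  # 20.0  -> 100x the reference power
```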
steps:
- Separate to windows: Sample the input with windows of size n_fft=2048, making hops of size hop_length=512 each time to sample the next window.
- Compute FFT (Fast Fourier Transform) for each window to transform from time domain to frequency domain.
- Generate a Mel scale: Take the entire frequency spectrum, and separate it into n_mels=128 evenly spaced frequencies.
- Generate Spectrogram: For each window, decompose the magnitude of the signal into its components, corresponding to the frequencies in the mel scale.
49.4.1. Log - because
- np.log10(S) after mel spectrogram
- or because the Mel scale has a log in its formula
func frequencyToMel(_ frequency: Float) -> Float {
    return 2595 * log10(1 + (frequency / 700))
}

func melToFrequency(_ mel: Float) -> Float {
    return 700 * (pow(10, mel / 2595) - 1)
}
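The same two conversion helpers ported to Python as a sketch; the round trip and the mel-gap claims above can be checked directly:

```python
import math

def frequency_to_mel(frequency: float) -> float:  # Hz -> mel
    return 2595 * math.log10(1 + frequency / 700)

def mel_to_frequency(mel: float) -> float:        # mel -> Hz
    return 700 * (10 ** (mel / 2595) - 1)

# 100->200 Hz spans more mels than 1000->1100 Hz, matching the examples above:
low_gap = frequency_to_mel(200) - frequency_to_mel(100)
high_gap = frequency_to_mel(1100) - frequency_to_mel(1000)
print(low_gap > high_gap)  # True
```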
49.5. pyo
libsndfile-dev
49.6. torchaudio
49.7. ffmpeg-python
50. Whisper
- a Transformer based encoder-decoder model, also referred to as a sequence-to-sequence model
- Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder
- automatic speech recognition (ASR)
- Whisper is pre-trained on a vast quantity of labelled audio-transcription data, 680,000 hours to be precise
- 117,000 hours of this pre-training data is multilingual ASR data
- supervised task of speech recognition
- uses
- GPT2TokenizerFast https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/tokenization_gpt2_fast.py
- byte-level Byte-Pair-Encoding
- "gpt2" and "multilingual"
logits - probabilities over all 51,865 tokens
Steps:
- model.transcribe
- model.decode
- DecodingTask.run()
- self._main_loop
50.1. Byte-Pair Encoding (BPE)
Tokenization algorithms can be
- word
- subword - used by most state-of-the-art NLP models - frequently used words should not be split into smaller subwords
- character-based
Subword-based tokenization:
- splits the rare words into smaller meaningful subwords
- WordPiece, Byte-Pair Encoding (BPE)(used in GPT-2), Unigram, and SentencePiece
- https://huggingface.co/docs/transformers/tokenizer_summary
- https://arxiv.org/abs/1508.07909
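A toy sketch of one BPE merge step (not Whisper's actual byte-level tokenizer; corpus and symbols are made up): count adjacent symbol pairs and merge the most frequent one into a new symbol:

```python
from collections import Counter

corpus = [list("lower"), list("lowest"), list("low")]

# 1. count adjacent symbol pairs across the corpus
pairs = Counter()
for word in corpus:
    for a, b in zip(word, word[1:]):
        pairs[(a, b)] += 1
best = pairs.most_common(1)[0][0]  # most frequent pair

# 2. merge that pair into a single new symbol everywhere
def merge(word, pair):
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])  # fuse the pair
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

merged = [merge(w, best) for w in corpus]
print(merged[0])  # ['lo', 'w', 'e', 'r']
```

Real BPE repeats this loop until a target vocabulary size is reached; frequent words end up as single tokens while rare words stay split into subwords.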
50.1.1. usage
from transformers import GPT2TokenizerFast

path = '/home/u2/.local/lib/python3.8/site-packages/whisper/assets/multilingual'
tokenizer = GPT2TokenizerFast.from_pretrained(path)
tokens = [[50364, 3450, 5505, 13, 50464, 51014, 9149, 11, 6035, 5345, 7520, 1595, 6885, 1725, 30162, 13, 51114, 51414, 21249, 7520, 9916, 13, 51464]]
print([tokenizer.decode(t).strip() for t in tokens])
print(tokenizer.encode('А вот. Да, но он уже у меня не работает. Нет уже нет.'))
50.2. model.transcribe(filepath or numpy)
- mel = log_mel_spectrogram(audio) # split audio by chunks (84)
- whisper.audio.load_audio(filepath)
- if no language set - it will use 30 seconds to detect language first
- loop seek<length
- get 3000 frames - 30 seconds
- decode segment - DecodingResult=DecodingTask(model, options).run(mel) decoding.py (701) see 50.3
- if no speech then skip
- split segment into consecutive segments
- tokenize and segment
- summarize
- segments - a chunk of speech you obtain from the timestamps; something like 10.00s -> 13.52s would be a segment
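The steps above can be sketched as a simplified chunking loop (assumes 3000 mel frames = 30 seconds; the real transcribe loop advances seek by the timestamps of the decoded segments, not by a fixed step):

```python
FRAMES_PER_CHUNK = 3000  # 30 seconds at 100 frames/second

def iter_chunks(total_frames, frames_per_chunk=FRAMES_PER_CHUNK):
    seek = 0
    while seek < total_frames:
        # yield the frame window to decode next
        yield seek, min(seek + frames_per_chunk, total_frames)
        seek += frames_per_chunk  # real loop: advance by decoded timestamps

chunks = list(iter_chunks(7500))  # 75 seconds of audio
print(chunks)  # [(0, 3000), (3000, 6000), (6000, 7500)]
```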
50.2.1. return
- text - full text
- segments
- seek
- start&end
- text - segment text
- 'tokens': []
- 'temperature': 0.0,
- 'avg_logprob': -0.7076873779296875, # if < -1 - too low probability, retranscribe with another temperature
- 'compression_ratio': 1.1604938271604939,
- 'no_speech_prob': 0.5063244700431824 - if greater than 0.6, the segment is not returned
- 'language': 'ru'
{'text': 'long text', 'segments': [{'id': 0, 'seek': 0, 'start': 0.0, 'end': 2.64, 'text': ' А вот, не добрый день.', 'tokens': [], 'temperature': 0.0, 'avg_logprob': -0.7076873779296875, 'compression_ratio': 1.1604938271604939, 'no_speech_prob': 0.5063244700431824}, {'id': 1, 'seek': 0, 'start': 2.64, 'end': 4.64, 'text': ' Меня зовут Дмитрий, это Русснорбанг.', 'tokens': [], 'temperature': 0.0, 'avg_logprob': -0.7076873779296875, 'compression_ratio': 1.1604938271604939, 'no_speech_prob': 0.5063244700431824}, {'id': 2, 'seek': 0, 'start': 4.64, 'end': 8.040000000000001, 'text': ' Дайте, он разжонили по поводу Мехеэлы Романовича Гапуэк,', 'tokens': [], 'temperature': 0.0, 'avg_logprob': -0.7076873779296875, 'compression_ratio': 1.1604938271604939, 'no_speech_prob': 0.5063244700431824},
{'id': 62, 'seek': 13828, 'start': 150.28, 'end': 151.28, 'text': ' Если…', 'tokens': [], 'temperature': 0.0, 'avg_logprob': -0.3628227009492762, 'compression_ratio': 1.0274509803921568, 'no_speech_prob': 1.6432641132269055e-05}, {'id': 63, 'seek': 13828, 'start': 151.28, 'end': 154.28, 'text': ' Если как-то пежись, хорошо, накрыли.', 'tokens': [], 'temperature': 0.0, 'avg_logprob': -0.3628227009492762, 'compression_ratio': 1.0274509803921568, 'no_speech_prob': 1.6432641132269055e-05}, {'id': 64, 'seek': 15428, 'start': 154.28, 'end': 183.28, 'text': ' Ну, да, всего доброго, до сих пор.', 'tokens': [50364, 7571, 11, 8995, 11, 15520, 35620, 2350, 11, 5865, 776, 4165, 11948, 13, 51814], 'temperature': 0.0, 'avg_logprob': -0.9855107069015503, 'compression_ratio': 0.576271186440678, 'no_speech_prob': 6.223811215022579e-05}], 'language': 'ru'}
50.3. model.decode(mel, options)
options: language
DecodingTask(model, options).run(mel)
- create GPT2TokenizerFast wrapped
- audio_features <- mel
- tokens, sum_logprobs, no_speech_probs <- audio_features
- texts: List[str] = [tokenizer.decode(t).strip() for t in tokens]
- tokens = [ [50364, 3450, 5505, 13, 50464, 51014, 9149, 11, 6035, 5345, 7520, 1595, 6885, 1725, 30162, 13, 51114, 51414, 21249, 7520, 9916, 13, 51464] ]
- fine-tuning:
https://huggingface.co/blog/fine-tune-whisper https://colab.research.google.com/drive/1P4ClLkPmfsaKn2tBbRp0nVjGMRKR-EWz
50.4. no_speech_prob and avg_logprob
- no_speech_prob - calculated at the first token only, from the logit at the SOT position
- avg_logprob
- sum_logprobs - sum of:
- current_logprobs - logprobs = F.log_softmax(logits.float(), dim=-1)
50.5. decode from whisper_word_level 844
decode_word_level 781
- result, ts = decode.run() 711 - decoding.py 612
- finalize 524 - decoding.py 271
self.ts
- self.decoder.update_with_ts 700 (main_loop) - decoding.py 602
50.6. main_loop
receive
- audio_features
- tokens with 3 values
tokens: int  # += 1 each step
complete: bool = False
sum_logprobs: int
50.7. word timestamps https://github.com/jianfch/stable-ts
timestamp_logits - ts_logits - self.ts -
50.7.1. transcribe format
- segments:
[{'id': 0, 'seek': 0, 'offset': 0.0, 'start': 1.0, 'end': 3.0, 'text': ' А вот, не добрый день.', 'tokens': [50414, 3450, 5505, 11, 1725, 35620, 4851, 13509, 13, 50514, 50514, 47311, 46376, 3401, 919, 1635, 50161, 11, 2691, 6325, 7071, 461, 1234, 481, 1552, 1416, 1906, 13, 50564, 50564, 3401, 10330, 11, 5345, 4203, 1820, 1784, 5435, 2801, 10499, 35749, 50150, 386, 2338, 6325, 1253, 11114, 3903, 386, 7247, 4219, 23412, 3605, 13, 50714, 50714, 3200, 585, 37408, 585, 11, 2143, 10655, 30162, 1006, 17724, 15028, 4558, 13, 50814, 50814, 2348, 1069, 755, 12886, 387, 29868, 11, 776, 31158, 50233, 19411, 23201, 860, 1283, 25190, 13, 51014, 51014, 9149, 11, 6035, 5345, 7520, 1595, 6885, 1725, 30162, 13, 51064, 51064, 3450, 5505, 5865, 10751, 29117, 21235, 13640, 11, 2143, 5345, 1595, 10655, 2801, 7247, 9223, 24665, 30162, 13, 51314, 51314, 6684, 1725, 13790, 13549, 10986, 11, 6035, 8995, 11, 6035, 4777, 1725, 485, 51414, 51414, 21249, 7520, 9916, 13, 51464, 51464, 4857, 37975, 11, 25969, 5878, 11, 3014, 50150, 386, 2338, 6325, 1253, 11114, 3903, 1595, 6519, 3348, 35968, 23412, 34005, 47573, 51664, 51664, 10969, 45309, 13388, 19465, 5332, 4396, 20392, 44356, 740, 1069, 755, 1234, 1814, 13254, 11, 51814, 51814], 'temperature': 0.0, 'avg_logprob': -0.5410955043438354, 'compression_ratio': 1.1496259351620948, 'no_speech_prob': 0.5069490671157837, 'alt_start_timestamps': [1.0, 0.9199999570846558, 1.0399999618530273, 0.9599999785423279, 1.100000023841858, 0.9399999976158142, 0.9799999594688416, 1.0799999237060547, 1.1200000047683716, 1.1999999284744263], 'start_ts_logits': [13.0390625, 12.4140625, 12.296875, 12.2109375, 12.171875, 12.140625, 12.0390625, 11.9921875, 11.9453125, 11.8046875], 'alt_end_timestamps': [3.0, 2.0, 2.859999895095825, 2.879999876022339, 2.8999998569488525, 4.0, 2.9800000190734863, 3.0399999618530273, 2.299999952316284, 2.359999895095825], 'end_ts_logits': [9.6015625, 8.9375, 7.65234375, 7.53125, 7.4609375, 7.4609375, 7.30859375, 7.28515625, 7.22265625, 
7.11328125], 'unstable_word_timestamps': [{'word': ' А', 'token': 3450, 'timestamps':[7.0, 29.5, 1.0, 29.35999870300293, 13.0, 29.279998779296875, 29.34000015258789, 29.479999542236328, 28.939998626708984, 29.01999855041504], 'timestamp_logits': [15.1328125, 15.0703125, 14.9921875, 14.96875, 14.96875, 14.96875, 14.890625, 14.8359375, 14.7890625, 14.7890625]}, {'word': ' вот', 'token': 5505, 'timestamps': [27.34000015258789, 29.31999969482422, 26.979999542236328, 28.420000076293945, 28.739999771118164, 27.31999969482422, 28.439998626708984, 29.34000015258789, 13.519999504089355, 28.239999771118164], 'timestamp_logits': [19.546875, 19.46875, 19.296875, 19.125, 19.109375, 19.109375, 19.09375, 19.09375, 19.078125, 19.046875]}, {'word': ',', 'token': 11, 'timestamps': [2.0, 3.0, 4.0, 1.0, 1.7999999523162842, 10.0, 3.0199999809265137, 1.7599999904632568, 19.0, 3.5], 'timestamp_logits': [14.8828125, 13.640625, 13.21875, 12.734375, 11.3828125, 11.3671875, 11.3515625, 11.3359375, 11.2890625, 11.2578125]}, {'word': ' не', 'token': 1725, 'timestamps': [2.0, 1.0, 1.7599999904632568, 1.71999990940094, 1.6399999856948853, 1.7799999713897705, 28.19999885559082, 1.7999999523162842, 7.0, 28.239999771118164], 'timestamp_logits': [15.328125, 15.03125, 14.921875, 14.4453125, 14.3671875, 14.234375, 14.2265625, 14.203125, 14.0234375, 13.875]}, {'word': ' добр', 'token': 35620, 'timestamps': [28.099998474121094, 28.139999389648438, 14.75999927520752, 14.920000076293945, 27.099998474121094, 18.119998931884766, 14.59999942779541, 28.260000228881836, 13.0, 26.599998474121094], 'timestamp_logits': [14.015625, 13.9765625, 13.96875, 13.8515625, 13.84375, 13.8046875, 13.7109375, 13.7109375, 13.6953125, 13.6953125]}, {'word': 'ый', 'token': 4851, 'timestamps': [13.59999942779541, 15.399999618530273, 13.279999732971191, 14.719999313354492, 13.399999618530273, 14.880000114440918, 13.0, 14.59999942779541, 13.679999351501465, 13.639999389648438], 'timestamp_logits': [15.4140625, 15.28125, 15.21875, 
14.765625, 14.7265625, 14.71875, 14.6328125, 14.578125, 14.5546875, 14.53125]}, {'word': ' день', 'token': 13509, 'timestamps': [2.0, 20.959999084472656, 3.0, 25.68000030517578, 3.4800000190734863, 24.0, 3.5, 19.920000076293945, 28.559999465942383, 4.0], 'timestamp_logits': [9.3984375, 9.21875, 9.046875, 9.015625, 8.9296875, 8.90625, 8.875, 8.8203125, 8.7890625, 8.7421875]}, {'word': '.', 'token': 13, 'timestamps': [3.0, 2.0, 4.0, 3.5, 3.0199999809265137, 2.879999876022339, 3.319999933242798, 3.0399999618530273, 2.299999952316284, 2.859999895095825], 'timestamp_logits': [12.6328125, 12.4296875, 10.875, 10.2578125, 9.828125, 9.5078125, 9.4921875, 9.421875, 9.3828125, 9.3046875]} ], 'anchor_point': False, 'word_timestamps': [{'word': ' А', 'token': 3450, 'timestamp': 1.0}, {'word': ' вот', 'token': 5505, 'timestamp': 1.0}, {'word': ',', 'token': 11, 'timestamp': 2.0}, {'word': ' не', 'token': 1725, 'timestamp': 2.0}, {'word': ' добр', 'token': 35620, 'timestamp': 2.0}, {'word': 'ый', 'token': 4851, 'timestamp': 2.0}, {'word': ' день', 'token': 13509, 'timestamp': 2.0}, {'word': '.', 'token': 13, 'timestamp': 3.0}], 'whole_word_timestamps': [{'word': ' А', 'timestamp': 1.3799999952316284}, {'word': ' вот,', 'timestamp': 1.7599999904632568}, {'word': ' не', 'timestamp': 1.7899999618530273}, {'word': ' добр', 'timestamp': 1.8899999856948853}, {'word': 'ый', 'timestamp': 1.8899999856948853}, {'word': ' день.', 'timestamp': 2.5899999141693115} ] }, {'id': 1,
50.8. confidence score
sum_logprobs: List[float] = [lp[i] for i, lp in zip(selected, sum_logprobs)]
avg_logprob - [lp / (len(t) + 1) for t, lp in zip(tokens, sum_logprobs)]
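The avg_logprob formula above can be sketched in isolation: the summed log-probability of a candidate is normalised by its token count plus one (for the end-of-text token); the token values and logprob here are made up:

```python
# avg_logprob = sum of per-token log-probabilities / (token count + 1 for EOT)
def avg_logprob(tokens, sum_logprob):
    return sum_logprob / (len(tokens) + 1)

lp = avg_logprob([50364, 3450, 5505, 13], -3.5)  # hypothetical values
print(lp)  # -0.7  -> above the -1 retranscription threshold mentioned earlier
```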
path
- model.transcribe
- model.decode
- transcribe_word_level (whisper_word_level.py:39)
- results, ts_tokens, ts_logits_ = model.decode
50.9. TODO main/notebooks
51. NER USE CASES
51.1. Spelling correction algorithms or (spell checker) or (comparing a word to a list of words)
Damerau-Levenshtein - edit distance with constant lookup time O(1) - independent of the word list size (but depends on the average term length and maximum edit distance)
51.2. fuzzy string comparison, or approximate matching
approaches:
- Levenshtein is O(m*n) - m, n are the lengths of the two input strings
- difflib.SequenceMatcher
- uses the Ratcliff/Obershelp algorithm - O(n^2)
- Hamming distance - does not handle insertions/deletions; for two strings of equal length it only counts the positions where the characters differ
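A sketch of two of the approaches side by side: a classic O(m*n) dynamic-programming Levenshtein distance, next to difflib.SequenceMatcher from the standard library:

```python
from difflib import SequenceMatcher

# Classic O(m*n) DP Levenshtein distance, keeping only two rows of memory.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

dist = levenshtein("kitten", "sitting")
ratio = SequenceMatcher(None, "kitten", "sitting").ratio()
print(dist)   # 3
print(ratio)  # ~0.615 (Ratcliff/Obershelp similarity, not a distance)
```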
databases
52. Flax and Jax
Flax - neural network library and ecosystem for JAX designed for flexibility
53. hyperparameter optimization library test-tube
54. Keras
- https://keras.io/
- https://keras.io/optimizers/
- CNN https://www.learnopencv.com/image-classification-using-convolutional-neural-networks-in-keras/
MIT-licensed neural network library
- a high-level layer on top of the Deeplearning4j, TensorFlow and Theano frameworks
- aimed at rapid experimentation with deep learning networks
- compact, modular and extensible
- a high-level, more intuitive set of abstractions that makes building neural networks simple
- channels_last - default for keras python-ds#MissingReference
import logging
logging.getLogger('tensorflow').disabled = True
- loss - loss function https://github.com/keras-team/keras/blob/c2e36f369b411ad1d0a40ac096fe35f73b9dffd3/keras/metrics.py
- mean_squared_error
- categorical_crossentropy
- binary_crossentropy
- sparse_categorical_accuracy - same as categorical_accuracy, but for integer targets instead of one-hot vectors.
- top_k_categorical_accuracy - Calculates the top-k categorical accuracy rate, i.e. success when the target class is within the top-k predictions provided.
- sparse_top_k_categorical_accuracy
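A numpy sketch of what the top-k accuracy metrics above measure (not Keras' implementation): a prediction counts as correct when the true class is among the k highest-scoring classes:

```python
import numpy as np

def top_k_accuracy(y_true, y_pred, k=2):
    top_k = np.argsort(y_pred, axis=1)[:, -k:]  # indices of the k best scores
    hits = [t in row for t, row in zip(y_true, top_k)]
    return np.mean(hits)

y_true = np.array([0, 1, 2])
y_pred = np.array([[0.5, 0.3, 0.2],   # true class 0 is top-1 -> hit
                   [0.6, 0.3, 0.1],   # true class 1 is top-2 -> hit
                   [0.5, 0.4, 0.1]])  # true class 2 is top-3 -> miss for k=2
acc = top_k_accuracy(y_true, y_pred, k=2)
print(acc)  # 0.666...
```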
Steps:
# 1. declare keras.layers.Input and keras.layers.Dense in a chain
# 2. model = Model(inputs=inputs, outputs=predictions)
#    where inputs - inputs, predictions - the last Dense layer
# 3. configure the model for training:
#    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# 4. model.fit(data, labels, epochs=10, batch_size=32)
# 5. model.predict(np.array([[3,3,3]]))  # input shape (3,)
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
54.1. install
pip install keras --user
54.2. API types
- Model subclassing: from keras.models import Model
- Model constructor - deprecated
- Functional API
- Sequential model
54.3. Sequential model
- first layer needs to receive information about its input shape - following layers can do automatic shape inference
54.4. functional API
54.5. Layers
- layer.get_weights()
- layer.get_config(): returns a dictionary containing the configuration of the layer.
54.5.1. types
- Input - instantiate a Keras tensor Input(shape=(784,)) - indicates that the expected input will be batches of 784-dimensional vectors
- Dense - each neuron receives input from all the neurons in the previous layer
- Embedding - can only be used as the first layer
- Merge Layers - concatenate - Add - Substract - Multiply - Average etc.
54.5.2. Dense
- output = activation(dot(input, kernel) + bias)
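The Dense formula above in plain numpy (a sketch with made-up shapes, not Keras code):

```python
import numpy as np

# output = activation(dot(input, kernel) + bias)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))       # batch of 4 samples, 3 features each
kernel = rng.standard_normal((3, 5))  # weight matrix: 3 inputs -> 5 units
bias = np.zeros(5)

def relu(z):
    return np.maximum(z, 0)

output = relu(x @ kernel + bias)
print(output.shape)  # (4, 5): one 5-unit activation vector per sample
```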
54.6. Models
attributes:
- model.layers is a flattened list of the layers comprising the model.
- model.inputs is the list of input tensors of the model.
- model.outputs is the list of output tensors of the model.
- model.summary() prints a summary representation of your model. Shortcut for
- model.get_config() returns a dictionary containing the configuration of the model.
54.7. Accuracy:
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]  # 0.99794011611938471

# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i]) == np.argmax(y_pred[i]) for i in range(10000)]) / 10000
acc  # 0.98999999999999999
54.8. input shape & text prepare
import numpy as np
data = np.random.random((2, 3))  # ndarray [[1,1,1],[1,1,1]]
print(data.shape)  # (2, 3)
(2,)
data = np.random.random((2,)) # [0.3907832 0.00941261]
list to ndarray
np.array(texts)
np.asarray(texts)
fit of batches
model.fit([np.asarray([x_embed , x_embed]) , np.asarray([x2_onehot, x2_onehot])], np.asarray([y_onehot[0], y_onehot[0]]), epochs=2, batch_size=2)
54.9. ValueError: Error when checking input: expected input_1 to have 3 dimensions, but got array with shape
if
Input(shape=(5,100))
then
model.fit(x_embed, y_onehot, epochs=3, batch_size=1)
where x_embed.shape = (1, 5, 100)
54.10. merge inputs
https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
Added another Input(shape=(x2_size,)) as a vector and applied concatenate on the flat neuron layer; it is important that the shapes have the same dimensionality - in this case a vector
inp = Input(shape=(words, embedding_size))  # 5 tokens
output = inp
# my
word_input = Input(shape=(x2_size,), name='word_input')
outputs = []
for i in range(len(kernel_sizes_cnn)):
    output_i = Conv1D(filters_cnn, kernel_size=kernel_sizes_cnn[i], activation=None,
                      kernel_regularizer=l2(coef_reg_cnn), padding='same')(output)
    output_i = BatchNormalization()(output_i)
    output_i = Activation('relu')(output_i)
    output_i = GlobalMaxPooling1D()(output_i)
    outputs.append(output_i)
output = concatenate(outputs, axis=1)
# my
output = concatenate([output, word_input])  # second input
output = Dropout(rate=dropout_rate)(output)
output = Dense(dense_size, activation=None, kernel_regularizer=l2(coef_reg_den))(output)
output = BatchNormalization()(output)
output = Activation('relu')(output)
output = Dropout(rate=dropout_rate)(output)
output = Dense(n_classes, activation=None, kernel_regularizer=l2(coef_reg_den))(output)
output = BatchNormalization()(output)
act_output = Activation("softmax")(output)
model = Model(inputs=[inp, word_input], outputs=act_output)

model: Model = build_model(vocab_y.len, embedder.dim, words, embedder.dim)
model.fit([np.asarray(x), np.asarray(x2)], np.asarray(y), epochs=100, batch_size=2)
54.11. convolution
- filters - dimensionality of the output space - In practice, they are in number of 64,128,256, 512 etc.
- kernel_size is size of these convolution filters - sliding window. In practice they are 3x3, 1x1 or 5x5
- Note that number of filters from previous layer become the number of channels for current layer's input image.
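A small sketch of this shape bookkeeping (a hypothetical helper, not a Keras API):

```python
# Output shape of a 2D convolution over (height, width, channels_in).
def conv_output_shape(input_shape, filters, kernel_size, padding="same", stride=1):
    h, w, _channels_in = input_shape
    if padding == "same":  # zero-padded so spatial size is preserved at stride 1
        return (h // stride, w // stride, filters)
    # 'valid': no padding, the window must fit entirely inside the image
    def out(size):
        return (size - kernel_size) // stride + 1
    return (out(h), out(w), filters)

# the 64 filters of one layer become the input channel count of the next:
shape1 = conv_output_shape((28, 28, 1), filters=64, kernel_size=3)
shape2 = conv_output_shape((28, 28, 64), filters=128, kernel_size=3, padding="valid")
print(shape1)  # (28, 28, 64)
print(shape2)  # (26, 26, 128)
```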
54.12. character CNN
54.13. Early stopping
from tensorflow.keras.callbacks import EarlyStopping

early_stopping_callback = EarlyStopping(monitor='val_acc', patience=2)
model.fit(X_train, Y_train, callbacks=[early_stopping_callback])
from keras.callbacks import EarlyStopping
# ...
num_epochs = 50  # we iterate at most fifty times over the entire training set
# ...
# fit the model on the batches generated by datagen.flow() - most parameters similar to model.fit
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=num_epochs,
                    validation_data=(X_val, Y_val),
                    verbose=1,
                    callbacks=[EarlyStopping(monitor='val_loss', patience=5)])  # adding early stopping
54.14. plot history
history = model.fit(X_train, Y_train, validation_split=0.2)
plt.plot(history.history['acc'], label='Accuracy on the training set')
plt.plot(history.history['val_acc'], label='Accuracy on the validation set')
plt.xlabel('Training epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
54.15. ImageDataGenerator class
- https://medium.com/@vijayabhaskar96/tutorial-image-classification-with-keras-flow-from-directory-and-generators-95f75ebe5720
- flow() - Takes (x,y), return generator for model.fit_generator()
- flow_from_directory() - takes a directory with subdirectories (or a single directory) and yields (x, y) without stopping
- flow_from_dataframe()
- fit() - Only required if `featurewise_center` or `featurewise_std_normalization` or `zca_whitening` are set to True.
datagen = ImageDataGenerator(
    # zoom_range=0.2,        # randomly zoom into images
    # rotation_range=10,     # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,   # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,    # randomly flip images
    vertical_flip=False)     # randomly flip images
54.16. CNN Rotate
54.17. LSTM
https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/ By default the Keras implementation resets the network state after each training batch.
model.add(LSTM(50, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.reset_states()  # at the end of each epoch
55. Tesseract - Optical Character Recognition
55.1. compilation
dockerfile:
RUN apt-get update && apt-get install -y --no-install-recommends \
    g++ \
    automake \
    make \
    libtool \
    pkg-config \
    libleptonica-dev \
    curl \
    libpng-dev \
    zlib1g-dev \
    libjpeg-dev \
    && apt-get autoclean \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
ARG PREFIX=/usr/local
ARG VERSION=4.1.0
RUN curl --silent --location --location-trusted \
    --remote-name https://github.com/tesseract-ocr/tesseract/archive/$VERSION.tar.gz \
    && tar -xzf $VERSION.tar.gz \
    && cd tesseract-$VERSION \
    && ./autogen.sh \
    && ./configure --prefix=$PREFIX \
    && make \
    && make install \
    && ldconfig
55.2. black and white list
https://github.com/tesseract-ocr/langdata/blob/master/rus/rus.training_text
- ./tesseract -l eng /home/u2/Documents/2.jpg stdout -c tessedit_char_blacklist='0123456789'
- ./tesseract -l eng /home/u2/Documents/2.jpg stdout -c tessedit_char_whitelist='0123456789'
print(pytesseract.image_to_string(im, lang='rus', config='-c tessedit_char_whitelist=0123456789'))
55.3. notes
when we repeat a symbol, it starts to recognize it
55.4. prepare
- https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
- 20-30 px character height
55.5. usage
text = pytesseract.image_to_string(img, lang='rus')
letters = pytesseract.image_to_boxes(img, lang='rus')
letters = letters.split('\n')
letters = [letter.split() for letter in letters]
h, w = img.shape
for letter in letters:
    cv.rectangle(img, (int(letter[1]), h - int(letter[2])), (int(letter[3]), h - int(letter[4])), (0, 0, 255), 2)
    p_x = int(letter[1])
    p_y = h - int(letter[2])   # 0 at top - LOWER
    p_x2 = int(letter[3])
    p_y2 = h - int(letter[4])  # 0 at top - close to 0 - higher, y2 < y
    cc = [[p_x, p_y],
          [p_x2, p_y],   # _
          [p_x2, p_y2],  # _|
          [p_x, p_y2]]
    c = np.array(cc, dtype=np.int32)
    # print(cv.contourArea(c), ',')
    x = p_x
    y = p_y2
    w_box = p_x2 - p_x  # renamed from w to avoid clobbering the image width
    h_box = p_y - p_y2  # renamed from h to avoid clobbering the image height
    box = [x, y, w_box, h_box]
56. FEATURE ENGINEERING
56.1. Featuretools - Automatic Feature Engineering
- doc dfs https://docs.featuretools.com/en/stable/generated/featuretools.dfs.html#featuretools.dfs
- doc https://docs.featuretools.com/en/stable/
- article https://medium.com/@rrfd/simple-automatic-feature-engineering-using-featuretools-in-python-for-classification-b1308040e183
- article kaggle https://www.kaggle.com/willkoehrsen/automated-feature-engineering-basics/notebook
Limitation: intended to be run on datasets that can fit in memory on one machine
- split the load by rows and build an array
- load one part at a time, filtered by date
Steps:
- create dict {column:[rows], column2:[rows]}
- EntitySet
- Entities pd.DataFrame
- Relations
- one-to-many only - for many-to-many you must create an intermediate set (ids)
- for each child id the parent id MUST EXIST
- child id and parent id types must be equal
- ft.dfs - Input - entities with relationships
Cons
- garbage columns built on id columns, generated in child-to-parent order with many-to-many
for prediction you should have 10x more rows than features https://www.youtube.com/watch?v=Dc0sr0kdBVI&hd=1#t=57m20s
56.1.1. variable types
- https://docs.featuretools.com/en/stable/api_reference.html#variable-types
- specified when creating an Entity
- foreign key
56.1.2. example one-to-many
# sys.partner_id - foreign key
# partner - one
# sys - many
entities = {
    "sys": (sys, "id"),
    "partner": (partner, "id")
}
relationships = {
    ("partner", "id", "sys", "partner_id")
}
# fields:
# partner.SUM(sys.field1)
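What partner.SUM(sys.field1) computes can be sketched in pure Python (made-up rows, not the featuretools API): group the child rows by the foreign key and aggregate them onto the parent:

```python
# child table "sys" with foreign key partner_id into parent table "partner"
sys_rows = [
    {"id": 1, "partner_id": "a", "field1": 10},
    {"id": 2, "partner_id": "a", "field1": 5},
    {"id": 3, "partner_id": "b", "field1": 7},
]

# partner.SUM(sys.field1): one aggregated value per parent id
sum_by_partner = {}
for row in sys_rows:
    key = row["partner_id"]
    sum_by_partner[key] = sum_by_partner.get(key, 0) + row["field1"]

print(sum_by_partner)  # {'a': 15, 'b': 7}
```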
56.1.3. example many-to-many
entities = {
    "sys": (sys, "id"),
    "cl_ids": (cl_ids, "id"),
    "cl_budget": (cl_budget, "idp")
}
relationships = {
    ("cl_ids", "id", "sys", "client_id"),
    ("cl_ids", "id", "cl_budget", "id")
}
# cl_ids.SUM(cl_budget.field1)
# cl_ids.SUM(sys.field1) - garbage field duplicating sys.field1
56.1.4. operations
ft.list_primitives().head(5)
56.1.5. aggregation primitive - across a parent-child relationship:
Default: [“sum”, “std”, “max”, “skew”, “min”, “mean”, “count”, “percent_true”, “num_unique”, “mode”]
- skew
- Computes the extent to which a distribution differs from a normal distribution.
- std
- Computes the dispersion relative to the mean value, ignoring `NaN`.
- percent_true
- Determines the percent of `True` values.
- mode
- Determines the most commonly repeated value.
- all
- 0 std aggregation Computes the dispersion relative to the mean value, ignoring `NaN`.
- 1 median aggregation Determines the middlemost number in a list of values.
- 2 n_most_common aggregation Determines the `n` most common elements.
- 3 num_true aggregation Counts the number of `True` values.
- 4 time_since_last aggregation Calculates the time elapsed since the last datetime (default in seconds).
- 5 max aggregation Calculates the highest value, ignoring `NaN` values.
- 6 entropy aggregation Calculates the entropy for a categorical variable
- 7 any aggregation Determines if any value is 'True' in a list.
- 8 mode aggregation Determines the most commonly repeated value.
- 9 time_since_first aggregation Calculates the time elapsed since the first datetime (in seconds).
- 10 trend aggregation Calculates the trend of a variable over time.
- 11 first aggregation Determines the first value in a list.
- 12 sum aggregation Calculates the total addition, ignoring `NaN`.
- 13 count aggregation Determines the total number of values, excluding `NaN`.
- 14 skew aggregation Computes the extent to which a distribution differs from a normal distribution.
- 15 avg_time_between aggregation Computes the average number of seconds between consecutive events.
- 16 percent_true aggregation Determines the percent of `True` values.
- 17 num_unique aggregation Determines the number of distinct values, ignoring `NaN` values.
- 18 all aggregation Calculates if all values are 'True' in a list.
- 19 min aggregation Calculates the smallest value, ignoring `NaN` values.
- 20 last aggregation Determines the last value in a list.
- 21 mean aggregation Computes the average for a list of values.
56.1.6. TransformPrimitive - one or more variables from an entity to one new:
Default: [“day”, “year”, “month”, “weekday”, “haversine”, “num_words”, “num_characters”]
Useful:
- divide_numeric - ratio
Transform Don't have:
- root
- square_root
- log
- all
- https://docs.featuretools.com/en/stable/_modules/featuretools/primitives/standard/binary_transform.html
- 22 year transform Determines the year value of a datetime.
- 23 equal transform Determines if values in one list are equal to another list.
- 24 isin transform Determines whether a value is present in a provided list.
- 25 num_characters transform Calculates the number of characters in a string.
- 26 less_than_scalar transform Determines if values are less than a given scalar.
- 27 less_than_equal_to transform Determines if values in one list are less than or equal to another list.
- 28 multiply_boolean transform Element-wise multiplication of two lists of boolean values.
- 29 week transform Determines the week of the year from a datetime.
- 30 greater_than_equal_to_scalar transform Determines if values are greater than or equal to a given scalar.
- 31 and transform Element-wise logical AND of two lists.
- 32 multiply_numeric transform Element-wise multiplication of two lists.
- 33 second transform Determines the seconds value of a datetime.
- 34 not_equal transform Determines if values in one list are not equal to another list.
- 35 day transform Determines the day of the month from a datetime.
- 36 cum_min transform Calculates the cumulative minimum.
- 37 greater_than_scalar transform Determines if values are greater than a given scalar.
- 38 modulo_numeric_scalar transform Return the modulo of each element in the list by a scalar.
- 39 subtract_numeric_scalar transform Subtract a scalar from each element in the list.
- 40 absolute transform Computes the absolute value of a number.
- 41 add_numeric_scalar transform Add a scalar to each value in the list.
- 42 cum_count transform Calculates the cumulative count.
- 43 divide_by_feature transform Divide a scalar by each value in the list.
- 44 divide_numeric_scalar transform Divide each element in the list by a scalar.
- 45 time_since_previous transform Compute the time since the previous entry in a list.
- 46 longitude transform Returns the second tuple value in a list of LatLong tuples.
- 47 cum_max transform Calculates the cumulative maximum.
- 48 not transform Negates a boolean value.
- 49 not_equal_scalar transform Determines if values in a list are not equal to a given scalar.
- 50 diff transform Compute the difference between the value in a list and the previous value in that list.
- 51 equal_scalar transform Determines if values in a list are equal to a given scalar.
- 52 num_words transform Determines the number of words in a string by counting the spaces.
- 53 divide_numeric transform Element-wise division of two lists.
- 54 less_than_equal_to_scalar transform Determines if values are less than or equal to a given scalar.
- 55 month transform Determines the month value of a datetime.
- 56 or transform Element-wise logical OR of two lists.
- 57 weekday transform Determines the day of the week from a datetime.
- 58 less_than transform Determines if values in one list are less than another list.
- 59 minute transform Determines the minutes value of a datetime.
- 60 multiply_numeric_scalar transform Multiply each element in the list by a scalar.
- 61 greater_than_equal_to transform Determines if values in one list are greater than or equal to another list.
- 62 hour transform Determines the hour value of a datetime.
- 63 modulo_by_feature transform Return the modulo of a scalar by each element in the list.
- 64 scalar_subtract_numeric_feature transform Subtract each value in the list from a given scalar.
- 65 is_weekend transform Determines if a date falls on a weekend.
- 66 greater_than transform Determines if values in one list are greater than another list.
- 67 cum_mean transform Calculates the cumulative mean.
- 68 modulo_numeric transform Element-wise modulo of two lists.
- 69 subtract_numeric transform Element-wise subtraction of two lists.
- 70 haversine transform Calculates the approximate haversine distance between two LatLong variable types.
- 71 is_null transform Determines if a value is null.
- 72 add_numeric transform Element-wise addition of two lists.
- 73 cum_sum transform Calculates the cumulative sum.
- 74 percentile transform Determines the percentile rank for each value in a list.
- 75 time_since transform Calculates time from a value to a specified cutoff datetime.
- 76 latitude transform Returns the first tuple value in a list of LatLong tuples.
- 77 negate transform Negates a numeric value.
56.1.7. create primitive
import numpy as np  # required by the primitive functions below
from featuretools.primitives import make_trans_primitive
from featuretools.variable_types import Numeric

# Create two new functions for our two new primitives
def Log(column):
    return np.log(column)

def Square_Root(column):
    return np.sqrt(column)

# Create the primitives
log_prim = make_trans_primitive(
    function=Log, input_types=[Numeric], return_type=Numeric)
square_root_prim = make_trans_primitive(
    function=Square_Root, input_types=[Numeric], return_type=Numeric)
56.1.8. EXAMPLE from pandas
import featuretools as ft
import pandas as pd

es = ft.EntitySet()
matches_df = pd.read_csv("./matches.csv")
es.entity_from_dataframe(entity_id="matches", index="match_id",
                         time_index="match_date", dataframe=matches_df)
56.2. TODO informationsfabirc
56.3. TODO TPOT
56.4. TSFRESH (time sequence)
56.5. ATgfe - new feature
57. support libraries
- dask
- scale numpy, pandas, scikit-learn, XGBoost
- (no term)
- tqdm - progress meter for loops: for i in tqdm(range(1000)):
- (no term)
- msgpack - binary serialization of JSON for example
- (no term)
- cloudpickle - serialize to "pickle" lambda and classes
- (no term)
- tornado - non-blocking network I/O
- (no term)
- BeautifulSoup - extract data for web html pages
58. Microsoft nni AutoML framework (stupid shut)
59. transformers - provides pretrained models
pip3 install transformers==4.24.0 --user
Collecting transformers==4.24.0
  Downloading transformers-4.24.0-py3-none-any.whl (5.5 MB)
Requirement already satisfied: tqdm, packaging, tokenizers, requests, numpy, filelock, huggingface-hub, regex, pyyaml (and their dependencies)
Installing collected packages: transformers
  Found existing installation: transformers 4.22.2
    Successfully uninstalled transformers-4.22.2
Successfully installed transformers-4.24.0
60. help
60.1. built-in help
- help(L.append) - docstring and much more
- dir() or dir(object) - list of names in the current scope, or the object's attributes
- locals() - local variables and their values, as a dictionary (call inside a method)
- globals() - all global variables and their values, as a dictionary
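A quick self-contained sketch of these introspection helpers (the `demo` function name is arbitrary):

```python
def demo(a, b):
    c = a + b
    # locals() sees only the names bound inside this call
    return sorted(locals().keys())

assert demo(1, 2) == ['a', 'b', 'c']
assert 'append' in dir(list)    # dir(obj) lists an object's attributes
assert '__name__' in globals()  # globals() holds module-level names
```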
61. IDE
By default, Python source files are treated as encoded in UTF-8; to change it:
#!/usr/bin/env python3
# -*- coding: cp1252 -*-
https://en.wikipedia.org/wiki/Comparison_of_integrated_development_environments#Python
61.1. REPL
py.exe or python.exe file [arg]
- Exit - Control-D on Unix, Control-Z on Windows, or quit()
- a blank line ends a multi-line command
61.2. PyDev is a Python IDE for Eclipse
- Ctrl+Space
- F3 go to definition, Alt+Arrow < > navigate back/forward
- Shift+Enter - next line
- Ctrl+1 assign parameters to fields, create class constructor
- Ctrl+2/R - rename variable
- Alt+Shift+R rename variable
- Alt+Shift+A Start/Stop Rectangular editing
- Ctrl+F9 run test
- Ctrl+F11 rerun last launch
- Ctrl+Alt+Down/Up duplicate line
- Alt+Shift+L Extract local variable
- Alt+Shift+M Extract method
First:
- Create Project
- Create new Source Folder - "src" http://www.pydev.org/manual_101_project_conf2.html
61.2.1. features
- Django integration
- Code completion
- Code completion with auto import
- Type hinting
- Code analysis
- Go to definition
- Refactoring
- Debugger
- Remote debugger
- Find Referrers in Debugger
- Tokens browser
- Interactive console
- Unittest integration
- Code coverage
- PyLint integration
- Find References (Ctrl+Shift+G)
61.3. Emacs
M-~ menu
61.3.1. python in org mode
https://stackoverflow.com/questions/18598870/emacs-org-mode-executing-simple-python-code
C-c C-c - to activate
1+1
print(1+1)
.emacs configuration:
;; enable python for in-buffer evaluation
(org-babel-do-load-languages
 'org-babel-load-languages
 '((python . t)))
;; all python code be safe
(defun my-org-confirm-babel-evaluate (lang body)
  (not (string= lang "python")))
(setq org-confirm-babel-evaluate 'my-org-confirm-babel-evaluate)
;; required
(setq shell-command-switch "-ic")
61.3.2. Emacs
https://habr.com/ru/post/303600/
.emacs.d/lisp
- Company is a text completion framework for Emacs http://company-mode.github.io/
- Jedi Python auto-completion package http://tkf.github.io/emacs-jedi/latest/
- Elpy Emacs Python Development Environment https://github.com/jorgenschaefer/elpy
61.4. PyCharm
61.4.1. installation:
- Other settings -> settings for new project -> Tools -> Python integrated tools -> docstrings - reStructuredText
- Ctrl+Alt+S -> keymap - Emacs
navigate
- Ctrl+Alt+S -> keymap - up -> Ctrl+k
- Ctrl+Alt+S -> keymap - left -> Ctrl+l
- Ctrl+Alt+S -> keymap - move caret to previous word -> Alt+l
other:
- Ctrl+Alt+S -> keymap - Error Description -> add key Alt+Z
- Ctrl+Alt+S -> keymap - Navigate; Back -> add key Ctrl+\
- Ctrl+Alt+S -> keymap - Select next tab -> Alt+E
- Ctrl+Alt+S -> keymap - Select previous tab -> Alt+A
- Ctrl+Alt+S -> keymap - Close tab -> Ctrl+Alt+w
- Ctrl+Alt+S -> keymap - Backspace -> Ctrl+h
- Ctrl+Alt+S -> keymap - Delete to word start -> Alt+h
- Ctrl+Alt+S -> keymap - run/ -> Ctrl+C Ctrl+C
- Ctrl+Alt+S -> keymap - back (Navigate) -> Alt+,
Disable cursor blinking: Ctrl+Alt+s -> Editor, General, Appearance
61.4.2. keys
- Alt+\ - main menu
- Alt+Shift+F10 - run
- Alt+Shift+F8 - debug
- Ctrl+Shift+U to upper case
- Ctrl+. fold/unfold
- Ctrl+q get documentation
- Ctrl+Alt+q auto-indent lines
- Ctrl+z/v scroll
- Alt+left/right switch tabs
- Ctrl+x k close tab
- Ctrl+x ` go to next error
- Alt+. go to declaration
- Ctrl+Shift+' maximize bottom console
emacs keymap
- Alt+Shift+F10 run
- Alt+; - comment text
- left Alt + arrows - switch tabs
- left Alt+Enter - on yellow highlight - variants to solve
- Ctrl+Alt+L - Reformat code
- Alt+Enter - at error - fix error menu
- F10 - menu
- Esc+Esc - focus Editor
- F12 - focus last tool window(run)
- Shift+Esc - hide low "Run"
- Ctrl+ +/- - unfold/fold
- Ctrl+m - Enter
navigate (Goto by reference actions)
- Ctrl+Alt+g, Alt+. - navigate to definition
- Alt+, - Navigate; Back (my)
Windows
- Alt+1 - project navigation
- Alt+2 - bookmarks and debug points
- Alt+4 - console
- Alt+5 - debug
- F11 - create
- Ctrl-Shift+F8 - debug points
- Shift-F11 bookmarks
- shift+Esc - hide current window
- switch to main window - shift+Esc or F4 or Alt+current window or double Alt+any
- C-x k - close current tab
not emacs
- Ctrl+/ - comment text
- Ctrl+b - navigate to definition
61.5. ipython
- Ctrl+e Ctrl+o - multiline code or if 1:
- Ctrl+r - search in history
61.6. geany
no autocompletion
61.7. BlueFish
Style - preferences->Editor settings->Fonts&Colours->Use system wide color settings
- S-C-c comment
- C-space completion
to execute file:
- preferences->external commands->
- any name: xfce4-terminal -e 'bash -c "python %f; exec bash"'
cons
- cannot execute
61.8. Eric
- echo "dev-python/PyQt5 network" >> /etc/portage/package.use/eric
- emerge mercurial PyQt5 qscintilla-python dev-qt/qtcharts dev-qt/qtwebengine
- cd /usr/local
- hg clone https://hg.die-offenbachs.homelinux.org/eric
- or https://sourceforge.net/projects/eric-ide/files/latest/download
- select branch
- hg up eric7-maintenance (PyQt6)
- hg up eric6 (PyQt5)
61.9. Google Colab
61.9.1. TODO todo
61.9.2. initial config
- Runtime -> View resources -> Change runtime type - GPU
- Editor -> Code diagnostics -> Syntax and type checking
- Miscellaneous -> Power level - ?
61.9.3. keys (checked):
- Ctrl-a/e Move cursor to the beginning/end of the line
- Ctrl-Alt-n/p Move cursor to the next/previous line
- Ctrl-d/h Delete next/previous character in line
- Ctrl-k Delete text from cursor to end of line
- Ctrl-space auto completion
- Ctrl+o new line and stay at current
- Ctrl+j delete the end-of-line character and set cursor at the end
- Ctrl+m m/y convert (code to text)/(text to code)
- Ctrl+z/y undo/redo action
Docstring:
- Ctrl + mouse over variable
- Ctrl + space + mouse click
keys advanced (checked)
- Ctrl+s save notebook
- Ctrl+m activate the shortcuts
- Ctrl+m h get Keyboard preferences
- Tab Toggle code docstring help
- Shift+Tab Unindent current line
- Ctrl+m n/p next/previous cell (like arrows)
- Ctrl+] Collapse
- Ctrl+' toggle collapse
- Ctrl+Shift+Enter Run
- Ctrl+Shift+S select focused cell
- Ctrl+m o show hide output
- Ctrl+m a/b add cell above/below
- ctrl+m+d Delete cell
- Ctrl+shift+alt+p command palette
61.9.4. keys in Internet (emacs IPython console)
Ctrl-C and Ctrl-V for copying and pasting work in a wide variety of programs and systems
- Ctrl-a Move cursor to the beginning of the line
- Ctrl-e Move cursor to the end of the line
- Ctrl-b or the left arrow key Move cursor back one character
- Ctrl-f or the right arrow key Move cursor forward one character
- Backspace key Delete previous character in line
- Ctrl-d Delete next character in line
- Ctrl-k Cut text from cursor to end of line
- Ctrl-u Cut text from beginning of line to cursor
- Ctrl-y Yank (i.e. paste) text that was previously cut
- Ctrl-t Transpose (i.e., switch) previous two characters
- Ctrl-p (or the up arrow key) Access previous command in history
- Ctrl-n (or the down arrow key) Access next command in history
- Ctrl-r Reverse-search through command history
?
- Ctrl-l Clear terminal screen
- Ctrl-c Interrupt current Python command
- Ctrl-d Exit IPython session
61.9.5. Google Colab Magics
a set of system commands that can be seen as a mini command language
- line magics start with %, while the cell magics start with %%
- %lsmagic - full list of available magics
- %ldir
- %%html
more https://colab.research.google.com/notebooks/intro.ipynb
61.9.6. install libraries and system commands
- !pip install or !apt-get install
- !apt-get -qq install -y libfluidsynth1
- !wget
- !git clone https://github.com/wxs/keras-mnist-tutorial.git
- !ls /bin
61.9.7. execute code from google drive
# Run this cell to mount your Google Drive.
from google.colab import drive
drive.mount('/content/drive')
!python3 "/content/drive/My Drive/Colab Notebooks/hello.py"
61.9.8. shell
from IPython.display import JSON
from google.colab import output
from subprocess import getoutput
import os

def shell(command):
    if command.startswith('cd'):
        path = command.strip().split(maxsplit=1)[1]
        os.chdir(path)
        return JSON([''])
    return JSON([getoutput(command)])

output.register_callback('shell', shell)
#@title Colab Shell
%%html
<div id=term_demo></div>
<script src="https://code.jquery.com/jquery-latest.js"></script>
<script src="https://cdn.jsdelivr.net/npm/jquery.terminal/js/jquery.terminal.min.js"></script>
<link href="https://cdn.jsdelivr.net/npm/jquery.terminal/css/jquery.terminal.min.css" rel="stylesheet"/>
<script>
$('#term_demo').terminal(async function(command) {
    if (command !== '') {
        try {
            let res = await google.colab.kernel.invokeFunction('shell', [command])
            let out = res.data['application/json'][0]
            this.echo(new String(out))
        } catch(e) {
            this.error(new String(e));
        }
    } else {
        this.echo('');
    }
}, {
    greetings: 'Welcome to Colab Shell',
    name: 'colab_demo',
    height: 250,
    prompt: 'colab > '
});
</script>
61.9.9. gcloud
- gcloud info - current environment
import torch
print(torch.cuda.get_device_name())
LD_LIBRARY_PATH=/usr/lib64-nvidia watch -n 1 nvidia-smi
!gcloud auth login # Authorize gcloud to access the Cloud Platform with Google user credentials.
connect Google Colab to Google Cloud.
!gcloud compute ssh --zone us-central1-a 'instance-name' -- -L 8888:localhost:8888
61.9.10. gcloud ssh (require billing)
bad: !gcloud config set account account@gmail
!gcloud auth login
!gcloud projects create vfdsgq2345 --enable-cloud-apis --name vfdsgq2345 --set-as-default
Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/vfdsgq2346]. Enabling service [cloudapis.googleapis.com] on project [vfdsgq2346]… Operation "operations/acat.p2-872588642643-8ef11211-5181-47e3-bcd2-383690de7d91" finished successfully. Updated property [core/project] to [vfdsgq2346].
!gcloud config set project 1
!gcloud compute ssh
gcloud compute ssh example-instance --zone=us-central1-a -- -vvv -L 80:%INSTANCE%:80
!gcloud compute ssh 10.2.3.4:22 --zone=us-central1-a -- -vvv -L 80:localhost:80
61.9.11. api
61.9.12. upload and download files
from google.colab import files
files.upload()  # or files.download(path)
61.9.13. connect ssh (restricted)
https://medium.com/@ayaka_45434/connect-to-google-colab-using-ssh-bb342e0d0fd2
at relay server:
- $ ssh-keygen -t ed25519 -a 256
- $ cat .ssh/id_ed25519.pub
at colab:
%%sh
mkdir -p ~/.ssh
echo '<SSH public key of PC>' >> ~/.ssh/authorized_keys
apt update > /dev/null
yes | unminimize > /dev/null
apt install -qq -o=Dpkg::Use-Pty=0 openssh-server pwgen net-tools psmisc pciutils htop neofetch zsh nano byobu > /dev/null
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa > /dev/null
echo ListenAddress 127.0.0.1 >> /etc/ssh/sshd_config
mkdir -p /var/run/sshd
/usr/sbin/sshd
61.9.14. connect ssh (unrestricted)
at colab:
- !git clone https://github.com/WassimBenzarti/colab-ssh ; mv colab-ssh cs ; cd cs ; rm -r .git
!git clone --depth=1 https://github.com/openssh/openssh-portable ; mv openssh-portable cs ; cd cs ; rm -r .git ; autoreconf && ./configure && make && make install ; mv /usr/local/sbin/sshd /usr/local/sbin/aav
%%shell
a=$(cat <<EOF
AcceptEnv LANG LC_ALL LC_COLLATE LC_CTYPE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME LANGUAGE LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_NAME LC_PAPER LC_TELEPHONE
AcceptEnv COLORTERM
Port 9090
ListenAddress 127.0.0.1
AllowUsers u
PermitRootLogin no
PubkeyAuthentication yes
PasswordAuthentication no
PermitEmptyPasswords no
KbdInteractiveAuthentication no
EOF
)
echo "$a" > aav.conf ; useradd -m sshd ; ls
!mkdir root.ssh ; chmod 0700 root.ssh ; mv cs/ssh aavc ; ./cs/ssh-keygen -b 4096 -t rsa -f root.ssh/mykey_rsa -q -N "" ; cat root.ssh/mykey_rsa.pub > root.ssh/authorized_keys
!exec /usr/local/sbin/aav -f aav.conf
!cat root.ssh/mykey_rsa.pub > root.ssh/authorized_keys
!./aavc -vvv -p 9090 localhost
61.9.15. Restrictions
disallowed from Colab runtimes:
- file hosting, media serving, or other web service offerings not related to interactive compute with Colab
- downloading torrents or engaging in peer-to-peer file-sharing
- using a remote desktop or SSH
- connecting to remote proxies
- mining cryptocurrency
- running denial-of-service attacks
- password cracking
- using multiple accounts to work around access or resource usage restrictions
- creating deepfakes
61.9.16. cons
- GPU/TPU usage is limited
- Not the most powerful GPU/TPU setups available
- Not the best de-bugging environment
- It is hard to work with big data
- Have to re-install extra dependencies every new runtime
- Google drive: limited to 15 GB of free space with a Gmail id.
- you’ll have to (re)install any additional libraries you want to use every time you (re)connect to a Google Colab notebook.
Alternatives:
- Kaggle
- Azure Notebooks
- Amazon SageMaker
- Paperspace Gradient
- FloydHub
62. Jupyter Notebook
https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Importing%20Notebooks.html .ipynb
each cell should ideally be idempotent
62.1. jupyter [ˈʤuːpɪtə] - an emphasis on the interactivity of the computations performed
- https://jupyter.org/
- the idea is not to draw, but to select working rules
- many languages https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
- Project Jupyter - nonprofit organization, interactive computing across dozens of programming languages.
Free for all to use and released under the liberal terms of the modified BSD license
- Jupyter Notebook - web-based - .ipynb - Jupyter Notebook is MathJax-aware (subset of TeX and LaTeX)
- Jupyter Hub
- Jupyter Lab - interfaces for all products under the Jupyter ecosystem; editing of images, CSV, JSON, Markdown, PDF, Vega, Vega-Lite
- next-generation version of Jupyter Notebook
- Jupyter Console
- Qt Console
kernels: jupyter kernelspec list
%run -n main.py - import module
62.2. install
pip3 install nbconvert --user
launch:
- cd to folder with .ipynb
- jupyter notebook # it will open the browser
62.3. convert to html
jupyter nbconvert /home/u2/tmp/Lecture_10_decision_trees.ipynb
62.4. Widgets
62.4.1. install
run
- pip install ipywidgets --user
- jupyter nbextension enable --py widgetsnbextension
62.4.2. usage
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

date_w = widgets.DatePicker(description='Pick a Date', disabled=False)

def f(x):
    return x

interact(f, x=date_w)          # x - name of f(x) parameter and *type of widget*
interact(f, x=10)              # int slider (abbrev)
interact(f, x=True)            # bool flag (abbrev)
interact(h, p=5, q=fixed(20))  # q parameter is fixed
62.4.3. widget abbreviation
- Checkbox
- True or False
- Text
- 'Hi there'
- IntSlider
- value or (min,max) or (min,max,step) if integers are passed
- FloatSlider
- value or (min,max) or (min,max,step) if floats are passed
- Dropdown
- ['orange','apple'] or [('one', 1), ('two', 2)]
62.4.4. widget return type
- widgets.DatePicker
- datetime.date
62.4.5. Styling
https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Styling.html
Description
- style = {'description_width': 'initial'}
- IntSlider(description='A too long description', style=style)
62.5. Hotkeys:
- Enter - in cell
- Escape - exit cell
- h - hotkeys
- Ctrl+Enter/Shift+Enter - run
- Tab - code completion
- arrow up/down - above/below cell
62.6. emacs (sucks)
org-mode may evaluate code blocks using a Jupyter kernel https://github.com/gregsexton/ob-ipython
jupyter_console, jupyter_client
62.7. other
62.8. TODO lab
- pip install jupyterlab
- jupyter lab - http://localhost:8888
63. USE CASES
measure time 28.3
63.1. NET
63.1.1. REST request
import urllib.request
import json

API_KEY = 'f670813c14f672c1e197101fd767cbe675933d86'
headers = {'User-agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5',
           'Content-Type': 'application/json',
           'Accept': 'application/json',
           'Authorization': 'Token ' + API_KEY}
data = '{ "query": "Виктор Иван", "count": 3 }'
req = urllib.request.Request(url='https://suggestions.dadata.ru/suggestions/api/4_1/rs/suggest/fio',
                             headers=headers, data=data.encode())
with urllib.request.urlopen(req) as f:
    r = f.read().decode('utf-8')
j = json.loads(r)
print(j['suggestions'][0]["unrestricted_value"])
print(j['suggestions'][0]["gender"])
j2 = json.dumps(j, ensure_ascii=False, indent=4)
print(j2)
63.1.2. email IMAP
import configparser as cp
import cx_Oracle
import datetime
import email
import imaplib
import logging
import os
import re
import requests
import shutil
import smtplib
import zipfile
import sys
from email.header import decode_header
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate
from os.path import basename
from requests.auth import HTTPBasicAuth
from sys import exit

COMMASPACE = ', '  # was missing in the original
# config_load, init_logger, extract_zip_files and decrypt_file are project helpers defined elsewhere


def decode_header_fix(subject_list: list) -> str:
    """Decode any header to a string after decode_header."""
    sub_list = []
    for subject in subject_list:
        if subject and subject[1]:
            subject = subject[0].decode(subject[1])
        elif type(subject[0]) == bytes:
            subject = subject[0].decode('utf-8')
        else:
            subject = subject[0]
        sub_list.append(subject)
    return ''.join(sub_list)


def send_mail(username, password, send_from, send_to, subject, text,
              files=None, server="mx1.rnb.com"):
    assert isinstance(send_to, list)
    msg = MIMEMultipart()
    msg['From'] = send_from
    msg['To'] = COMMASPACE.join(send_to)
    msg['Date'] = formatdate(localtime=True)
    msg['Subject'] = subject
    msg.attach(MIMEText(text))
    for f in files or []:
        with open(f, "rb") as fil:
            part = MIMEApplication(fil.read(), Name=basename(f))
        # After the file is closed
        part['Content-Disposition'] = 'attachment; filename="%s"' % basename(f)
        msg.attach(part)
    smtp = smtplib.SMTP(server)
    smtp.login(username, password)
    log.debug(u'Sending mail to %s' % send_to)
    smtp.sendmail(send_from, send_to, msg.as_string())
    smtp.close()


def save_attachment(conn: imaplib.IMAP4, emailid: str, outputdir: str, file_pattern: str):
    """
    https://docs.python.org/3/library/imaplib.html
    :param conn: connection
    :param file_pattern: regex pattern for file name of attachment
    """
    try:
        ret, data = conn.fetch(emailid, "(BODY[])")
    except:
        print("No new emails to read.")
        conn.close_connection()
        exit()
    mail = email.message_from_bytes(data[0][1])
    # print('From:' + mail['From'])
    # print('To:' + mail['To'])
    # print('Date:' + mail['Date'])
    # subject_list = decode_header(mail['Subject'])
    # subject = decode_header_fix(subject_list)
    # must be: Updating client ICODE RNB_378026
    # print('Subject:' + subject)
    # print('Content:' + str(mail.get_payload()[0]))
    # process_out_reestr(mail)
    if mail.get_content_maintype() != 'multipart':
        return
    for part in mail.walk():
        if part.get_content_maintype() != 'multipart' and part.get('Content-Disposition') is not None:
            filename_list = decode_header(part.get_filename())  # (encoded_string, charset)
            filename = decode_header_fix(filename_list)
            if not re.search(file_pattern, filename):
                continue
            # write attachment
            with open('{}/{}'.format(outputdir, filename), 'wb') as f:
                f.write(part.get_payload(decode=True))


def download_email_attachments(server: str, user: str, password: str, outputdir: str,
                               subject_contains: str, file_pattern: str,
                               days_since=0) -> bool or None:
    date = datetime.datetime.now() - datetime.timedelta(days=days_since)
    # https://docs.python.org/3/library/imaplib.html
    # https://tools.ietf.org/html/rfc3501#page-49
    # SUBJECT <string>
    #   Messages that contain the specified string in the envelope
    #   structure's SUBJECT field
    criteria = '(SENTSINCE "{}" SUBJECT "{}")'.format(date.strftime('%d-%b-%Y'), subject_contains)
    try:
        m = imaplib.IMAP4_SSL(server)
        m.login(user, password)
        m.select()
        resp, items = m.search(None, criteria)
        if not items[0]:
            log.debug(u'No registry emails in the INBOX folder')
            return False
        items = items[0].split()
        for emailid in items:
            save_attachment(m, emailid, outputdir, file_pattern)
            # TODO: change
            # m.store(emailid, '+FLAGS', '\\Seen')
            # m.copy(emailid, 'processed')
            # m.store(emailid, '+FLAGS', '\\Deleted')
        m.close()
        m.logout()
    except imaplib.IMAP4_SSL.error as e:
        print("LOGIN FAILED!!! ", e)
        sys.exit(1)
    return True


if __name__ == '__main__':
    import tempfile
    c = config_load('autocred.conf')
    log = init_logger(logging.INFO, c['storage']['log_path'])  # required by all methods
    # with tempfile.TemporaryDirectory() as tmp:
    #     print(tmp)
    #     res = download_email_attachments(server=c['imap']['host'],
    #                                      user=c['imap']['login'],
    #                                      password=c['imap']['password'],
    #                                      outputdir=tmp, subject_contains='Updating client ICODE RNB_',
    #                                      file_pattern=r'^client_identity_RNB_\d+\.zip\.enc$', days_since=1)
    #     extract_zip_files(tmp)
    #     for x in os.listdir(tmp):
    #         print(x)
    tmp = '/home/u2/Desktop/tmp/tmp2/'
    # res = download_email_attachments(server=c['imap_bistr']['host'],
    #                                  user=c['imap_bistr']['login'],
    #                                  password=c['imap_bistr']['password'],
    #                                  outputdir=tmp,
    #                                  subject_contains='Updating client ICODE',  # 'Updating client ICODE RNB_378026'
    #                                  file_pattern=r'^client_identity_RNB_\d+\.zip\.enc$', days_since=3)
    for filename in os.listdir(tmp):
        print(filename)
        decrypt_file(uri=c['api']['dec_uri'],
                     cert_thumbprint=c['api']['dec_cert_thumbprint'],
                     user=c['api']['user'],
                     passw=c['api']['pass'],
                     filename=os.path.join(tmp, filename))
    for x in os.listdir(tmp):
        print(x)
63.1.3. email DKIM
('DKIM-Signature', 'v=1; a=rsa-sha256; q=dns/txt; c=simple/simple; d=bystrobank.ru\n\t; s=dkim; h=Message-Id:Content-Type:MIME-Version:From:Date:Subject:To:Sender:\n\tReply-To:Cc:Content-Transfer-Encoding:Content-ID:Content-Description:\n\tResent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:\n\tIn-Reply-To:References:List-Id:List-Help:List-Unsubscribe:List-Subscribe:\n\tList-Post:List-Owner:List-Archive;\n\tbh=dDimDD8KIdEx1QkqygEiFeQfyTIgIztxgQu6BtkzQ5o=; b=hZGPWUFnQ2gGNV4UJ7MyaPJYFL\n\tbB9Csmpg/ukcwQuWBI1NtvILUoviMff4ACkNnhPgD7OV4aGtR5UBOy81tdvY5cQnBFv9Yku9yAf8R\n\t1BV83crKYnhU4GRtw7wD4W64zpZRhX3KZxG8SWissmh+vNEMBlmYXN9FsuLyVKaBbks0DYnR3HA9Q\n\tFV4d8CMC8wLrdmBi/MV0x75Q9GhDhGMc8MPNAleuWabHOT8Bmf7FLHQERHBRYm78i4wDWEFFNv5Ox\n\tuqMEm5iJQeYRnoHkrm5KEEP4DYohb8GgJkfIIZs4dO2oMjJif/2A1JLnmq64KPmoAE3s8lO2Bo2Zq\n\t68tnSdFA==;')
pip3 install dkimpy --user
import dkim

# verify email
try:
    res = dkim.verify(data[0][1])
except:
    log.error(u'Invalid signature')
    return
if not res:
    log.error(u'Invalid signature')
    return
print('[' + os.path.basename(__file__) + '] isDkimValid = ' + str(res))
mail = email.message_from_bytes(data[0][1])
# verify sender domain
dkim_sig = decode_header(mail['DKIM-Signature'])
dkim_sig = decode_header_fix(dkim_sig)
if not re.search(r" d=bystrobank\.ru", dkim_sig):
    return
63.1.4. urllib SOCKS
pip install requests[socks]
import urllib
import socket
import socks

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", port=8888)
save = socket.socket
socket.socket = socks.socksocket  # replace socket with socks
req = urllib.request.Request(url='http://httpbin.org/ip')
urllib.request.urlopen(req).read()  # default request
63.2. LISTS
63.2.1. all has one value
list.count('value') == len(list)
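A quick check of this idiom against two equivalent alternatives (the list here is just an illustration):

```python
lst = ['value', 'value', 'value']
# the count-based check from above
assert lst.count('value') == len(lst)
# equivalent alternatives
assert all(x == 'value' for x in lst)  # short-circuits on first mismatch
assert set(lst) == {'value'}           # requires hashable elements
```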
63.2.2. 2D list to 1D dict or list
[j for sub in [[1,2,3],[1,2],[1,4,5,6,7]] for j in sub]
{j for sub in [[1,2,3],[1,2],[1,4,5,6,7]] for j in sub}
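The two comprehensions above, verified, plus the itertools.chain equivalent for the list case:

```python
from itertools import chain

nested = [[1, 2, 3], [1, 2], [1, 4, 5, 6, 7]]
flat = [j for sub in nested for j in sub]          # 2D -> 1D list
uniq = {j for sub in nested for j in sub}          # 2D -> set
assert flat == [1, 2, 3, 1, 2, 1, 4, 5, 6, 7]
assert uniq == {1, 2, 3, 4, 5, 6, 7}
assert list(chain.from_iterable(nested)) == flat   # same flattening, lazily
```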
63.2.3. list to string
' '.join(a)  # or ' '.join(str(w) for w in a) if the elements are not strings
63.2.4. replace one with two
l[pos:pos+1] = ('a', 'b')
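A minimal demonstration of this slice-assignment trick (list contents are arbitrary):

```python
l = ['x', 'y', 'z']
pos = 1
l[pos:pos + 1] = ('a', 'b')  # one-element slice replaced by two elements
assert l == ['x', 'a', 'b', 'z']
```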
63.2.5. remove elements
filter
self.contours = list(filter(lambda a: a is not None, self.contours))
new list
a = [item for item in a if ...]
iterate over a copy (deleting by a stale index while the list shrinks is buggy; remove by value instead)
for x in lis[:]:
    if x is None:  # any condition
        lis.remove(x)
63.2.6. average
[np.average((x[0], x[1])) for x in zip([1,2,3],[1,2,3])]
63.2.7. [1, -2, 3, -4, 5]
>>> [(x % 2 -0.5)*2*x for x in range(1,10)] [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0, 9.0]
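The expression above yields floats; an integer-only variant that matches the heading exactly:

```python
signed = [x if x % 2 else -x for x in range(1, 6)]
assert signed == [1, -2, 3, -4, 5]
```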
63.2.8. ZIP of arrays with different lengths
import itertools

z = itertools.zip_longest(arr1, arr2, arr3)
flat_list = []
for x in z:
    subflat = []
    for subl in x:
        if subl != None:
            subflat.append(subl[0])
            subflat.append(subl[1])
            subflat.append(subl[1])
        else:
            subflat.append('')
            subflat.append('')
    flat_list.append(subflat)
63.2.9. Shuffle two lists
z = list(zip(self.x, self.y))
random.shuffle(z)  # shuffles in place and returns None - do not assign the result
self.x, self.y = zip(*z)
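A self-contained check that the zip/shuffle/unzip pattern keeps the two lists paired (data is illustrative):

```python
import random

x = [1, 2, 3, 4]
y = ['a', 'b', 'c', 'd']
pairs = list(zip(x, y))
random.shuffle(pairs)             # in place; returns None
xs, ys = map(list, zip(*pairs))   # unzip back into two lists
# every (xs[i], ys[i]) pair existed in the original data
assert sorted(zip(xs, ys)) == sorted(zip(x, y))
```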
63.2.10. list of dictionaries
- search and encode
def one_h_str_col(dicts: list, column: str):
    c = set([x[column] for x in dicts])  # unique
    c = list(c)  # .index
    nb_classes = len(c)
    targets = np.arange(nb_classes)
    one_hot_targets = np.eye(nb_classes)[targets]
    for i, x in enumerate(dicts):
        x[column] = list(one_hot_targets[c.index(x[column])])
    return dicts

def one_h_date_col(dicts: list, column: str):
    for i, x in enumerate(dicts):
        d: date = x[column]
        x[column] = d.year
    return dicts

def one_h(dicts: list):
    for col in dicts[0].keys():
        lst = set([x[col] for x in dicts])
        if all(isinstance(x, (str, bytes)) for x in lst):
            dicts = one_h_str_col(dicts, col)
        if all(isinstance(x, date) for x in lst):
            dicts = one_h_date_col(dicts, col)
    return dicts
dicts = [
    {"name": "Mark", "age": 5},
    {"name": "Tom", "age": 10},
    {"name": "Pam", "age": 7},
]
c = set([x['name'] for x in dicts])  # unique
c = list(c)  # .index
for i, x in enumerate(dicts):
    x['name'] = c.index(x['name'])  # encode the name, not the whole dict
- separate labels from matrix
matrix = [list(x.values()) for x in dicts]
labels = dicts[0].keys()
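A small check of the labels/matrix split; since Python 3.7 dicts preserve insertion order, so the columns line up:

```python
dicts = [{"name": "Mark", "age": 5}, {"name": "Tom", "age": 10}]
matrix = [list(d.values()) for d in dicts]
labels = list(dicts[0].keys())
assert labels == ['name', 'age']
assert matrix == [['Mark', 5], ['Tom', 10]]
```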
63.2.11. closest in list
alph = [1, 2, 5, 7]
source = [1, 2, 3, 6]  # 3, 6 will be replaced
target = source[:]
for i, s in enumerate(source):
    if s not in alph:
        distance = [(abs(x - s), x) for x in alph]
        res = min(distance, key=lambda x: x[0])
        target[i] = res[1]
63.2.12. TIME SEQUENCE
smooth
mean_ver1 = pandas.Series(mean_ver1).rolling(window=5).mean()
63.2.13. split list in chunks
our_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
chunk_size = 3
chunked_list = [our_list[i:i + chunk_size]
                for i in range(0, len(our_list), chunk_size)]
print(chunked_list)
63.3. FILES
- os.path.join('/home','user') - /home/user
- os.listdir('/home/user') -> list of file_names - files and directories
- os.path.isdir/isfile() -> True False
- os.walk() - subdirectories = [(folder_path, list_folders, list_files), … ]
- extension = os.path.splitext(filename)[1][1:]
Extract files from subfolders: find . -mindepth 2 -type f -print -exec mv {} . \;
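A small sketch combining the helpers above; the directory layout is created on the fly just for the demo:

```python
import os
import tempfile

# build a throwaway tree: root/sub/a.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"), exist_ok=True)
open(os.path.join(root, "sub", "a.txt"), "w").close()

# walk the tree and collect files with a .txt extension
found = []
for folder_path, list_folders, list_files in os.walk(root):
    for name in list_files:
        if os.path.splitext(name)[1][1:] == "txt":
            found.append(os.path.join(folder_path, name))
print(found)
```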
63.3.1. Read JSON
import codecs
fileObj = codecs.open("provodki_1000.json", encoding='utf-8', mode='r')
text = fileObj.read()
fileObj.close()
data = json.loads(text)
# or
import json
with open('test_data.txt', 'r') as myfile:
    data = myfile.read()
obj = json.loads(data)
63.3.2. CSV
- array to CSV file for Excel
wtr = csv.writer(open('out.csv', 'w'), delimiter=';', lineterminator='\n')
for x in arr:
    wtr.writerow(x)
- read CSV and write
import csv
p = '/home/u2/Downloads/BANE_191211_191223.csv'
with open(p, 'r') as f:
    reader = csv.reader(f, delimiter=';', quoting=csv.QUOTE_NONE)
    for row in reader:
        ...
63.3.3. read file
Whole:
import codecs
fileObj = codecs.open("provodki_1000.json", encoding='utf-8', mode='r')
text = fileObj.read()
fileObj.close()
Line by line:
with open(fname) as f: content = f.readline()
go to the beginning of the file
file.seek(0)
read whole text file:
with open(fname) as f:
    content = f.readlines()  # keeps newline characters
with open(fname) as f:
    temp = f.read().splitlines()  # strips newline characters
63.3.4. Export to Excel
https://docs.python.org/3.6/library/csv.html
import csv
wtr = csv.writer(open('out.csv', 'w'), delimiter=';', lineterminator='\n')
wtr.writerows(flat_list)
63.3.5. NameError: name 'A' is not defined
try:
    file.close()
except NameError:  # 'file' was never assigned
    pass
63.3.6. rename files (list directory)
import os
from shutil import copyfile

sd = '/mnt/hit4/hit4user/kaggle/abstraction-and-reasoning-challenge/training/'
td = '/mnt/hit4/hit4user/kaggle/abc/training/'
dirFiles = os.listdir(sd)
dirFiles.sort(key=lambda f: int(f[:-5], base=16))  # sort hex file names
for i, x in enumerate(dirFiles):
    src = os.path.join(sd, x)
    dst = os.path.join(td, str(i))
    copyfile(src, dst)
63.3.7. current script path and directory
import sys, os
os.path.abspath(sys.argv[0])                   # full path of the running script
os.path.dirname(os.path.abspath(sys.argv[0]))  # its directory
os.getcwd()                                    # current working directory
63.4. STRINGS
63.4.1. String comparison
https://stackabuse.com/comparing-strings-using-python/
- == compares two variables based on their actual value
- is operator compares two variables based on the object id
Rule: use == to compare values; use is only to compare object identity (e.g. x is None).
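A quick illustration of the difference; list literals always create distinct objects:

```python
a = [1, 2]
b = [1, 2]
print(a == b)     # True: same value
print(a is b)     # False: two distinct objects
print(a is a)     # True: same object
x = None
print(x is None)  # idiomatic identity check
```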
- a.lower() == b.lower()
- difflib.SequenceMatcher - gestalt pattern matching
from difflib import SequenceMatcher
m = SequenceMatcher(None, "NEW YORK METS", "NEW YORK MEATS")
m.ratio()  # => 0.962962962963
# disadvantage:
# fuzz.ratio("YANKEES", "NEW YORK YANKEES") => 60        # same team
# fuzz.ratio("NEW YORK METS", "NEW YORK YANKEES") => 75  # different teams
# fix - best partial match:
def a(s1, s2):
    if len(s1) <= len(s2):
        shorter, longer = s1, s2
    else:
        shorter, longer = s2, s1
    m = SequenceMatcher(None, shorter, longer)
    blocks = m.get_matching_blocks()
    scores = []
    for block in blocks:
        long_start = block[1] - block[0] if (block[1] - block[0]) > 0 else 0
        long_end = long_start + len(shorter)
        long_substr = longer[long_start:long_end]
        m2 = SequenceMatcher(None, shorter, long_substr)
        r = m2.ratio()
        if r > .995:
            return 100
        else:
            scores.append(r)
    return int(round(100 * max(scores)))

print(a("asd", "123asd"))  # 100
print(a("asd", "asd123"))  # 100
- https://en.wikipedia.org/wiki/Levenshtein_distance
memo = {}  # cache: (s, t) -> distance

def levenshtein(s: str, t: str) -> int:
    """Edit distance; result is in range 0..max(len(s), len(t))."""
    if s == "":
        return len(t)
    if t == "":
        return len(s)
    cost = 0 if s[-1] == t[-1] else 1
    i1 = (s[:-1], t)
    if i1 not in memo:
        memo[i1] = levenshtein(*i1)
    i2 = (s, t[:-1])
    if i2 not in memo:
        memo[i2] = levenshtein(*i2)
    i3 = (s[:-1], t[:-1])
    if i3 not in memo:
        memo[i3] = levenshtein(*i3)
    return min(memo[i1] + 1, memo[i2] + 1, memo[i3] + cost)
- hamming distance
def hamming_distance(chaine1, chaine2):
    return sum(c1 != c2 for c1, c2 in zip(chaine1, chaine2))

def hamming_distance2(chaine1, chaine2):
    return len(list(filter(lambda x: ord(x[0]) ^ ord(x[1]), zip(chaine1, chaine2))))

print(hamming_distance("chaine1", "chaine2"))   # 1
print(hamming_distance2("chaine1", "chaine2"))  # 1
63.4.2. Remove whitespaces
line = " ".join(line.split()) # resplit
63.4.3. Unicode
- '\u2116'.encode("unicode_escape")
- b'\\u2116'
- print('№'.encode("unicode_escape"))
- b'\\u2116'
- print('\u2116'.encode("utf-8")) # sometimes do wrong
- b'\xe2\x84\x96'
- print(b'\xe2\x84\x96'.decode('utf-8'))
- №
- print('\u2116'.encode("utf-8").decode('utf-8'))
- №
- terms
- code points, first two characters are always "U+", hexadecimal. At least 4 hexadecimal digits are shown, prepended with leading zeros as needed. ex: U+00F7
- BOM - magic number at the start of a text
- UTF-8 byte sequence EF BB BF, permits the BOM in UTF-8, but does not require or recommend its use.
- Not using a BOM allows text to be backwards-compatible with software designed for extended ASCII.
- In UTF-16, a BOM (U+FEFF), byte sequence FE FF
- UTF-8 Encoding or Hex UTF-8 - hex representation of encoded 1-4 bytes.
- Encoding formats: UTF-8, UTF-16, GB18030, UTF-32
utf-8
- ASCII-compatible
- 1-4 bytes for each code point
UTF-16
- not ASCII-compatible; each code point takes 2 or 4 bytes
GB18030
utf-8 byte layout
First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
U+0000            U+007F           0xxxxxxx
U+0080            U+07FF           110xxxxx  10xxxxxx
U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
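The 1-4 byte widths above can be checked directly; the sample characters are arbitrary picks from each range:

```python
# one character from each UTF-8 width class:
# U+0041 (1 byte), U+00F7 (2), U+2116 (3), U+1F600 (4)
samples = ["A", "\u00f7", "\u2116", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s)")
```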
63.4.4. To find all the repeating substrings in a given string
https://stackoverflow.com/questions/41077268/python-find-repeated-substring-in-string
You can do it by repeating the substring a certain number of times and testing if it is equal to the original string.
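That idea can be sketched directly; find_repetend is a hypothetical helper name, returning the shortest repeating unit or None:

```python
def find_repetend(s: str):
    """Shortest substring u such that u * k == s for some k, or None."""
    for i in range(1, len(s) // 2 + 1):
        if len(s) % i == 0 and s[:i] * (len(s) // i) == s:
            return s[:i]
    return None

print(find_repetend("031055914003105591400310559140"))  # 0310559140
print(find_repetend("abcabcabc"))                       # abc
print(find_repetend("abcd"))                            # None
```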
memo = {}

def levenshtein(s: str, t: str) -> int:
    """Edit distance; result is in range 0..max(len(s), len(t))."""
    if s == "":
        return len(t)
    if t == "":
        return len(s)
    cost = 0 if s[-1] == t[-1] else 1
    i1 = (s[:-1], t)
    if i1 not in memo:
        memo[i1] = levenshtein(*i1)
    i2 = (s, t[:-1])
    if i2 not in memo:
        memo[i2] = levenshtein(*i2)
    i3 = (s[:-1], t[:-1])
    if i3 not in memo:
        memo[i3] = levenshtein(*i3)
    return min(memo[i1] + 1, memo[i2] + 1, memo[i3] + cost)

c = '03105591400310559140031055914003105591400310559140031055914003105591400310559140'
c = '0310559140031055914031055914003105591400310591400310559140031055910030559140'  # noisy copy
a = []
for j in range(10):
    for i in range(7):
        if (i * 10 + 10 + j) <= len(c):
            a.append(c[i * 10 + j:i * 10 + 10 + j])
v = {x: a.count(x) for x in a if a.count(x) > 2}  # candidate periods
# for k in v.keys():
#     print(k, levenshtein(k * 8, c))
re = {k: levenshtein(k * 8, c) for k in v.keys()}
print(sorted(re, key=re.__getitem__)[0])  # ascending - smallest distance first
# sample distances (candidate, levenshtein of candidate*8 vs the full string):
#   0310559140 4, 3105591400 6, 1055914003 8, 0559140031 10, 5591400310 12,
#   5914003105 14, 9140031055 12, 1400310559 10, 4003105591 8, 0031055914 6
# for '3105591400310559140031055914003105591400310559140...':
#   3105591400 1, 1055914003 3, 0559140031 5, 5591400310 7, 5914003105 9,
#   9140031055 9, 1400310559 7, 4003105591 5, 0031055914 3, 0310559140 1  <- THIS
63.4.5. first substring
- str.find
- by regex:
m = re.search(r"[0-9]+", d)  # re.search needs both a pattern and the string
if m:
    num = d[m.start():m.end()]  # or m.group(0)
63.5. DICT
add
d1.update(d2) # d1 = d1+d2
find max value
import operator
max(d1.items(), key=operator.itemgetter(1))[0]
for
- for key in dict:
- for key, value in dict.items():
sorted dict
abb_sel_diff_middle[wind] = sum / len(abb_sel_diff[wind])
c = sorted(abb_sel_diff_middle.items(), key=lambda kv: kv[1], reverse=True)  # descending

numbers = {'first': 2, 'second': 1, 'third': 3, 'Fourth': 4}
sorted(numbers, key=numbers.__getitem__)
# ['second', 'first', 'third', 'Fourth']
merge two dicts
z={**x, **y}
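Since Python 3.9 the same merge can be written with the | operator; in both spellings the right-hand dict wins on duplicate keys:

```python
x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}
z = {**x, **y}
w = x | y  # Python 3.9+
print(z)       # {'a': 1, 'b': 3, 'c': 4}
print(z == w)  # True
```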
63.5.1. del
loop with clone
for k, v in list(d.items()):
    if v is bad:
        del d[k]
# or
{k: v for k, v in d.items() if v is not bad}
filter
self.contours = list(filter(lambda a: a is not None, self.contours))
63.6. argparse: command line arguments
63.6.1. terms
- positional arguments - arguments without options (main.py input_file.txt)
- options that accept values (--file a.txt)
- on/off flags - options without any values (--overwrite)
63.6.2. usage
import sys
print(sys.argv)
or
import argparse

def main(args):
    args.batch_size

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", help="data directory", default='./data')
    parser.add_argument("--default_settings", help="use default settings", type=bool, default=True)
    parser.add_argument("--combine_train_val", help="combine the training and validation sets for testing", type=bool, default=False)
    main(parser.parse_args())
63.6.3. optional positional argument
parser.add_argument('bar', nargs='?', default='d')
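With nargs='?' the positional may be omitted; a minimal check of both cases:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('bar', nargs='?', default='d')
print(parser.parse_args([]).bar)     # d  (omitted -> default)
print(parser.parse_args(['x']).bar)  # x  (provided)
```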
63.7. way to terminate
sys.exit()
63.8. JSON
may be array or object
- escape " as \"
- escape \ as \\
63.9. NN EQUAL QUANTITY FROM SAMPLES
lim = max(count.values()) * 2  # limit for all groups
print(count.values())
print('max', lim)
for _, v in count.items():  # v - quantity
    c = 0  # current quantity
    for _ in range(v):
        r = round(lim / v)
        diff = 0
        if (c + r) > lim:
            diff = c + r - lim
        # create: r - diff copies
        c += r - diff
    print(c)

# Or in a class -------------
import math

class Counter:
    def __init__(self, limit):
        self.lim: int = limit  # e.g. int(max(amounts) * multiplier)
        print("Counter limit:", self.lim)

    def new_count(self, one_amount):
        self.c: int = 0  # done counter
        self.r: int = math.ceil(self.lim / one_amount)  # multiplier
        # x + y = one_amount
        # x*r + y = lim
        # y = one_amount - x          # without duplicates
        # x*r + one_amount - x = lim  # with duplicates
        # x*(r - 1) = lim - one_amount
        # x = (lim - one_amount) / (r - 1)
        if self.r == 1:
            self.wd = self.lim
        else:
            self.wd = (self.lim - one_amount) / (self.r - 1)  # take duplicates
        self.wd = self.wd * self.r

    def how_many_now(self) -> int:
        """Called one_amount times.
        :return: how many times to repeat this sample to equalize this group with the others"""
        diff: int = 0
        r: int = 1 if self.c > self.wd else self.r
        if (self.c + r) > self.lim:
            diff = self.c + r - self.lim  # last
        self.c += r - diff  # update counter
        return int(r - diff)

counts = [20, 30, 10, 7, 100]
multiplier = 2
counter = Counter(int(max(counts) * multiplier))  # pass the computed limit
for v in counts:  # v - quantity
    counter.new_count(v)
    c = 0
    for _ in range(v):  # one item
        c += counter.how_many_now()
    print(c)
63.10. most common element
def most_common(lst):
    return max(set(lst), key=lst.count)

mc = most_common([round(a, 1) for a in degrees if abs(a) != 0])
filtered_degrees = []
for a in degrees:
    if round(a, 1) == mc:
        filtered_degrees.append(a)
med_degree = float(np.median(filtered_degrees))

# most common character
s3 = 'BEBBBB'
s3 = {x: s3.count(x) for x in s3}
mc = sorted(s3.values())[-1]
s3 = [key for key, value in s3.items() if value == mc][0]  # most common
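The standard library already covers this: collections.Counter.most_common does the counting in one call.

```python
from collections import Counter

print(Counter("BEBBBB").most_common(1)[0])      # ('B', 5)
print(Counter([1, 1, 2, 3, 1]).most_common(1))  # [(1, 3)]
```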
63.11. print numbers
n = 123123123412
print(f"{n:,}")
# 123,123,123,412
63.12. SCALE
import numpy as np

def scaler_simple(data: np.array) -> np.array:
    """Scale to range (0, 1).
    :param data: one dimension"""
    data_min = np.nanmin(data)
    data_max = np.nanmax(data)
    return (data - data_min) / (data_max - data_min)

def scaler_simple_signed(data: np.array) -> np.array:
    """Scale to range (-1, 1).
    :param data: one dimension"""
    data_min = np.nanmin(data)
    data_max = np.nanmax(data)
    return (data_max / 2 - data) / (data_max - data_min) / 2

# (0,1) to (-1,1)
data = (0.5 - data) / 0.5
# (-1,1) to (0,1)
data = (1 - data) / 2

def my_scaler(data: np.array) -> np.array:
    """data close to 0 will not add much value to the learning process
    :param data: two dimensions: 0 - time, 1 - prices"""
    smoothing_window_size = data.shape[0] // 2  # for 10000 - 4
    dl = []
    for di in range(0, len(data), smoothing_window_size):
        window = data[di:di + smoothing_window_size]
        window = scaler(window, axis=1)  # scaler: external min-max helper
        dl.append(window)  # last window may be shorter
    return np.concatenate(dl)
63.13. smooth
def savitzky_golay(y, window_size, order, deriv=0, rate=1):
    import numpy as np
    from math import factorial
    try:
        # np.int was removed in modern NumPy; plain int works
        window_size = abs(int(window_size))
        order = abs(int(order))
    except ValueError as msg:
        raise ValueError("window_size and order have to be of type int:", msg)
    if window_size % 2 != 1 or window_size < 1:
        raise TypeError("window_size size must be a positive odd number")
    if window_size < order + 2:
        raise TypeError("window_size is too small for the polynomials order")
    order_range = range(order + 1)
    half_window = (window_size - 1) // 2
    # precompute coefficients
    b = np.array([[k ** i for i in order_range]
                  for k in range(-half_window, half_window + 1)])
    m = np.linalg.pinv(b)[deriv] * rate ** deriv * factorial(deriv)
    # pad the signal at the extremes with values taken from the signal itself
    firstvals = y[0] - np.abs(y[1:half_window + 1][::-1] - y[0])
    lastvals = y[-1] + np.abs(y[-half_window - 1:-1][::-1] - y[-1])
    y = np.concatenate((firstvals, y, lastvals))
    return np.convolve(m[::-1], y, mode='valid')
63.14. one-hot encoding
63.14.1. we have [1,3] [1,2,3,4], [3,4] -> numbers
import numpy as np
nb_classes = 5  # classes 0..4; np.eye(4)[4] would be out of range
targets = np.array([[2, 3, 4, 0]]).reshape(-1)
one_hot_targets = np.eye(nb_classes)[targets]
res: int = sum(x * (2 ** i) for i, x in enumerate(sum(one_hot_targets)))  # from binary to integer
63.14.2. column of strings
def one_h_str_col(col: np.array, name: str):
    c = list(set(col))  # unique
    print(name, c)  # encoding
    res_col = []
    for x in col:
        ind = c.index(x)
        res_col.append(ind)
    return np.array(res_col)
63.15. binary encoding
s_ids = []
for service_id, cost in cursor1.fetchall():  # service_id = None, 1, 2, 3, 4
    service_id = 0 if service_id is None else int(service_id)
    s_ids.append(int(service_id))
targets = np.array(s_ids).reshape(-1)
s_id = 0
if targets.size:  # truth value of a multi-element array is ambiguous; test size
    one_hot_targets = np.eye(6)[targets]  # classes 0..5
    s_id: int = sum(x * (2 ** i) for i, x in enumerate(sum(one_hot_targets)))  # from binary to integer
63.16. map encoding
df['`condition`'] = df['`condition`'].map({'new': 0, 'uses': 1})
63.17. Accuracy
import numpy as np
Accuracy = (TP+TN)/(TP+TN+FP+FN):
print("%f" % (np.round(ypred2) != labels_test).mean())
Precision = (TP) / (TP+FP)
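The two formulas, spelled out on a made-up pair of label vectors:

```python
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))  # 2
tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, y_true))  # 2
fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))  # 1
fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))  # 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
print(accuracy, precision)
```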
63.18. garbage collect
del train, test; gc.collect()
63.19. Class loop over member variables
for x in vars(instance):       # attribute names (strings)
    v = vars(instance)[x]      # attribute value
63.20. filter special characters
def remove_special_characters(character):
    return character.isalnum() or character == ' '

text = 'datagy -- is. great!'
new_text = ''.join(filter(remove_special_characters, text))
print(new_text)  # datagy  is great
63.21. measure time
import time
start_time = time.time()
main()
print("--- %s seconds ---" % (time.time() - start_time))
63.22. primes in interval
#!/usr/bin/python
m = 2
n = 10
primes = [i for i in range(m, n)
          if all(i % j != 0 for j in range(2, int(i ** 0.5) + 1))]
print(primes)
[2, 3, 5, 7]
63.23. unicode characters in interval
emacs character info: C-x =
a = 945
b = 961
for i in range(a, b + 1):
    print(" ".join([str(i), " ", chr(i)]))
945 α 946 β 947 γ 948 δ 949 ε 950 ζ 951 η 952 θ 953 ι 954 κ 955 λ 956 μ 957 ν 958 ξ 959 ο 960 π 961 ρ
64. Flask
- Flask and Quart are built on Werkzeug and use Jinja for templating.
- Flask wraps Werkzeug, letting it take care of the WSGI intricacies while adding structure and patterns for creating powerful applications.
- Quart - an async reimplementation of Flask
Flask will never have a database layer. Flask itself just bridges to Werkzeug to implement a proper WSGI application and to Jinja2 to handle templating. It also binds to a few common standard library packages such as logging. Everything else is up for extensions.
64.1. terms
- view
- view function is the code you write to respond to requests to your application
- Blueprints
- way to organize a group of related views and other code. Flask associates view functions with blueprints when dispatching requests and generating URLs.
64.2. components
- Jinja
- template engine https://jinja.palletsprojects.com/
- Werkzeug
- WSGI toolkit https://werkzeug.palletsprojects.com/
- Click
- CLI toolkit https://click.palletsprojects.com/
- MarkupSafe
- escapes characters so it is safe to use in HTML and XML https://markupsafe.palletsprojects.com/
- ItsDangerous
- safe data serialization library, store the session of a Flask application in a cookie without allowing users to tamper with the session contents. https://itsdangerous.palletsprojects.com/
- importlib-metadata
- used to import the optional dotenv module in the middle of execution.
- zipp
- ?
64.3. static files and debugging console
64.3.1. get URL
from flask import url_for
from flask import redirect

@app.route("/")
def hell():
    return redirect(url_for('static', filename='style.css'))
64.3.2. path and console
default:
- in localhost:8080/console
- >>> print(app.static_folder)
- /home/u/static
- >>> print(app.static_url_path)
- /static
- >>> print(app.template_folder)
- templates
- >>> print(app.static_folder)
if we set: app = Flask(static_folder='test')
- >>> print(app.static_folder)
- /home/u/test
- >>> print(app.static_url_path)
- /test
app = Flask(__name__, template_folder='./', static_url_path='/static', static_folder='/home/u/sources/documents_recognition_service/docker/worker/code/test' )
64.4. start, run
ways to run:
64.4.1. start $flask run (recommended)
export FLASK_APP=main
export FLASK_DEBUG=false
export FLASK_RUN_HOST=localhost
export FLASK_RUN_PORT=8080
flask run --no-debug
# or pass everything on the command line:
flask --app main run --debug
FLASK_<COMMAND>_<OPTION> - pattern for all options
- FLASK_APP
print(app.config) # to get all configuration variables in app
64.4.2. start app.run()
app.run() or flask run
- development web server
for production deployment, use gunicorn or uWSGI
app.run()
- host – the hostname to listen on.
- port – the port of the web server.
- debug – if given, enable or disable debug mode. automatically reload if code changes, and will show an interactive debugger in the browser if an error occurs during a request
- load_dotenv – load the nearest .env and .flaskenv files to set environment variables.
- use_reloader – should the server automatically restart the python process if modules were changed?
- use_debugger – should the werkzeug debugging system be used?
- use_evalex – should the exception evaluation feature be enabled?
- extra_files – a list of files the reloader should watch additionally to the modules.
- reloader_interval – the interval for the reloader in seconds.
- reloader_type – the type of reloader to use.
- threaded – should the process handle each request in a separate thread?
- processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.
- passthrough_errors – set this to True to disable the error catching.
- ssl_context – an SSL context for the connection.
64.5. Quart
# save this as app.py
from quart import Quart, request
from markupsafe import escape

app = Quart(__name__)

@app.get("/")
async def hello():
    name = request.args.get("name", "World")
    return f"Hello, {escape(name)}!"

# $ quart run
#  * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
64.6. GET
64.6.1. variables
- string (default) accepts any text without a slash
- int accepts positive integers
- float accepts positive floating point values
- path like string but also accepts slashes
- uuid accepts UUID strings
@app.route('/post/<int:post_id>')
def show_post(post_id):
    # show the post with the given id, the id is an integer
    return f'Post {post_id}'

@app.route('/path/<path:subpath>')
def show_subpath(subpath):
    # show the subpath after /path/
    return f'Subpath {escape(subpath)}'
64.6.2. parameters ?key=value
from flask import request
searchword = request.args.get('key', '')
64.7. app.route
64.8. gentoo dependencies
- dev-python/asgiref - Asynchronous Server Gateway Interface - calling convention for web servers to forward requests to web applications or frameworks written in the Python
- dev-python/blinker - fast dispatching system, to subscribe to events
- dev-python/click - creating beautiful command line interfaces
- dev-python/gpep517 - gentoo
- dev-python/importlib_metadata - gentoo
- dev-python/itsdangerous - helpers to pass data to untrusted environments and to get it back safe and sound
- dev-python/jinja - template engine for Python
- dev-python/pallets-sphinx-themes - ? themes for documentation
- dev-python/pypy3 - fast, compliant alternative implementation of the Python (4.5 times faster than CPython)
- dev-python/pytest - Simple powerful testing with Python - detailed assertion introspection
- dev-python/setuptools - Easily download, build, install, upgrade, and uninstall Python packages
- dev-python/sphinx - Python documentation generator
- dev-python/sphinx-issues
- dev-python/sphinx-tabs
- dev-python/sphinxcontrib-log_cabinet
- dev-python/werkzeug - Collection of various utilities for WSGI applications
- dev-python/wheel - A built-package format for Python
64.9. blueprints
64.10. Hello world
import flask
from flask import Flask
from flask import json, Response, redirect, url_for
from markupsafe import escape

def create_app(test=False) -> Flask:
    app = Flask(__name__, template_folder='./', static_folder='./')
    if test:
        pass

    @app.route("/predict", methods=["POST"])
    def predict():
        data = {"success": False}
        if flask.request.method != "POST":
            json_string = json.dumps(data, ensure_ascii=False)
            return Response(json_string, content_type="application/json; charset=utf-8")

    @app.route("/<name>")
    def hello(name):
        return f"Hello, {escape(name)}!"

    @app.route('/', methods=['GET', 'POST'])
    def index():
        return redirect(url_for('transcribe'))

    return app

if __name__ == "__main__":
    app = create_app()
    app.run(debug=False)
64.11. curl
one string
application/x-www-form-urlencoded is the default:
curl -d "param1=value1&param2=value2" -X POST http://localhost:3000/data
explicit:
curl -d "param1=value1&param2=value2" -H "Content-Type: application/x-www-form-urlencoded" -X POST http://localhost:3000/data
64.12. response object
default return:
- string => 200 OK status code and a text/html mimetype
- dict or list => jsonify() is called to produce a response
- iterator or generator returning strings or bytes => streaming response
- (response, status), (response, headers), or (response, status, headers)
- headers : list or dictionary
- other - assume the return is a WSGI application and convert that into a response object.
make_response:
from flask import make_response

@app.route('/')
def index():
    resp = make_response(render_template(...))
    resp.set_cookie('username', 'the username')
    return resp
64.13. request object
- from flask import request
64.13.1. get all values
for x in dir(request): print(x, getattr(request, x))
64.14. Jinja templates
Jinja template library to render templates, located at 64.3.2
- autoescape any data that is rendered in HTML templates - such as < and > will be escaped with safe value
- {{ and }} - for output. a single trailing newline is stripped if present, other whitespace (spaces, tabs, newlines etc.) is returned unchanged
- {{ name|striptags|title }} - equal to (title(striptags(name)))
- {% and %} - control flow, and other Statements
- {%+ if something %}yay{% endif %} or {% if something +%}yay{% endif %} - + disables whitespace trimming for that block
- {%- if something %}yay{% endif %} - the whitespaces before or after that block will be removed. used for {{ }} also
- {# … #} for Comments not included in the template output
- # for item in seq - line statement, equivalent to {% for item in seq %}
common for {{}}
- url_for('static', filename='style.css')
join paths:
{{ path_join('pillar', 'device1.sls') }}
common for {%%}
- {% if True %} yay {% endif %}
- {% raw %} {% {% {% {% endraw %}
- {% for user in users %} {{user.a}} {% endfor %}
- {% include 'header.html' %}
64.14.1. own filters:
# 1st way
@app.template_filter('reverse')
def reverse_filter(s):
    return s[::-1]

# 2nd way
def reverse_filter(s):
    return s[::-1]

app.jinja_env.filters['reverse'] = reverse_filter
app.jinja_env.filters['path_join'] = os.path.join
# usage: {{ path | path_join('..') }}
64.14.2. links
64.15. security
- from markupsafe import escape; return f"Hello, {escape(name)}!"
werkzeug.utils.secure_filename()
64.16. my projects
64.16.1. testing1
from main import app
from flask.testing import FlaskClient
from flask import Response
from pathlib import Path
import json
import logging

# -- enable app.logger.debug()
app.logger.setLevel(logging.DEBUG)
app.testing = True  # propagate exceptions here; otherwise only a 500 status is returned

client: FlaskClient
with app.test_client() as client:
    # -- get
    r: Response = client.get('/audio_captcha', follow_redirects=True)
    assert r.status_code == 200
    # the same:
    r: Response = client.get('/get', query_string={'id': str('123')})
    r: Response = client.get('/get?id=123')
    # print(r.status_code)
    # -- post
    r: Response = client.post('/audio_captcha', data={
        'file': Path('/home/u2/h4/PycharmProjects/captcha_fssp/929014e341a0457f5a90a909b0a51c40.wav').open('rb')})
    assert r.status_code == 200
    print(json.loads(r.data))

with app.test_request_context():
    print(url_for('index'))
    print(url_for('login'))
    print(url_for('login', next='/'))
    print(url_for('profile', username='John Doe'))
# /
# /login
# /login?next=/
# /user/John%20Doe
64.16.2. testing2
from main import app
from flask.testing import FlaskClient
from flask import Response
from pathlib import Path
import json

app.testing = True
client: FlaskClient
with app.test_client() as client:
    # r: Response = client.get('/speech_ru')
    # assert r.status_code == 200
    # print(r.status_code)
    r: Response = client.post('/speech_ru', data={
        'file': Path('/home/u2/h4/PycharmProjects/captcha_fssp/929014e341a0457f5a90a909b0a51c40.wav').open('rb')})
    assert r.status_code == 200
    print(json.loads(r.data))
64.16.3. file storage
- https://gist.github.com/andik/e86a7007c2af97e50fbb
- https://codereview.stackexchange.com/questions/214418/simple-web-based-file-browser-with-flask
- https://www.reddit.com/r/learnpython/comments/npadxh/how_to_return_directory_listingwith_files_and/
- https://stackoverflow.com/questions/23718236/python-flask-browsing-through-directory-with-files
- https://github.com/Wildog/flask-file-server
- https://pypi.org/project/Flask-AutoIndex/ https://github.com/general03/flask-autoindex
- https://github.com/walkoncross/tornado-file-server
64.17. Flask-2.2.2 hashes
MarkupSafe==2.1.1 \
  --hash=sha256:7f91197cc9e48f989d12e4e6fbc46495c446636dfc81b9ccf50bb0ec74b91d4b
Jinja2==3.1.2 \
  --hash=sha256:31351a702a408a9e7595a8fc6150fc3f43bb6bf7e319770cbc0db9df9437e852
Werkzeug==2.2.2 \
  --hash=sha256:7ea2d48322cc7c0f8b3a215ed73eabd7b5d75d0b50e31ab006286ccff9e00b8f
click==8.1.3 \
  --hash=sha256:7682dc8afb30297001674575ea00d1814d808d6a36af415a82bd481d37ba7b8e
itsdangerous==2.1.2 \
  --hash=sha256:5dbbc68b317e5e42f327f9021763545dc3fc3bfe22e6deb96aaf1fc38874156a
importlib_metadata==5.0.0 \
  --hash=sha256:da31db32b304314d044d3c12c79bd59e307889b287ad12ff387b3500835fc2ab
zipp==3.8.1 \
  --hash=sha256:05b45f1ee8f807d0cc928485ca40a07cb491cf092ff587c0df9cb1fd154848d2
Flask==2.2.2 \
  --hash=sha256:642c450d19c4ad482f96729bd2a8f6d32554aa1e231f4f6b4e7e5264b16cca2b
64.18. flask-restful
- flask-restful - complex API on top of the Flask API (sucks)
- flask-apispec inspired by Flask-RESTful and Flask-RESTplus, but attempts to provide similar functionality with greater flexibility and less code
?? https://github.com/mgorny/flask-api
marshal_with - declare serialization transformation for response https://flask-restful.readthedocs.io/en/latest/quickstart.html
64.19. example
from flask_restful import fields, marshal_with

resource_fields = {
    'task': fields.String,
    'uri': fields.Url('todo_ep')
}

class TodoDao(object):
    def __init__(self, todo_id, task):
        self.todo_id = todo_id
        self.task = task
        self.status = 'active'  # this field will not be sent in the response

parser = reqparse.RequestParser()
parser.add_argument('task', type=str, help='Rate to charge for this resource')
parser.add_argument('picture', type=werkzeug.datastructures.FileStorage, required=True, location='files')

class Todo(Resource):
    @marshal_with(resource_fields)
    def get(self, todo_id):
        args = parser.parse_args()
        task = {'task': args['task']}
        file = args['file']
        file.save("your_file_name.jpg")
        if something:
            abort(404, message="Todo doesn't exist")
        return TodoDao(todo_id='my_todo', task='Remember the milk')

api.add_resource(Todo, '/todos/<todo_id>')

if __name__ == '__main__':
    app.run(debug=True)
64.19.1. image
64.20. swagger
- flask_restx - same API as flask-restful but with Swagger autogeneration
flask_restx.reqparse.RequestParser.add_argument
64.21. werkzeug
- https://werkzeug.palletsprojects.com/
- /usr/lib/python3.11/site-packages/werkzeug
64.22. debug
- run(debug=True) - starts two processes (the auto-reloader spawns a child process)
- localhost:8080/console
- >> app.url_map
- >> print(app.static_folder)
64.23. test
from flask.testing import FlaskClient
from flask import Response
from micro_file_server.__main__ import app

def test_main():
    app.testing = True
    with app.test_client() as client:
        client: FlaskClient
        r: Response = client.get('/')
        assert r.status_code == 200
64.24. production
built-in development WSGI server in Flask
- not handle more than one request at a time by default.
- If you leave debug mode on and an error pops up, it opens up a shell that allows for arbitrary code to be executed on your server
production WSGI (Web Server Gateway Interface) servers
- Gunicorn
- Waitress
- mod_wsgi
- uWSGI
- gevent
- eventlet
- ASGI
links
- https://flask.palletsprojects.com/en/2.3.x/tutorial/deploy/
- https://flask.palletsprojects.com/en/2.3.x/deploying/
64.25. vulnerabilities
64.26. USECASES
- get data https://stackoverflow.com/questions/10434599/how-to-get-data-received-in-flask-request
- app.config['JSON_AS_ASCII'] = False # disabling ASCII-safe encoding opens the door for issues with U+2028 and U+2029 separators in the data to break Javascript interpolation or JSONP APIs http://timelessrepo.com/json-isnt-a-javascript-subset
For the return value, Flask creates:
- Response 200 OK, with the string as response body, text/html mimetype
- (response, status, headers) or (response, headers)
64.26.1. check file exist
from flask import Flask
from flask import render_template
import os

app = Flask(__name__)

@app.route("/")
def main():
    app.logger.debug(os.path.exists(os.path.join(app.static_folder, 'staticimage.png')))
    app.logger.debug(os.path.exists(os.path.join(app.template_folder, 'index.html')))
    return render_template('index.html')
64.26.2. call POST method
request.files = {'file': open('/home/u/a.html', 'rb')}
request.method = 'POST'
r = upload()
# ('{"id": "35f190f6aa854b6c9bb0c64e601c0eda"}', 200, {'Content-Type': 'application/json'})
64.26.3. call GET method with arguments
request.args = {'id': rid}
r = get()
app.logger.debug("r " + json.dumps(json.loads(r[0]), indent=4))
64.26.4. print headers
from flask import Flask
from flask import request

app = Flask(__name__, template_folder='./', static_folder='./')

@app.route("/")
def hell():
    return ''.join([f"<br> {x[0]}: {x[1]}\n" for x in request.headers])

if __name__ == "__main__":
    print("start")
    app.run(host='0.0.0.0', port=80, debug=False)
64.26.5. TLS server
generate a CSR (certificate signing request) for the server certificate; a CA uses it to issue the SSL certificate
- rm server.key ; openssl genrsa -out server.key 2048 && cp server.key server.key.org && openssl rsa -in server.key.org -out server.key
- cp server.key server.key.org
- openssl rsa -in server.key.org -out server.key
- openssl req -new -key server.key -out server.csr
generate self-signed:
- openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
CN must be full domain address
.well-known/pki-validation/926C419392B7B26DFCECBAEB9F163A53.txt
64.27. async/await and ASGI
Flask supports async coroutines for view functions by executing the coroutine on a separate thread instead of using an event loop on the main thread as an async-first (ASGI) framework would. This is necessary for Flask to remain backwards compatible with extensions and code built before async was introduced into Python. This compromise introduces a performance cost compared with the ASGI frameworks, due to the overhead of the threads.
You can still run async code within a view, for example to make multiple concurrent database queries or HTTP requests to an external API. However, the number of requests your application can handle at one time will remain the same.
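The point about concurrent queries can be sketched with plain asyncio (no Flask needed; fake_query is a stand-in for a database query or an external HTTP call):

```python
import asyncio
import time

async def fake_query(name, delay):
    # stands in for a database query or an external HTTP request
    await asyncio.sleep(delay)
    return name

async def view():
    # both 0.1 s "queries" run concurrently, so the view takes ~0.1 s, not 0.2 s
    return await asyncio.gather(fake_query("users", 0.1),
                                fake_query("orders", 0.1))

start = time.monotonic()
result = asyncio.run(view())
elapsed = time.monotonic() - start
```

Inside a Flask async view the same gather pattern applies, but each request still occupies one worker thread.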
64.28. use HTTPS
temporary (ad-hoc, regenerated on each start) certificate:
flask run --cert=adhoc
or
app.run(ssl_context='adhoc')
stable certificate:
- generate: openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 365
app.run(ssl_context=('cert.pem', 'key.pem'))
or
flask run --cert=cert.pem --key=key.pem
or
python micro_file_server/__main__.py --cert=.cert/cert.pem --key=.cert/key.pem
65. FastAPI
- built-in data validation feature
- error messages displayed in JSON format
- asynchronous task support via asyncio
- documentation support - automatic
- feature-rich: HTTPS requests, OAuth, XML/JSON response, TLS encryption
- built-in monitoring tools
- cons: expensive, difficult to scale
implements the ASGI specification
66. Databases
66.1. Groonga
http://groonga.org/docs/ GNU Lesser General Public License v2.1
- full text search engine based on inverted index
- updates without read locks
- column-oriented database management system
- read lock-free
- Geo-location (latitude and longitude) search
start:
- apt-get install groonga
- $ groonga -n grb.db  # create a database
- $ groonga -s -p 10041 grb.db  # run as a server, listening on 0.0.0.0:10041
66.1.1. Basic commands:
- status
- shows status of a Groonga process.
- table_list
- shows a list of tables in a database.
- column_list
- shows a list of columns in a table.
- table_create
- adds a table to a database.
- column_create
- adds a column to a table.
- select
- searches records from a table and shows the result.
- load
- inserts records to a table.
table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText
select --table Site
column_create --table Site --name gender --type UInt8
select Site --filter 'fuzzy_search(_key, "two")'
https://github.com/groonga/groonga/search?l=C&q=fuzzy_search
default:
- data.max_distance = 1;
- data.prefix_length = 0;
- data.prefix_match_size = 0;
- data.max_expansion = 0;
66.1.2. python
https://github.com/hhatto/poyonga
pip install --upgrade poyonga
groonga -s --protocol http grb.db
from poyonga import Groonga

g = Groonga(port=10041, protocol="http", host='0.0.0.0')
print(g.call("status").status)  # >>> 0
- load
from poyonga import Groonga

def _call(g, cmd, **kwargs):
    ret = g.call(cmd, **kwargs)
    print(ret.status)
    print(ret.body)
    if cmd == 'select':
        for item in ret.items:
            print(item)
    print("=*=" * 30)

data = """\
[
  {
    "_key": "one",
    "gender": 1
  }
]
"""
# g is the Groonga instance from the snippet above
_call(g, "load", table="Site", values="".join(data.splitlines()))
66.2. Oracle
https://www.oracle.com/database/technologies/instant-client.html
python cx_Oracle
require: Oracle Instant Client - Basic zip, SQLPlus zip (for console)
.bashrc
export LD_LIBRARY_PATH=/home/u2/.local/instantclient_19_8:$LD_LIBRARY_PATH
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basic-linuxx64.zip
unzip instantclient-basic-linuxx64.zip
apt-get install libaio1
export LD_LIBRARY_PATH=/instantclient_19_8:$LD_LIBRARY_PATH
66.2.1. sql
SELECT * FROM nls_database_parameters WHERE PARAMETER = 'NLS_NCHAR_CHARACTERSET';
DELETE FROM table            -- remove records
DROP TABLE                   -- remove table
SELECT * FROM ALL_OBJECTS    -- system objects
SELECT * FROM v$version      -- Oracle version
66.3. MySQL
67. Virtualenv
enables multiple side-by-side installations of Python, one for each project.
67.1. venv - default module
Creation of virtual environments is done by executing the command venv:
- python3 -m venv path
- source <venv>/bin/activate
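The same thing can be done programmatically with the stdlib venv module (a sketch; with_pip=False just keeps creation fast):

```python
import os
import tempfile
import venv

# create a throwaway environment in a temporary directory
tmp = tempfile.mkdtemp()
venv.create(tmp, with_pip=False)

# every environment is marked by a pyvenv.cfg file at its root
has_cfg = os.path.exists(os.path.join(tmp, "pyvenv.cfg"))
```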
67.2. virtualenv
- pip3.6 install virtualenv --user
- ~/.local/bin/virtualenv ENV
- source ENV/bin/activate
68. ldap
apt-get install libsasl2-dev python-dev libldap2-dev libssl-dev
69. Containerized development
Docker
- ENV values are available to containers
import os

# read (both forms return None / raise-free when unset via .get)
USER = os.getenv('API_USER')
PASSWORD = os.environ.get('API_PASSWORD')

# write
os.environ['API_USER'] = 'username'
os.environ['API_PASSWORD'] = 'secret'
70. security
71. serialization
- pickle (unsafe alone) + hmac
- json
- YAML: a superset of JSON, but easier for humans to read and write
- csv
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too, sigh (read & write)
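Minimal stdlib round-trips for the json and csv entries above (values made up for the example):

```python
import csv
import io
import json

# json: human-readable text; round-trips basic Python types
obj = {"name": "apple", "qty": 3}
restored_obj = json.loads(json.dumps(obj))

# csv: tabular text; io.StringIO stands in for a real file
rows = [["name", "qty"], ["apple", "3"]]
buf = io.StringIO()
csv.writer(buf).writerows(rows)
buf.seek(0)
restored_rows = list(csv.reader(buf))
```

Note that csv reads everything back as strings, while json preserves numbers and booleans.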
71.1. pickle
# -- pandas: load a DataFrame from csv or pickle (p and nrows defined elsewhere) --
import pandas as pd

if p.endswith('.csv'):
    df = pd.read_csv(p, index_col=0, low_memory=False, nrows=nrows)
elif p.endswith('.pickle'):
    df: pd.DataFrame = pd.read_pickle(p)

# -- plain pickle round-trip --
import pickle

with open('filename.pickle', 'wb') as fh:
    pickle.dump(a, fh, protocol=pickle.HIGHEST_PROTOCOL)
with open('filename.pickle', 'rb') as fh:
    b = pickle.load(fh)
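A sketch of the "pickle (unsafe alone) + hmac" idea from the list above: sign the pickled bytes so only payloads produced with the shared key get unpickled. SECRET and the helper names are invented for the example; in practice keep the key out of source control.

```python
import hashlib
import hmac
import pickle

SECRET = b"shared-secret"  # hypothetical key

def dumps_signed(obj):
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()  # 32-byte tag
    return sig + payload

def loads_signed(blob):
    sig, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    # constant-time comparison; reject anything not signed with SECRET
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature: refusing to unpickle")
    return pickle.loads(payload)
```

This only protects against tampering in transit/storage; anyone holding SECRET can still craft a malicious pickle.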
72. cython
- cython -3 --embed a.py
- gcc `python3-config --cflags --ldflags` -lpython3.10 -fPIC -shared a.c
from doc:
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing \
    -I/usr/include/python3.5 -o yourmod.so yourmod.c
73. headless browsers
74. selenium
- Selenium WebDriver - interface to write instructions that work interchangeably across browsers, including headless browsers.
- 1) Protocol specification
- 2) Ruby official implementation for Protocol specification
- 3) ChromeDriver, GeckoDriver - implementations of specification by Google and Mozilla. Most drivers are created by the browser vendors themselves
- Selenium Remote Control (RC) (pip install selenium) - a simple interface to browsers and to WebDriver
- Selenium IDE - browser plug-in, records your actions in the browser and repeats them.
- Selenium Grid - allows you to run parallel tests on multiple machines and browsers at the same time
- bindings for languages.
pros:
- easily integrates with various development platforms such as Jenkins, Maven, TestNG, QMetry, SauceLabs, etc.
cons:
- No built-in image comparison (Sikuli is a common choice)
- No tech support
- No reporting capabilities
- TestNG creates two types of reports upon test execution: detailed and summary. The summary provides simple passed/failed data, while detailed reports have logs, errors, test groups, etc.
- JUnit uses HTML to generate simple reports in Selenium with indicators “failed” and “succeeded.”
- Extent Library is the most complex option: It creates test summaries, includes screenshots, generates pie charts, and so on.
- Allure creates beautiful reports with graphs, a timeline, and categorized test results — all on a handy dashboard.
- well-coded Selenium test typically verifies less than 10% of the user interface
Mobile and web-app testing tools based on Selenium:
- Selendroid focused exclusively on Android
- Appium - iOS, Android, and Windows devices
- Robotium - a black-box testing framework for Android
- ios-driver - a Selenium WebDriver API for iOS testing, integrated with Selenium Grid
74.1. drivers
Chrome: https://chromedriver.chromium.org/downloads
Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Firefox: https://github.com/mozilla/geckodriver/releases
- gentoo: USE="geckodriver" emerge www-client/firefox
- https://firefox-source-docs.mozilla.org/testing/geckodriver/
- source Rust https://hg.mozilla.org/mozilla-central/file/tip/testing/geckodriver
Safari: https://webkit.org/blog/6900/webdriver-support-in-safari-10/
74.2. install
https://packages.gentoo.org/packages/dev-ruby/selenium-webdriver - binding for Selenium Remote Control
https://packages.gentoo.org/packages/dev-python/selenium
74.3. python installation
- www-client/firefox with geckodriver - it is WebDriver implementation for Firefox https://github.com/mozilla/geckodriver
- dev-python/selenium - Python bindings for Selenium
74.4. python usage
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://google.com")
for i in range(1):
    # search_string must be defined beforehand; driver.get() returns None
    driver.get("https://www.google.com/search?q=" + search_string + "&start=" + str(i))
    # driver.find_element_by_id("nav-search").send_keys("Selenium")
75. plot in terminal
75.1. plotext
https://github.com/piccolomo/plotext
Example: load on workers 0 and 1 is 400 and 500:
pip install plotext
python3 -c "import plotext as plt; plt.bar([0,1],[400,500]); plt.show()"
76. xml parsing
import xml.etree.ElementTree as ET

xmlfile = "a.xml"
tree = ET.parse(xmlfile)
root = tree.getroot()
for child in root:
    print(child.tag, [x.tag for x in child], child.attrib)
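The same API works on an in-memory string via ET.fromstring (the catalog document here is made up for the example):

```python
import xml.etree.ElementTree as ET

# an in-memory document instead of the a.xml file
doc = """<catalog>
  <book id="1"><title>A</title></book>
  <book id="2"><title>B</title></book>
</catalog>"""

root = ET.fromstring(doc)  # parse a string; no file needed
titles = [b.findtext("title") for b in root.findall("book")]  # direct children
ids = [b.get("id") for b in root.iter("book")]                # attributes
```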
77. pytest
77.1. features
- Detailed info on failing assert statements
- Auto-discovery of test modules and functions: https://docs.pytest.org/en/stable/explanation/goodpractices.html#conventions-for-python-test-discovery
- if no "testpaths" is configured and no directories are given as arguments, recurse into directories
- test_*.py or *_test.py
- "test" prefixed functions.
- test prefixed test functions or methods inside Test prefixed test classes
- Modular fixtures for managing small or parametrized long-lived test resources https://docs.pytest.org/en/stable/explanation/fixtures.html
- Can run "unittest" (or trial), "nose" test suites out of the box
- Rich plugin architecture, with over 850 external plugins and a thriving community https://docs.pytest.org/en/latest/reference/plugin_list.html
# pytest.ini (or .pytest.ini), pyproject.toml, tox.ini, or setup.cfg
[pytest]
testpaths = testing doc    # as if: $ pytest testing doc
pytest -x           # stop after first failure
pytest --maxfail=2  # stop after two failures
77.2. layout
pyproject.toml
src/
    mypkg/
        __init__.py
        app.py
        view.py
tests/
    test_app.py
    test_view.py
    ...
77.3. usage
- cd project (with pyproject.toml and test folder)
- pytest [ folders … ] - packages should be added to PYTHONPATH manually
- or python -m pytest (adds the current directory to sys.path) - the current directory must be src, or the package for a flat layout
77.4. dependencies
dev-python/pytest-7.3.2:
[ 0] dev-python/pytest-7.3.2
[ 1] dev-python/iniconfig-2.0.0
[ 1] dev-python/more-itertools-9.1.0
[ 1] dev-python/packaging-23.1
[ 1] dev-python/pluggy-1.0.0-r2
[ 1] dev-python/exceptiongroup-1.1.1
[ 1] dev-python/tomli-2.0.1-r1
[ 1] dev-python/pypy3-7.3.11_p1
[ 1] dev-lang/python-3.10.11
[ 1] dev-lang/python-3.11.3
[ 1] dev-lang/python-3.12.0_beta2
[ 1] dev-python/setuptools-scm-7.1.0
[ 1] dev-python/argcomplete-3.0.8
[ 1] dev-python/attrs-23.1.0
[ 1] dev-python/hypothesis-6.76.0
[ 1] dev-python/mock-5.0.2
[ 1] dev-python/pygments-2.15.1
[ 1] dev-python/pytest-xdist-3.3.1
[ 1] dev-python/requests-2.31.0
[ 1] dev-python/xmlschema-2.3.0
[ 1] dev-python/gpep517-13
[ 1] dev-python/setuptools-67.7.2
[ 1] dev-python/wheel-0.40.0
77.5. fixtures - context for the test
fixtures can use other fixtures
import pytest

class Fruit:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return self.name == other.name

@pytest.fixture
def my_fruit():
    return Fruit("apple")

@pytest.fixture
def fruit_basket(my_fruit):
    return [Fruit("banana"), my_fruit]

def test_my_fruit_in_basket(my_fruit, fruit_basket):
    assert my_fruit in fruit_basket
https://docs.pytest.org/en/latest/explanation/fixtures.html#what-fixtures-are
77.6. print
pytest captures stdout and stderr by default, so captured output is shown only for failing tests
pytest -s # disable all capturing
77.7. troubleshooting
ModuleNotFoundError: No module named 'micro_file_server'
- solution 1: pyproject.toml:
[tool.pytest.ini_options]
pythonpath = ["."]
78. static analysis tools:
- Pylint - coding standards compliance and various error checkers, similar/duplicate code, https://pylint.readthedocs.io/en/latest/user_guide/checkers/features.html
- Pyflakes - only errors checks, tries very hard not to produce false positives
- flake8 - Pyflakes with style checks against PEP 8.
- pycodestyle - Simple Python style checker in one Python file to check the python code against the style conventions of PEP8.
- https://github.com/astral-sh/ruff
- Bandit - common security threats. https://github.com/PyCQA/bandit
- Dodgy - secrets leak detection. https://github.com/landscapeio/dodgy
- Pyright (Microsoft extension for Visual Studio Code)
static type checkers - mypy, Pyre
https://github.com/analysis-tools-dev/static-analysis#python
78.1. security
Common Vulnerabilities and Exposures (CVE)
- CVEs - We can count them and fix them
- SCA - software composition analysis tools.
- Mostly signature based
- 3rd party and our own
- vulnerabilities
Things that probably won’t hurt us
- Good habits/code hygiene
- Active development
- Developers we trust
- CVE and SCA clear
78.2. mypy
reveal_type() - To find out what type mypy infers for an expression anywhere in your program.
78.2.1. emacs fix
mypy /dev/stdin
78.2.2. ex
import random
from typing import Sequence, TypeVar

Choosable = TypeVar("Choosable", str, float)

def choose(items: Sequence[Choosable]) -> Choosable:
    return random.choice(items)

reveal_type(choose(["Guido", "Jukka", "Ivan"]))
reveal_type(choose([1, 2, 3]))
reveal_type(choose([True, 42, 3.14]))
reveal_type(choose(["Python", 3, 7]))
/dev/stdin:14: note: Revealed type is "builtins.str"
/dev/stdin:16: note: Revealed type is "builtins.float"
/dev/stdin:18: note: Revealed type is "builtins.float"
/dev/stdin:20: error: Value of type variable "Choosable" of "choose" cannot be "object"  [type-var]
/dev/stdin:20: note: Revealed type is "builtins.object"
Found 1 error in 1 file (checked 1 source file)
79. release as executable - PyInstaller
PyInstaller: https://pyinstaller.org/en/stable/usage.html
Actions:
80. troubleshooting
def a(l: list = []):
- If the user provides an empty list, this version will not use that list but instead create a new one, because an empty list is "falsy"
- The default empty list is created just once, when the function is defined, not every time the function is called.
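A quick demonstration of the shared default, plus the usual None-sentinel fix (function names invented for the example):

```python
def append_bad(item, acc=[]):
    # the [] default is evaluated once, at def time, and shared by every call
    acc.append(item)
    return acc

def append_good(item, acc=None):
    # idiomatic fix: use a None sentinel and build a fresh list per call
    if acc is None:
        acc = []
    acc.append(item)
    return acc

first = append_bad(1)
second = append_bad(2)  # same list object as `first`
```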
python tests/test_main.py - ModuleNotFoundError: No module named
- solution: PYTHONPATH=. python tests/test_main.py